The recent revolution in the collection and use of big data is transforming the way people learn, communicate, shop, find mates, consume news, and perform countless other tasks. Both governments and businesses are amassing a wealth of data on citizens, a trend that is expected to continue as technology advances. However, without a reliable mechanism to ensure that the data is accurate and up-to-date, risks abound. This is particularly concerning in the criminal justice system, where poor-quality or outdated data has the potential to affect individual freedoms and employability.
Historically, the only truly accessible criminal justice system data were arrest records or trial verdicts. However, advances in information technology mean that law enforcement agencies are building datasets that can go much further. Notes from officers' or detectives' interactions with citizens (for example, interviews about incidents in which the citizen might be a suspect, a victim, or a witness) are now often captured in computer systems rather than on paper, and are therefore kept longer and shared more widely than before. Technologies such as cell site simulators and license plate readers make it possible for law enforcement to collect data on where large numbers of mobile phones or cars have been. Social media and online communication data provide a picture of individuals' social and familial interactions. When accurate inferences can be drawn from such data to deter, prevent, and prosecute crime, society benefits.
However, when such data is inaccurate, outdated, or shared with other agencies or private third parties, it can lead to undesirable outcomes. Such errors have already had an impact. In a recent piece in the Washington Post, a reporter recounted how his search for an apartment was almost derailed when a series of criminal convictions erroneously showed up on a report from a private company to his potential landlord. In that case, the reporter knew how to figure out what had happened and resolved the situation, but other citizens — most lacking the skills of an investigative reporter — probably wouldn't. And if decisions based on inaccurate data are made “behind the scenes” — by an algorithm that assesses how much risk a person poses, or their likelihood of committing future crimes — the citizen may never know what happened.
Accuracy can mean more than data simply being objectively correct. It may also require enough context about how the data was collected so that people, or algorithms, understand what it means. For example, if a law-abiding person is stopped by the police while heading to the store with a relative or friend who happens to be a gang member, and that encounter is recorded in a police database, the person might be wrongly flagged as connected to the gang. In a database that connects citizens to criminal gangs, how much context should have to be retained so a future investigator would know that the flag was based on just one trip to the store? How long should that flag be retained? How will it be translated when transmitted to other jurisdictions? If this person continues to be law-abiding, but is stopped for a traffic infraction 10 years later in another jurisdiction, what will that officer see? Will the traffic stop be handled differently as a result? Should it be? To maintain fair treatment, when should the system be programmed to “forget” that this individual went to the store with a gang member? Five years? Ten? Twenty?
During a recent workshop we held with court and criminal justice system experts, the increasing volume of data was identified as a high-priority problem. Data is increasingly being used to make the justice system work better, but as the amount and the sources of that data proliferate, mechanisms should be developed to ensure that errors are caught and corrected rather than tolerated. Unlike other spheres where data drives consequential decisions about citizens, notably the credit reporting and scoring system, the criminal justice system has no legally required processes that enable citizens to review data about themselves and challenge inaccuracies. And the broad sharing of justice system data, both within and outside the legal system, makes it difficult to correct errors even when they are discovered.
There are no simple solutions for these problems. Our workshop participants wrestled with these issues and identified areas for additional research. One idea was a data “expiration date”: a date by which data must be deleted, set according to its quality, so that less certain or error-prone data is forgotten faster than verified information. The U.S. criminal justice system is currently integrating and adapting to the myriad data and analytic technologies that are transforming the rest of society. However, in this race to improve the efficiency and effectiveness of the system, the right of citizens to be treated fairly, based on accurate and timely data and information, should be made a high priority. A failure to do so would be corrosive to the faith and trust Americans place in the system.
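To make the “expiration date” idea concrete, here is a minimal sketch of how a records system might compute a deletion date from a record's quality. Everything in it is an illustrative assumption rather than something proposed at the workshop: the quality labels, the retention periods, and the names `Record`, `RETENTION_YEARS`, `expiration_date`, `is_expired`, and `purge` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in years, keyed by how well a record
# has been verified. The labels and numbers are assumptions chosen only
# to illustrate "less certain data is forgotten faster".
RETENTION_YEARS = {"verified": 20, "unverified": 5, "error_prone": 1}


@dataclass
class Record:
    description: str
    recorded_on: date
    quality: str  # must be one of the keys in RETENTION_YEARS


def expiration_date(record: Record) -> date:
    """Return the date on which the record should be deleted."""
    years = RETENTION_YEARS[record.quality]
    # A year is approximated as 365 days to keep the sketch simple.
    return record.recorded_on + timedelta(days=365 * years)


def is_expired(record: Record, today: date) -> bool:
    """True once the record has reached its expiration date."""
    return today >= expiration_date(record)


def purge(records: list[Record], today: date) -> list[Record]:
    """Return only the records that have not yet expired."""
    return [r for r in records if not is_expired(r, today)]


# Usage: an unverified field note and a verified conviction recorded on
# the same day expire at very different times.
records = [
    Record("field note: seen with known gang member", date(2017, 2, 28), "unverified"),
    Record("court-confirmed conviction", date(2017, 2, 28), "verified"),
]
remaining = purge(records, today=date(2024, 1, 1))
for r in remaining:
    print(r.description, "expires", expiration_date(r))
```

In this sketch the unverified field note is dropped after roughly five years while the verified record is kept. A real system would also have to decide who assigns the quality label and how deletion propagates to other agencies and private parties that already hold a copy.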
Dulani Woods is a quantitative analyst at the nonprofit, nonpartisan RAND Corporation. Brian A. Jackson is a senior physical scientist at RAND and a professor at the Pardee RAND Graduate School.
This commentary originally appeared on Inside Sources on February 28, 2017. Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.