In 2006, the New York City Police Department (NYPD) stopped a half-million pedestrians for suspected criminal involvement. Raw statistics for these encounters suggest large racial disparities: 89 percent of the stops involved nonwhites. Do these statistics point to racial bias in police officers' decisions to stop particular pedestrians? Do they indicate that officers are particularly intrusive when stopping nonwhites? The NYPD asked the RAND Center on Quality Policing (CQP) to help it understand this issue and identify recommendations for addressing potential problems. CQP researchers analyzed data on all street encounters between NYPD officers and pedestrians in 2006. They compared the racial distribution of stops with external benchmarks, which attempt to estimate what the racial distribution of stopped pedestrians would have been had officers' stop decisions been racially unbiased. They then compared each officer's stopping patterns with an internal benchmark constructed from stops made in similar circumstances by other officers. Finally, they examined stop outcomes, assessing whether stopped white and nonwhite suspects experienced different rates of frisk, search, use of force, and arrest. They found small racial differences in these rates and made recommendations to the NYPD on communication, recordkeeping, and training to improve police-pedestrian interactions.
Does the New York City Police Department (NYPD) stop pedestrians of some races disproportionately? Questions like this one have been asked about police departments in many cities. Inevitably, the question raises a second one: disproportionately compared with what? Over the years, researchers have proposed a range of standards against which to compare the race distribution of police stops. For instance, investigators have argued that, under race-neutral policing, the race distribution of police stops should match that of the city's residential population, that of arrestees, or that of crime suspects. We refer to comparisons of the race distribution of police stops against such outside standards as an approach to police performance evaluation using "external benchmarks." In Analysis of Racial Disparities in the New York Police Department's Stop, Question and Frisk Practices (Santa Monica, Calif.: RAND Corporation, 2007) by Dr. Greg Ridgeway (the "RAND report"), we highlight problems with the use of external benchmarks and propose a powerful alternative approach for analyzing bias in police performance that uses what we call an "internal benchmark."
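To make the external-benchmark idea concrete, the following minimal sketch (our illustration, not code from the RAND report, using entirely hypothetical counts and shares) compares the race distribution of stops against a chosen benchmark distribution and reports, for each group, the ratio of its share of stops to its share of the benchmark.

    # Minimal sketch of an external benchmark comparison.
    # All counts and shares below are hypothetical, not NYPD data.
    stops = {"black": 260_000, "hispanic": 150_000, "white": 55_000}   # hypothetical stop counts
    benchmark = {"black": 0.55, "hispanic": 0.32, "white": 0.13}       # hypothetical benchmark shares

    total_stops = sum(stops.values())
    for race, count in stops.items():
        stop_share = count / total_stops
        # A ratio above 1 means the group appears more often in the stop data
        # than the chosen benchmark would predict; the apparent conclusion can
        # change entirely when a different benchmark distribution is substituted.
        ratio = stop_share / benchmark[race]
        print(f"{race:8s} stop share {stop_share:.3f}  benchmark {benchmark[race]:.3f}  ratio {ratio:.2f}")

The point of the sketch is only that the resulting ratios are driven as much by which benchmark distribution is chosen as by the stop data themselves, which is the central weakness of external benchmarking discussed below.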
Where did RAND get the data to evaluate race bias in pedestrian stops conducted by the NYPD? In 2007, NYPD provided RAND with data on citywide crime for the years 2005 and 2006. These data were used to construct the external benchmarks against which we compared the race distribution of pedestrians stopped by NYPD officers in 2006.
Is there a problem with the data NYPD provided to RAND? On May 3, 2013, NYPD informed RAND that the data it had provided on 2005 and 2006 violent crime suspects contained errors. NYPD then provided RAND with corrected data. Although most of the data on the race of violent crime suspects showed only minor differences from the data originally provided to RAND (differences that NYPD attributed to routine updates and audits), the data in the "Other" race category were significantly different from those originally provided to RAND in 2007.
Do the errors in the original dataset invalidate the external benchmark analyses RAND described in the RAND report? No. The erroneous data were not used in the analyses reported in the RAND report. As noted in the report, the three external benchmark analyses that made use of violent crime suspect descriptions "used data only on black, Hispanic, and white suspects, since other racial groups had counts that were too small for statistical analysis" (RAND report, p. 17). Thus, when the proportions of black, Hispanic and white crime suspects were compared to the race distribution of pedestrians stopped on the street, both datasets included only those cases that had been coded by NYPD as black, white, or Hispanic. The "Other" race category was not included as part of the population on which proportions were calculated.
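As a simple illustration of why errors in the "Other" category cannot affect these comparisons, the sketch below (our example, with hypothetical counts rather than the NYPD data) computes suspect-race proportions as described above, restricting to black, Hispanic, and white before forming the denominator; changing the "Other" count leaves the result untouched.

    # Hypothetical suspect counts; not the NYPD data.
    suspects = {"black": 5_000, "hispanic": 3_000, "white": 1_500, "other": 400}

    # Restrict to the three groups used in the report's external benchmark analyses.
    included = {r: n for r, n in suspects.items() if r in ("black", "hispanic", "white")}
    total = sum(included.values())
    shares = {r: round(n / total, 3) for r, n in included.items()}

    # Because "other" is dropped before the denominator is formed,
    # any error in suspects["other"] has no effect on these shares.
    print(shares)   # {'black': 0.526, 'hispanic': 0.316, 'white': 0.158}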
Why did RAND exclude the "Other" and "Asian" race categories from its analyses? Was it because of the errors in the data NYPD sent RAND? No. RAND excluded the "Other" and "Asian" race categories not because of the citywide crime data provided by the NYPD, but because the number of pedestrian stops for these race groups was too small to support reliable statistical analysis. For the same reason, the "Other" race category was excluded from all external benchmark analyses, including those, such as the census benchmark analyses, that made no use of crime suspect data or any other citywide crime data.
Does exclusion of some race categories from the benchmark analysis compromise the validity of the external benchmarks for whites, blacks, and Hispanics? No. Using just the three largest race groups allows for very precise estimates of differences in the relative race distributions of pedestrian stops and comparison benchmark data. Eliminating small race categories simplifies the model and improves its fit with the data while focusing on comparisons that can be made with meaningful levels of precision. The principal limitation of our approach is that the results describe the representation of each racial group within the population of blacks, Hispanics, and whites in the stop and benchmark data, rather than within the broader population of all race groups. Since blacks, Hispanics, and whites make up 97% of the stop data, and the "Other" and "Asian" race groups are too small in the stop data to estimate benchmark comparisons with precision, this narrower frame of reference does not compromise our analyses.
Were there other errors in the data NYPD provided to RAND? The only errors that NYPD reported to RAND concerned the crime suspect description data for 2005 and 2006. In addition to the significant correction to the "Other" race category, NYPD made minor updates to the data for other race categories. These corrections produce only small changes in the race distribution of suspects that RAND used in its analyses (none larger than 0.21%). As such, the updates would not meaningfully alter the findings described in the RAND report.
Should people rely on the external benchmarks reported in the RAND report to determine whether NYPD engaged in race-neutral pedestrian stops in 2006? No. External benchmarks are, as the RAND report emphasizes, "fraught with challenges" (RAND report, p. 19). Indeed, a central objective of the external benchmark comparisons offered in the report was to show how poorly external benchmarks serve as a measure of race-neutral policing. As the conclusions to that section of the RAND report state: "Importantly, this chapter has shown that the conclusions from external benchmarking are highly sensitive to the choice of benchmark. In other words, the results of any analysis using external benchmarks may vary drastically depending on which benchmark is used" (RAND report, p. 19). The report instead presents a novel "internal" benchmark approach that compares the behavior of individual officers with that of others with similar responsibilities. The report argues that the internal benchmarking method avoids many of the pitfalls highlighted in the use of external benchmarks.
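The RAND report describes the internal benchmark method in detail; the sketch below is only a rough, simplified illustration of the general idea (our own example with hypothetical records, not the report's actual, more sophisticated method). It compares each officer's stops with stops made by other officers under similar circumstances, here crudely approximated by precinct and month.

    from collections import defaultdict

    # Each record: (officer_id, precinct, month, suspect_race) -- hypothetical data.
    stops = [
        ("A", 75, "2006-03", "black"),
        ("A", 75, "2006-03", "white"),
        ("B", 75, "2006-03", "black"),
        ("B", 75, "2006-03", "hispanic"),
        ("C", 75, "2006-03", "black"),
        ("C", 79, "2006-04", "white"),
    ]

    def nonwhite_share(races):
        return sum(r != "white" for r in races) / len(races)

    # Group stops by circumstance (precinct, month).
    by_cell = defaultdict(list)
    for officer, precinct, month, race in stops:
        by_cell[(precinct, month)].append((officer, race))

    for target in sorted({s[0] for s in stops}):
        own, peers = [], []
        for cell in by_cell.values():
            races_own = [race for officer, race in cell if officer == target]
            races_peer = [race for officer, race in cell if officer != target]
            # Benchmark only against circumstances in which the target officer
            # made stops and at least one other officer did too.
            if races_own and races_peer:
                own.extend(races_own)
                peers.extend(races_peer)
        if own and peers:
            print(target, "officer:", round(nonwhite_share(own), 2),
                  "internal benchmark:", round(nonwhite_share(peers), 2))

Because each officer is compared only with colleagues working under similar circumstances, this kind of comparison does not depend on choosing an external reference population, which is the feature the report highlights as the advantage of internal benchmarking.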
The research described in this report was supported by the New York City Police Foundation and was conducted under the auspices of the Center on Quality Policing (CQP), part of the Safety and Justice Program within RAND Infrastructure, Safety, and Environment (ISE).
This publication is part of the RAND technical report series. RAND technical reports, products of RAND from 2003 to 2011, presented research findings on a topic limited in scope or intended for a narrow audience; discussions of the methodology employed in research; literature reviews, survey instruments, modeling exercises, guidelines for practitioners and research professionals, and supporting documentation; and preliminary findings. All RAND technical reports were subject to rigorous peer review to ensure high standards for research quality and objectivity.
This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.
RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.