Analysis of Racial Disparities in the New York Police Department's Stop, Question, and Frisk Practices
Nov 9, 2007
RAND analyzed raw data from 2006 NYPD pedestrian stops to assess whether there is racial bias in police officers' decisions to stop pedestrians. The analysis proceeded in three steps: first against external benchmarks, then against internal benchmarks, and finally by testing whether white and nonwhite suspects have different stop outcomes. The study found racial differences, but they are much smaller than the raw data indicate; it also makes a series of recommendations for improving police-pedestrian interactions.
In 2006, the New York City Police Department (NYPD) stopped a half-million pedestrians because of suspected criminal involvement, recording these stops on stop, question, and frisk (SQF) reports on the department's UF250 forms. Raw statistics for these encounters suggest large racial disparities: 89 percent of the stops involved nonwhites. That 89 percent breaks down, as shares of all stops, into 53 percent black, 29 percent Hispanic, and 3 percent Asian, with the remaining 4 percent not race-identified. In addition, once stopped, 45 percent of black and Hispanic suspects were frisked compared with 29 percent of white suspects; yet, when frisked, white suspects were 70 percent likelier than black suspects to have had a weapon on them. While the data document racial disparities, do they also indicate racial bias?
Answering that question requires going beyond the raw statistics and carefully analyzing the data, which the NYPD asked the RAND Corporation's Center on Quality Policing to do. RAND's analyses addressed three questions, summarized below.
The first question is whether pedestrians of different races are stopped at rates out of proportion to some external standard. Answering it with the data requires external benchmarks. But constructing valid external benchmarks is difficult, because it involves assessing both the racial composition of those participating in criminal activity and the racial composition of those exposed to patrolling officers. RAND completed analyses using the three benchmarks developed to date; as shown in the table, the three benchmarks yield very different results with the same data. And, as the last column shows, all of the benchmarks have reliability issues, which suggests that any analysis based on them is questionable.
| External Benchmark | Results | Assessment of Benchmark |
|---|---|---|
| Residential census data | Blacks stopped at rate 50% greater than representation in census; Hispanics at rate equal to representation | Most widely used but least reliable: does not account for different rates of crime participation by race, for differential exposure to the police, or for the possibility of visitors from other communities |
| Racial distribution of arrestees | Blacks stopped at nearly same rate as representation among arrestees; Hispanics at rate 6% higher | Prominently used, but may not accurately reflect types of suspicious activity officers might observe; arrests can occur far from where the crime occurred; and the benchmark is not independent of any officer biases since police make both the arrests and stops |
| Racial distribution of individuals identified in crime-suspect descriptions | Blacks stopped at rate 20–30% lower than representation in crime-suspect descriptions; Hispanics at rate 5–10% higher | More promising since it is independent of the police, but it requires that suspects, regardless of race, are equally exposed to officers |
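The comparison behind each row of the table can be thought of as a ratio: a group's share of stops divided by its share of the benchmark population. The sketch below is only an illustration of that arithmetic, not RAND's code. The function name `disparity_ratio` is an assumption, and the benchmark shares are hypothetical placeholders rather than figures from the study; only the 53 percent stop share comes from the raw statistics above.

```python
def disparity_ratio(stop_share: float, benchmark_share: float) -> float:
    """Return a group's share of stops divided by its share of the benchmark.

    A value above 1.0 means the group is stopped more often than the
    benchmark would predict; a value below 1.0 means less often.
    """
    if not 0 < benchmark_share <= 1:
        raise ValueError("benchmark_share must be in (0, 1]")
    return stop_share / benchmark_share


# Share of 2006 stops involving black pedestrians (from the raw statistics).
black_stop_share = 0.53

# HYPOTHETICAL benchmark shares, for illustration only (not study figures).
hypothetical_benchmarks = {
    "benchmark A": 0.40,
    "benchmark B": 0.53,
    "benchmark C": 0.70,
}

for name, share in hypothetical_benchmarks.items():
    ratio = disparity_ratio(black_stop_share, share)
    print(f"{name}: ratio = {ratio:.2f}")
```

The same stop share produces a ratio above, at, or below 1.0 depending solely on the benchmark chosen, which is why the choice of benchmark matters so much.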
Given the limitations of external benchmarking, RAND conducted an internal benchmarking analysis that compared each individual officer's stopping patterns with a benchmark constructed from stops in similar circumstances made by other officers.
That analysis found that 15 officers appear to have stopped substantially more black suspects (5 officers) and Hispanic suspects (10 officers) than other officers when patrolling the same areas, at the same times, and with the same assignments. Conversely, 14 officers appear to have stopped substantially fewer black suspects (9 officers) and Hispanic suspects (5 officers) than expected. Of the 15 officers who overstopped blacks and Hispanics, 6 are from the Queens South borough.
The 15 officers flagged for overstopping make up about 0.5 percent of the 2,756 NYPD officers most active in pedestrian stops; together, those 2,756 officers accounted for 54 percent of the stops. The remaining stops were made by another 15,855 officers who conducted too few stops to enable accurate internal benchmarks to be constructed. While the data suggest that only a small fraction of the officers most active in pedestrian stops have problematic patterns, those patterns should be detected and investigated routinely and more thoroughly. Also, the stops made by the 15,855 officers we could not analyze may still be of concern.
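The idea of an internal benchmark can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the study's procedure. A single `context` label stands in for the combination of place, time, and assignment, the function name `internal_benchmark` is hypothetical, and the toy records are invented. The benchmark is the rate at which other officers stop a given group in each context, weighted by how often the officer in question works in that context.

```python
from collections import Counter


def internal_benchmark(stops, officer, group):
    """Compare one officer's share of stops of `group` with a benchmark
    built from other officers' stops in the same contexts.

    Each stop is a dict with keys 'officer', 'context', and 'race'.
    Returns (officer_share, benchmark_share).
    """
    own = [s for s in stops if s["officer"] == officer]
    others = [s for s in stops if s["officer"] != officer]

    # Rate at which other officers stop `group` in each context.
    totals, hits = Counter(), Counter()
    for s in others:
        totals[s["context"]] += 1
        hits[s["context"]] += s["race"] == group

    # Keep only the officer's stops in contexts where peers also made stops.
    comparable = [s for s in own if totals[s["context"]] > 0]
    if not comparable:
        raise ValueError("no comparable stops by other officers")

    officer_share = sum(s["race"] == group for s in comparable) / len(comparable)
    # Weight peers' context-level rates by the officer's own mix of contexts.
    benchmark_share = sum(
        hits[s["context"]] / totals[s["context"]] for s in comparable
    ) / len(comparable)
    return officer_share, benchmark_share


# Invented toy records: three officers working the same context.
toy_stops = (
    [{"officer": "A", "context": "pct1-night", "race": r}
     for r in ("black", "black", "white", "black")]
    + [{"officer": "B", "context": "pct1-night", "race": r}
       for r in ("black", "white")]
    + [{"officer": "C", "context": "pct1-night", "race": r}
       for r in ("white", "black")]
)

share, bench = internal_benchmark(toy_stops, "A", "black")
print(f"officer A: {share:.2f} vs benchmark {bench:.2f}")  # 0.75 vs 0.50
```

A real analysis would also need enough stops per officer for the comparison to be statistically meaningful, which is why officers with too few stops cannot be benchmarked this way.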
If there is race bias in the behavior of those 15,000-plus officers, we may be able to find it by looking at what happens after stops—frisks, searches, uses of force, and arrests. As noted in the opening paragraph, the citywide aggregate raw data showed large differences between racial groups in such after-stop outcomes.
But relying on the raw data is problematic, because the data do not reflect nonracial differences that may account for some of the racial differences. RAND started with the raw data and then adjusted them so they reflect similarly situated circumstances post-stop. This meant matching white and nonwhite pedestrians on more than 25 stop features, including crime suspected, time of day, and location.
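To see why such an adjustment can move a rate substantially, consider a much simpler stand-in for the matching described above: reweighting one group's stops so that their mix of circumstances equals the other group's. The sketch below is not the study's method. It collapses the 25-plus stop features into one hypothetical `circumstance` label, the function name `adjusted_rate` is an assumption, and the toy data are invented.

```python
from collections import Counter


def adjusted_rate(comparison, target, outcome="frisked"):
    """Outcome rate for `comparison` stops, reweighted so their mix of
    circumstances matches that of the `target` stops.

    Each stop is a dict with a 'circumstance' key and a boolean outcome key.
    """
    target_mix = Counter(s["circumstance"] for s in target)
    comp_n, comp_hits = Counter(), Counter()
    for s in comparison:
        comp_n[s["circumstance"]] += 1
        comp_hits[s["circumstance"]] += bool(s[outcome])

    # Use only circumstances that appear in both groups.
    shared = [c for c in target_mix if comp_n[c] > 0]
    if not shared:
        raise ValueError("no overlapping circumstances")
    weight_total = sum(target_mix[c] for c in shared)
    return sum(
        target_mix[c] * comp_hits[c] / comp_n[c] for c in shared
    ) / weight_total


def make(circumstance, n, frisked):
    """Build n invented stops, the first `frisked` of which ended in a frisk."""
    return [{"circumstance": circumstance, "frisked": i < frisked}
            for i in range(n)]


# Invented toy data: the two groups are stopped in different circumstances.
white = make("low", 8, 2) + make("high", 2, 1)
black = make("low", 2, 0) + make("high", 8, 4)

raw = sum(s["frisked"] for s in white) / len(white)
adj = adjusted_rate(white, black)
print(f"raw {raw:.2f}, adjusted {adj:.2f}")  # raw 0.30, adjusted 0.45
```

In this toy example the raw rate understates how often the comparison group is frisked in the circumstances where the target group is typically stopped, which is the kind of gap the adjustment is meant to remove.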
After matching on the circumstances of the stops, researchers still found some racial differences in frisk, search, use-of-force, and arrest rates, but the magnitude of the disparities was considerably less than the raw statistics indicate. For example, according to the raw statistics, white pedestrians were frisked in 29 percent of stops, but those white pedestrians stopped in circumstances similar to black suspects were frisked 42 percent of the time, slightly less than the rate for black suspects (45 percent). While most racial differences in post-stop outcomes were of this size, the gaps for some boroughs warrant a closer review. For example, Staten Island stands out with several large racial gaps in frisk, search, and use-of-force rates.
The raw numbers on recovery rates for contraband indicated that frisked or searched white suspects were much likelier to have contraband than black suspects. But after matching the stops for several important factors, the disparity is sharply reduced: The recovery rate for frisked or searched white suspects stopped in circumstances similar to those of black suspects was only slightly greater than it was for black suspects (6.4 percent versus 5.7 percent). When considering only recovery rates of weapons, researchers found no differences by race.
The raw statistics, while easy to compute, can distort the magnitude of racially biased policing. Moreover, some attempts at analysis can do the same. The most common forms of analysis rely on external benchmarks, which, as shown, are not reliable, yielding different results based on the same raw data.
Using more precise benchmarks does not eliminate the observed racial disparities but does indicate that the disparities are much smaller than the raw statistics would suggest. For example, some nonracial factors—such as police policies and practices that can legitimately differ by time, place, and reason for the stop—explain much of the difference between the frisk rate of black suspects (45 percent) and the frisk rate of white suspects (29 percent) that appears in the raw data.
Of course, any racial disparities in the data are cause for concern. However, accurately measuring the magnitude of the problem can help police management, elected officials, and community members decide between the need for incremental changes in policy, reporting, and oversight and sweeping organizational changes. The results do not absolve the NYPD of the need to monitor the issue, but they imply that a large-scale restructuring of NYPD SQF policies and procedures is unwarranted.
Based on these conclusions, the study makes six recommendations to improve interactions between police and pedestrians during stops and to improve the accuracy of data collected during pedestrian stops: (1) the NYPD should review the boroughs with the largest racial disparities in stop outcomes; (2) the NYPD should identify, flag, and investigate officers with unusual stop patterns; (3) all officers should explain to pedestrians why they are being stopped; (4) new officers should be fully conversant with SQF documentation; (5) the UF250 should be revised to capture data on use of force; and (6) the NYPD should consider modifying the audits of the UF250.