Equitable Data Analysis: Lessons from a COVID-19 Research Collaborative

commentary

Jul 27, 2021

Dieumeeci Ufitimana (center) signs up to receive the COVID-19 vaccine at Bethel AME Church in St. Petersburg, Florida, July 23, 2021. Photo by Octavio Jones/Reuters

The health inequities exposed by COVID-19 underscored the importance of collecting race-stratified data to inform local policymakers. For the public health researchers trying to provide those data, the pandemic also revealed some major pitfalls, especially in relying on open-source data. Information is almost never neutral: what gets collected, and how it is analyzed, reported, contextualized, and used, all reflect preexisting assumptions and biases.

All of these became factors when RAND, the Black Equity Coalition (BEC), and Surgo Ventures collaborated on a tool to report on COVID-19 vulnerability and disparities using publicly available data in Allegheny County, Pennsylvania. The goal was to help decisionmakers identify geographic areas and racial/ethnic populations most at risk of infection and complications from the novel coronavirus.

Equity in Data Collection and Reporting

Researchers can't measure demographic inequities without data on race and ethnicity that have been collected consistently and accurately. In early May 2020, as the BEC developed a data dashboard and identified racial disparities in COVID-19 outcomes, it had to scrape data from state data systems. The BEC later advocated for the public availability of the Allegheny County Health Department dataset used for our tool. This early work to make county-level data disaggregated by race and geographic area available was critical to our collaborative's ability to draw granular, interactive insights and recommendations on vulnerability and equity in the county.

As a nation, we are still far from universal reporting of COVID-19 testing and case rates by race/ethnicity. This reflects a more widespread lack of disaggregated data, a problem that the January 2021 Executive Order on Advancing Racial Equity and Support for Underserved Communities Through the Federal Government aims to address for federal data sources.

Equity in Data Visualization

Early iterations of our tool describing racial disparities in COVID-19 testing and outcomes did not fully consider the equity implications of the visualizations we had designed. In collaboration with the BEC, RAND and Surgo learned about best practices for visualizing data in ways that are sensitive to how overburdened and under-resourced communities are represented and that avoid unintentionally reinforcing stereotypes. In the final tool, we carefully considered the colors used to represent geographies on maps and racial groups in charts (e.g., avoiding darker colors to represent Black populations). We also listed racial groups in alphabetical order within legends to avoid implying defaults or hierarchies (e.g., not showing white populations at the top).

Equity in Data Interpretation

Our initial COVID-19 vulnerability metrics for the county relied on historical data, such as geographic estimates of socioeconomic status from the U.S. Census. Relying on such data alone can constrain researchers' ability to identify other sociodemographic trends that can inform policy choices.

Mindful of this, our collaborative brought together those with data analysis and subject-matter expertise as well as lived experience to lend context to our work. This led us to combine vulnerability analyses with ongoing race-based and geographic analyses so we could track evolving disparities in testing and outcomes by race and geography.

Context is especially critical when it comes to interpreting data on racial disparities related to COVID-19. There is a tendency among journalists and even many researchers to attribute negative outcomes from an infection to individual factors such as comorbidities and health behaviors. And perhaps understandably so: Data are more widely available on final outcomes than on upstream factors such as systemic racism and access to health care, high-quality jobs, safe housing, and grocery stores. As a consequence, many Americans do not acknowledge the role of these factors in health disparities.

Equity in Dissemination

We planned a digital narrative tool to present findings in an interactive way and offer interpretation of those findings for use by policymakers and residents alike. RAND and Surgo teams initially attempted to engage community representatives to inform dissemination but were unable to develop the connections we needed. Later, working with the BEC, we integrated a broader set of perspectives into how the findings were communicated in the tool. We then shared the tool through news media and the networks of all of our organizations to draw users to the tool's website.

While dissemination is one of the last phases of the research process, equitable research and analysis is ideally designed with it in mind from the beginning. Findings and recommendations ought to be usable for communities implicated in the research and those who have decisionmaking roles that impact them.

Looking Ahead

This tool focuses on disparities in COVID-19 testing, cases, and deaths. Current vaccination efforts, however, make it clear that difficult equity concerns remain unresolved. In April, the BEC released a report on vaccine equity in Allegheny County that describes inequitable vaccine distribution, limited reporting of data on race, and other data infrastructure issues, along with recommendations.

The proliferation of local COVID-19 dashboards shows that open data is on the rise. But making data available does not ensure that the information will advance the cause of equity. Our experience shows how research can fall short of considering equity comprehensively, and how collaboratives like ours can build equity into the whole research lifecycle.


Linnea Warren May is an associate policy researcher at the RAND Corporation. She has worked on projects related to building a culture of health, city resilience planning, integrating data to assess well-being and social equity, patient experience with health care, disaster recovery, and military mental health and health systems.

Jason Beery is the director of applied research at UrbanKind Institute and a member of the Black Equity Coalition Data Committee. Trained as a geographer, Jason examines spatial difference from a holistic, intersectional perspective.

Tiffany Gary-Webb is the associate dean for diversity, equity, and inclusion and a tenured associate professor in the departments of epidemiology and behavioral and community health sciences at the University of Pittsburgh's Graduate School of Public Health. She is a member of the Black Equity Coalition Data Committee.

Evan Peet is an economist at RAND and a Pardee RAND Graduate School faculty member. His research focuses on human capital, population health, labor, and environment.

Jared Kohler is a systems engineer and data visualization specialist at the Carnegie Mellon University CREATE Lab and a member of the Black Equity Coalition Data Committee. His work focuses on equity in health, employment, and the environment.