Use of Geocoding and Surname Analysis to Estimate Race and Ethnicity

Published in: Health Services Research, v. 41, no. 4, pt. 1, Aug. 2006, p. 1482-1500

by Kevin Fiscella, Allen Fremont

Access further information on this document at

This article was published outside of RAND. The full text of the article can be found at the link above.

OBJECTIVE: To review two indirect methods, geocoding and surname analysis, for estimating race/ethnicity as a means for health plans to assess disparities in care.

STUDY DESIGN: Review of published articles and unpublished data on the use of geocoding and surname analysis.

PRINCIPAL FINDINGS: Few published studies have evaluated the use of geocoding to estimate the racial and ethnic characteristics of a patient population or to assess disparities in health care. Three of four studies showed similar estimates of the proportion of blacks, and one showed nearly identical estimates of racial disparities, regardless of whether indirect or more direct measures (e.g., death certificate or CMS data) were used. However, accuracy depended on racial segregation levels in the population and region assessed, and geocoding was unreliable for identifying Hispanics and Asians/Pacific Islanders. Similarly, several studies suggest that surname analysis produces reasonable estimates of whether an enrollee is Hispanic or Asian/Pacific Islander and can identify disparities in care. However, its accuracy depends on the concentrations of Asians or Hispanics in the areas assessed. It is less accurate for women and for more acculturated and higher-SES persons due to intermarriage, name changes, and adoption. Surname analysis is not accurate for identifying African Americans. Recent unpublished analyses suggest that plans can successfully use a combined geocoding/surname analysis approach to identify disparities in care in most regions. Refinements based on Bayesian methods may make geocoding/surname analysis appropriate for use in areas where accuracy is currently poor, but validation of these preliminary results is needed.

CONCLUSIONS: Geocoding and surname analysis show promise for estimating the racial/ethnic composition of health plan enrollees when direct data on major racial and ethnic groups are lacking. These data can be used to assess disparities in care, pending availability of self-reported race/ethnicity data.
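The abstract does not describe how the Bayesian refinement works. As an illustration only, the sketch below shows one common way a surname-based probability could be updated with neighborhood composition using Bayes' rule. The function name, the group labels, and every number are hypothetical and are not taken from the article. The sketch also assumes that surname and area of residence are conditionally independent given group, which may not hold in practice.

```python
def combine_surname_and_geography(surname_prob, area_share, population_share):
    """Return P(group | surname, area) for each group (hypothetical sketch).

    surname_prob:     P(group | surname), e.g. from a surname list
    area_share:       P(group | area), e.g. census composition of the geocoded area
    population_share: P(group) in the overall reference population

    Bayes' rule, assuming surname and area are conditionally independent
    given group:
        P(group | surname, area) is proportional to
        P(group | surname) * P(group | area) / P(group)
    All three inputs must use the same group keys, and every
    population_share value must be greater than zero.
    """
    unnormalized = {
        group: surname_prob[group] * area_share[group] / population_share[group]
        for group in surname_prob
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


# Invented numbers for illustration only; they are not data from the article.
surname_prob = {"hispanic": 0.70, "black": 0.05, "asian_pi": 0.05, "white_other": 0.20}
area_share = {"hispanic": 0.10, "black": 0.60, "asian_pi": 0.05, "white_other": 0.25}
population_share = {"hispanic": 0.15, "black": 0.13, "asian_pi": 0.05, "white_other": 0.67}

posterior = combine_surname_and_geography(surname_prob, area_share, population_share)
for group, probability in sorted(posterior.items()):
    print(f"{group}: {probability:.3f}")
```

In this sketch, an area whose composition matches the overall population leaves the surname-based estimate unchanged, while a highly segregated area shifts it. This is consistent with the abstract's observation that accuracy depends on segregation levels and group concentrations.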

This report is part of the RAND Corporation External publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.