Using the Census Bureau's Surname List to Improve Estimates of Race/Ethnicity and Associated Disparities
Published In: Health Services and Outcomes Research Methodology, v. 9, no. 2, June 2009, p. 69-83
Posted on RAND.org on December 31, 2008
Commercial health plans need member racial/ethnic information to address disparities, but often lack it. We incorporate the U.S. Census Bureau's latest surname list into a previous Bayesian method that integrates surname and geocoded information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, the new approach correlated highly with self-reported race/ethnicity (0.76), making it 19% more efficient than its predecessor (and 41% and 108% more efficient than single-source surname and address methods, respectively; P < 0.05 for all). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were in areas where prior performance was weakest (for Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
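As a rough illustration of how a Bayesian method can integrate surname and geocoded information, the sketch below treats the surname-based probabilities as a prior and updates them with a location-based likelihood via Bayes' rule. The abstract does not state the formula, so the conditional-independence form used here, the function name `combine_surname_and_geo`, the group labels, and all numbers are assumptions chosen for illustration, not values from the study.

```python
from typing import Dict


def combine_surname_and_geo(
    surname_prior: Dict[str, float],
    geo_likelihood: Dict[str, float],
) -> Dict[str, float]:
    """Posterior P(group | surname, area), assuming surname and area are
    conditionally independent given group (an assumption of this sketch).

    surname_prior:  P(group | surname), e.g. from a surname list.
    geo_likelihood: P(area | group), e.g. from geocoded population counts.
    """
    # Unnormalized posterior: prior times likelihood, group by group.
    unnormalized = {g: surname_prior[g] * geo_likelihood[g] for g in surname_prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has nonzero probability for this surname and area.")
    # Normalize so the probabilities sum to 1.
    return {g: p / total for g, p in unnormalized.items()}


# Made-up inputs for a hypothetical enrollee; not data from the paper.
surname_prior = {"white": 0.5, "black": 0.5}
geo_likelihood = {"white": 0.1, "black": 0.3}
posterior = combine_surname_and_geo(surname_prior, geo_likelihood)
```

In this toy case the surname alone is uninformative, and the residence information shifts the estimate toward the group that is more likely to live in that area. Such a method yields a probability for each group rather than a single assigned category.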