Indirect Estimation of Race/Ethnicity for Survey Respondents Who Do Not Report Race/Ethnicity
Published in: Medical Care, Volume 57, Issue 5, pages e28-e33 (May 2019). doi: 10.1097/MLR.0000000000001011
Researchers are increasingly interested in measuring race/ethnicity, but some survey respondents skip race/ethnicity items.
The main objectives of this study were to investigate the extent to which racial/ethnic groups differ in skipping race/ethnicity survey items, the degree to which this reflects reluctance to disclose race/ethnicity, and the utility of imputing missing race/ethnicity.
We applied a previously developed method for imputing race/ethnicity from administrative data (Medicare Bayesian Improved Surname and Geocoding 2.0) to data from a national survey where race/ethnicity was usually self-reported, but was sometimes missing. A linear mixed-effects regression model predicted the probability of self-reporting race/ethnicity from imputed racial/ethnic probabilities.
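As a rough illustration of the modeling step, the sketch below regresses a self-report indicator on imputed group probabilities using ordinary least squares. It is a simplified stand-in, not the study's model: it omits the random effects of the mixed-effects specification, and the function name, the two-group layout, and the toy data are all assumptions made for illustration.

```python
def fit_linear_probability(probs, reported):
    """OLS without intercept: reported_i ~ sum_g beta_g * probs[i][g].

    Each row of probs is assumed to sum to 1, so beta_g can be read as the
    estimated probability of self-reporting for a respondent who is certain
    to belong to group g. (Hypothetical helper, not from the study.)
    """
    k = len(probs[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[a] * row[b] for row in probs) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(probs, reported))
           for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Toy data (invented): imputed probabilities for two hypothetical groups
# and an indicator of whether each respondent self-reported (1) or skipped (0).
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
             [0.1, 0.9], [0.7, 0.3], [0.3, 0.7]]
toy_reported = [1, 1, 0, 1, 1, 1]
beta = fit_linear_probability(toy_probs, toy_reported)
print(beta)
```

A coefficient near 1 would suggest that respondents likely to belong to that group almost always self-report. A full analysis would add the random effects and would use a statistical package rather than hand-rolled linear algebra.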
In total, 508,497 Medicare beneficiaries responding to the 2013-2014 Medicare Consumer Assessment of Healthcare Providers and Systems surveys were included in this study.
The measures were self-reported race/ethnicity and estimated racial/ethnic probabilities.
Black beneficiaries were most likely to not self-report their race/ethnicity (6.6%), followed by Hispanic (4.7%) and Asian/Pacific Islander (4.7%) beneficiaries. Non-Hispanic whites were the least likely to skip these items (3.2%). The 3.7% overall rate of missingness is similar to that of adjacent demographic items. General patterns of item missingness, rather than a specific reluctance to disclose race/ethnicity, appear to explain the elevated rate of missing race/ethnicity among Asian/Pacific Islander and Hispanic beneficiaries, and most but not all of the elevated rate among Black beneficiaries. Adding imputed cases to the data set did not substantially alter the estimated overall racial/ethnic distribution, but it did modestly increase sample size and statistical power.
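One way to picture how imputed cases enter the estimated distribution is sketched below: each self-reported respondent counts fully toward one group, while each respondent with missing race/ethnicity contributes fractionally according to imputed probabilities. The function name, group labels, and numbers are hypothetical, and the study's actual estimation procedure may differ.

```python
from collections import Counter


def estimated_distribution(self_reported, imputed_probs):
    """Pool self-reports with imputed probabilities (hypothetical helper).

    self_reported: list of group labels for respondents who answered.
    imputed_probs: list of {group: probability} dicts, one per respondent
                   who skipped the item; each dict is assumed to sum to 1.
    Returns (shares, n), where shares maps group -> estimated proportion.
    """
    totals = Counter(self_reported)  # each self-report counts as 1
    for probs in imputed_probs:
        for group, p in probs.items():
            totals[group] += p  # fractional contribution of an imputed case
    n = len(self_reported) + len(imputed_probs)
    return {group: total / n for group, total in totals.items()}, n


# Invented example: three self-reports plus one imputed respondent.
shares, n = estimated_distribution(
    ["A", "A", "B"],
    [{"A": 0.5, "B": 0.5}],
)
print(shares, n)
```

Here the analytic sample grows from three to four respondents, which is the sense in which imputation adds sample size even when the estimated shares move little.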
It may be worthwhile to impute race/ethnicity when this information is unavailable in survey data sets due to item nonresponse, especially when missingness is high.