Objective
To assess the feasibility and implications of imputing race and ethnicity for quality and utilization measurement in Medicaid.
Data Sources and Study Setting
We used 2017 Oregon Medicaid claims from the Oregon Health Authority and electronic health records (EHR) from OCHIN, a clinical data research network.
Study Design
In a cross-sectional analysis, we assessed Hispanic-White, Black-White, and Asian-White disparities in 22 quality and utilization measures, comparing disparities estimated from self-reported race and ethnicity with those estimated from values imputed by the Bayesian Improved Surname Geocoding (BISG) algorithm.
Data Collection
Race and ethnicity were obtained from self-reported data and imputed using BISG.
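As a rough illustration of the general BISG idea (not the specific implementation used in this study, which is not described here), the sketch below applies Bayes' rule to combine a surname-based prior with a geography-based likelihood. The function name `bisg_posterior` and all probabilities are hypothetical values chosen for illustration only; in practice these inputs are typically drawn from Census surname and geographic tabulations.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname prior with a geography likelihood via Bayes' rule.

    posterior(r) is proportional to P(r | surname) * P(geography | r).
    Both arguments are dicts keyed by the same race/ethnicity categories.
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r]
        for r in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No category has nonzero probability.")
    # Normalize so the posterior probabilities sum to 1.
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical inputs for one person (illustrative numbers, not real data).
surname_prior = {"Asian": 0.05, "Black": 0.10, "Hispanic": 0.70, "White": 0.15}
# Hypothetical share of each group's population living in this person's area.
geo_likelihood = {"Asian": 0.001, "Black": 0.002, "Hispanic": 0.004, "White": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
```

The output is a probability for each category rather than a single label. Whether such probabilities are used directly as weights or collapsed to a most-likely category is an analytic choice that this sketch does not settle.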
Principal Findings
Self-reported data were missing for 42.5% of claims and 4.9% of EHR data; BISG estimates were available for >99% of each and had good concordance (0.87–0.95) with Asian, Black, Hispanic, and White self-report. In EHR-based measures, all estimated racial and ethnic disparities were statistically similar whether based on self-reported or imputed data. Within claims, however, BISG estimates and incomplete self-reported data yielded substantially different disparities in almost half of the measures, with BISG-based Black-White disparities generally larger than those based on self-reported race and ethnicity.
Conclusions
BISG imputation is feasible for Medicaid claims data and reduced missingness to <1%. Disparities may be larger than those estimated from self-reported data with high rates of missingness.