Imputation of Race and Ethnicity in Health Insurance Marketplace Enrollment Data, 2015–2022 Open Enrollment Periods

by Melony E. Sorbero, Roald Euller, Aaron Kofner, Marc N. Elliott

Research Questions

  1. How accurate is the modified BIFSG method in correcting for gaps in race and ethnicity information for individuals enrolled through the Health Insurance Marketplace?
  2. What are the potential uses for imputed race and ethnicity of individuals enrolling through the Health Insurance Marketplace?

Information on the race and ethnicity of individuals enrolled through the Health Insurance Marketplace is critical for assessing past enrollment efforts and determining whether outreach campaigns should be modified or tailored moving forward. However, approximately one-third of insurance applicants do not complete the race and Hispanic ethnicity questions on the Marketplace application. When self-reported race and ethnicity information is missing, other information about an individual, such as surname, first name, and address, can be used to infer race and ethnicity. Each of these characteristics contributes meaningfully to distinguishing among six mutually exclusive racial and ethnic groups: American Indian and Alaska Native (AI/AN); Asian American, Native Hawaiian, and Pacific Islander (AANHPI); Black; Hispanic; Multiracial; and White. Surnames are particularly useful for distinguishing people who identify as Hispanic or AANHPI from other racial and ethnic groups. Geocoded address information is particularly useful for distinguishing Black and White individuals, who frequently reside in racially segregated neighborhoods.

This report presents the results of imputing race and ethnicity for Marketplace enrollees from 2015 through 2022 using the modified Bayesian Improved First Name Surname and Geocoding (BIFSG) method, developed by the RAND Corporation, which uses surnames, first names, and residential addresses to indirectly estimate race and ethnicity.
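As a rough illustration of how surname, first name, and address can be combined, the following sketch follows the standard (unmodified) BIFSG formulation, which assumes that first name and address are conditionally independent of surname given race and ethnicity, so that P(r | s, f, g) is proportional to P(r | s) × P(f | r) × P(g | r). It is not RAND's modified BIFSG, whose specific modifications are not described here. The function name, group labels, and every input probability are hypothetical placeholders, not values drawn from census surname, first-name, or geographic tables.

```python
# Hypothetical sketch of the standard BIFSG posterior, NOT RAND's modified BIFSG.
# All probabilities below are made-up placeholders for illustration only.

GROUPS = ("AIAN", "AANHPI", "Black", "Hispanic", "Multiracial", "White")


def bifsg_posterior(p_race_given_surname, p_first_given_race, p_geo_given_race):
    """Return P(race | surname, first name, geography) for each group.

    Assumes first name and geography are conditionally independent of surname
    given race, so the posterior is proportional to
    P(r | surname) * P(first name | r) * P(geography | r).
    """
    unnormalized = {
        g: p_race_given_surname[g] * p_first_given_race[g] * p_geo_given_race[g]
        for g in GROUPS
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has nonzero support from all three sources")
    # Normalize so the six probabilities sum to 1.
    return {g: value / total for g, value in unnormalized.items()}


# Placeholder inputs for a single hypothetical enrollee.
p_surname = {"AIAN": 0.01, "AANHPI": 0.02, "Black": 0.05,
             "Hispanic": 0.80, "Multiracial": 0.02, "White": 0.10}
p_first = {"AIAN": 0.001, "AANHPI": 0.001, "Black": 0.001,
           "Hispanic": 0.004, "Multiracial": 0.002, "White": 0.001}
p_geo = {"AIAN": 1e-5, "AANHPI": 1e-5, "Black": 1e-5,
         "Hispanic": 4e-5, "Multiracial": 2e-5, "White": 1e-5}

posterior = bifsg_posterior(p_surname, p_first, p_geo)
for group, prob in posterior.items():
    print(f"{group:12s} {prob:.3f}")
```

The result is a probability for each of the six groups rather than a single assigned category; retaining the full probability vector is what makes the population-level uses described below possible.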

Key Findings

Race and ethnicity imputations using the modified BIFSG are highly accurate at the population level

  • The predictive accuracy of the modified BIFSG was high overall among enrollees who self-reported race and ethnicity, particularly for AANHPI, Black, Hispanic, and White enrollees, but was lower for AI/AN and Multiracial enrollees.
  • Although accuracy was high for all age groups, it was lower for children and young adults than for older enrollees. Accuracy also varied by census division.
  • The imputed race and ethnicity of nonreporters suggested that enrollees who self-reported race and ethnicity were more likely to be AANHPI or White, and less likely to be Black or Hispanic, than enrollees who did not report race and ethnicity.
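One simple way to see what accuracy "at the population level" means is to compare the average imputed probability for each group with the observed self-reported share among enrollees who did report; the same averaging applied to nonreporters gives an estimate of their composition. The sketch below uses four hypothetical enrollees and illustrative function names, and it is not a reproduction of the accuracy measures used in the report.

```python
# Hypothetical sketch: population-level shares from imputed probabilities.
# Enrollee data are made up; group labels mirror the six groups in the text.

GROUPS = ("AIAN", "AANHPI", "Black", "Hispanic", "Multiracial", "White")


def estimated_shares(prob_rows):
    """Average each group's probability across enrollees (missing keys = 0)."""
    n = len(prob_rows)
    return {g: sum(row.get(g, 0.0) for row in prob_rows) / n for g in GROUPS}


def observed_shares(labels):
    """Share of enrollees whose self-reported group equals each label."""
    n = len(labels)
    return {g: labels.count(g) / n for g in GROUPS}


# Hypothetical enrollees who self-reported, with their imputed probabilities.
probs = [
    {"Hispanic": 0.9, "White": 0.1},
    {"Black": 0.7, "White": 0.3},
    {"White": 0.8, "Multiracial": 0.2},
    {"AANHPI": 1.0},
]
labels = ["Hispanic", "Black", "White", "AANHPI"]

est = estimated_shares(probs)
obs = observed_shares(labels)
for g in GROUPS:
    print(f"{g:12s} imputed={est[g]:.3f} self-reported={obs[g]:.3f} "
          f"gap={est[g] - obs[g]:+.3f}")
```

Applying `estimated_shares` to the probability rows of nonreporters, for whom no self-reported labels exist, yields the kind of composition estimate used to compare reporters with nonreporters.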

Information on the race and ethnicity of individuals enrolling through the Health Insurance Marketplace is critical for assessing past enrollment efforts

  • Officials could compare Marketplace enrollment by race and ethnicity with estimates of expected enrollment to identify subgroups for which enrollment is lagging and determine whether outreach campaigns should be modified or tailored moving forward to better target these populations.
  • Officials could also use race and ethnicity probabilities to examine whether plan selection patterns, including choice of plan metal level and the channel used to purchase Marketplace plans, vary by race and ethnicity, as well as the extent to which broker assistance or navigators supported plan selections.
  • Officials could also explore whether insurance plan costs to enrollees vary by race and ethnicity.
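One way to carry out analyses like these is to use the imputed probabilities as weights rather than assigning each enrollee to a single group. The following is a minimal sketch of a probability-weighted rate, assuming a hypothetical binary outcome such as whether an enrollee selected a silver-level plan; the estimator, function name, and data are illustrative assumptions, not the report's method.

```python
# Hypothetical sketch: probability-weighted outcome rates by race and ethnicity.
# The outcome and all enrollee data are made up for illustration.

GROUPS = ("AIAN", "AANHPI", "Black", "Hispanic", "Multiracial", "White")


def weighted_rate_by_group(prob_rows, outcomes):
    """For each group g, return sum_i p_i(g) * y_i / sum_i p_i(g).

    Returns None for a group with no probability mass in the sample.
    """
    rates = {}
    for g in GROUPS:
        weight = sum(row.get(g, 0.0) for row in prob_rows)
        if weight == 0:
            rates[g] = None  # no enrollee contributes to this group
        else:
            weighted_sum = sum(row.get(g, 0.0) * y
                               for row, y in zip(prob_rows, outcomes))
            rates[g] = weighted_sum / weight
    return rates


# Hypothetical enrollees: imputed probabilities and a 0/1 outcome
# (e.g., 1 = selected a silver-level plan).
probs = [
    {"Hispanic": 0.9, "White": 0.1},
    {"Black": 0.7, "White": 0.3},
    {"White": 0.8, "Multiracial": 0.2},
    {"AANHPI": 1.0},
]
chose_silver = [1, 0, 1, 0]

rates = weighted_rate_by_group(probs, chose_silver)
for g, rate in rates.items():
    print(g, "n/a" if rate is None else f"{rate:.2f}")
```

Each enrollee contributes fractionally to every group in proportion to their imputed probabilities, so no individual needs to be classified; the same pattern would apply to a continuous outcome such as enrollee premium cost.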

Recommendations

  • The modified BIFSG-imputed race and ethnicity probabilities can be used when self-reported race and ethnicity are missing for individuals obtaining health insurance through the Health Insurance Marketplace, enabling the U.S. Department of Health and Human Services (HHS) to better assess past enrollment efforts and variations in plan selection.
  • The modified BIFSG-imputed race and ethnicity should not be used to make inferences about AI/AN or Multiracial enrollees in general.
  • Although the modified BIFSG performed well, there is room for improvement, particularly in the identification of enrollees who are AI/AN or Multiracial.
  • The algorithm could be further enhanced. Additional variables in the Multidimensional Insurance Data Analytic System data could be incorporated into the modified BIFSG; for example, including enrollee age would allow estimates to be adjusted for generational changes in the distribution of race and ethnicity.

Table of Contents

  • Chapter One


  • Chapter Two


  • Chapter Three


  • Chapter Four


  • Appendix A

    States Participating in the Federally Facilitated Marketplaces and State-Based Marketplaces Using Federal Platform

  • Appendix B

    2021 COVID-19 Special Enrollment Period

  • Appendix C

    Inconsistencies in Self-Reported Race and Ethnicity Across Years

  • Appendix D

    Years of Enrollment by Race and Ethnicity

  • Appendix E

    Self-Reported Race and Ethnicity Prior to Implementing Modified BIFSG Imputation, by Year

  • Appendix F

    Calibrated and Uncalibrated Imputation Results

  • Appendix G

    U.S. Census Divisions

This research was funded by the Office of the Assistant Secretary for Planning and Evaluation and carried out within the Payment, Cost, and Coverage Program in RAND Health Care.

This report is part of the RAND Corporation Research report series. RAND reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND reports undergo rigorous peer review to ensure high standards for research quality and objectivity.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.