Using Indirect Estimates Based on Name and Census Tract to Improve the Efficiency of Sampling Matched Ethnic Couples from Marriage License Data
Published In: Public Opinion Quarterly, v. 77, no. 1, Spring 2013, p. 375-384
Posted on RAND.org on January 01, 2013
For many sampling applications where study goals require oversampling by race/ethnicity, self-reported race/ethnicity is unavailable. We describe a new method that allows oversampling on the basis of indirectly estimated race/ethnicity when name and address information are available. We adapt a Bayesian method for imputing self-reported race/ethnicity from surname and residential address information for use with marriage license application data in order to improve the efficiency of sampling for a study of newly married low-income Hispanic couples. Marriage license data contain the name and address of both parties, but not race/ethnicity. We used the indirect method to generate predicted probabilities that the couple in question falls into each possible combination of race/ethnicity. These probabilities were used to oversample couples of interest to generate a more efficient (weighted) probability sample than was otherwise possible. Based on Census data, we expected that half of our screened sample would be dually Hispanic; with our method, we obtained a sample for screening that was 92-percent dually Hispanic. This method nearly halved the screening needed yet obtained a probability sample of the target population with a small design effect, substantially improving the net efficiency. The potential gains of this approach are greater for rarer populations, and the methods are applicable to other sampling settings where the characteristics of multiple individuals are relevant.
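The steps summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-group simplification, the independence of spouses given their own names and addresses, the inclusion-probability rule (`base_rate`, `boost`), and every number in the example are hypothetical. It shows a Bayesian update of a surname-based prior with Census tract information, the combination of two individual posteriors into couple-level probabilities, and an oversampling draw that keeps the design weights needed for a weighted probability sample.

```python
import random

GROUPS = ("hispanic", "non_hispanic")  # simplified to two groups for illustration


def posterior(surname_prior, tract_likelihood):
    """Bayes update: P(group | surname, tract) is proportional to
    P(group | surname) * P(tract | group)."""
    unnorm = {g: surname_prior[g] * tract_likelihood[g] for g in GROUPS}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def couple_probabilities(post_a, post_b):
    """Probability of each combination of groups for a couple.
    Assumes the spouses are independent given their own inputs
    (a simplifying assumption made here, not taken from the article)."""
    return {(ga, gb): post_a[ga] * post_b[gb] for ga in GROUPS for gb in GROUPS}


def inclusion_probability(p_target, base_rate=0.05, boost=0.95):
    """Hypothetical rule: every couple keeps a nonzero chance of selection,
    and the chance rises with the estimated probability of being in the target."""
    return min(1.0, base_rate + boost * p_target)


def draw_sample(couples, rng):
    """Poisson sampling: select each couple independently with its own
    inclusion probability and store the design weight 1 / probability."""
    sample = []
    for couple_id, p_target in couples:
        pi = inclusion_probability(p_target)
        if rng.random() < pi:
            sample.append((couple_id, pi, 1.0 / pi))
    return sample


def kish_design_effect(weights):
    """Kish approximation of the design effect due to unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Illustrative inputs only: surname priors and tract likelihoods are made up.
spouse_a = posterior({"hispanic": 0.90, "non_hispanic": 0.10},
                     {"hispanic": 0.0020, "non_hispanic": 0.0005})
spouse_b = posterior({"hispanic": 0.60, "non_hispanic": 0.40},
                     {"hispanic": 0.0020, "non_hispanic": 0.0005})
joint = couple_probabilities(spouse_a, spouse_b)
p_dually_hispanic = joint[("hispanic", "hispanic")]
```

Because every couple retains a nonzero inclusion probability, the selected couples can be weighted back to the full population of license applicants. On the screening arithmetic: if half of screened couples are eligible, about 1 / 0.50 = 2 screenings are needed per eligible couple, while at 92 percent about 1 / 0.92 ≈ 1.09 are needed, a ratio of roughly 0.54, which is consistent with the abstract's statement that screening was nearly halved.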