The Urge to Merge
Linking Vital Statistics Records and Medicaid Claims
Published in: Medical Care, v. 32, no. 10, Oct. 1994, p. 1004-1018
Posted on RAND.org on October 01, 1994
This paper describes a procedure used to link Medicaid claims data to California vital statistics records for very low birthweight infants. The linkage involved about 53,000 infants born from 1980 to 1987 and 1.46 million claims for delivery- or birth-related hospital admissions during the same period. Because the two data files did not share a unique identifier, record linkage required combining evidence across several linking variables: delivery hospital, delivery or birth date (or hospitalization period), names, mother's age, and zip code. To combine these pieces of evidence, we used record linkage theory to compute scores that measure the likelihood of a match, i.e., the likelihood that two records correspond to the same delivery. These scores weight each piece of evidence for or against a match. Implementation required dealing with large amounts of missing data in one of the files, handling errors and variations in reported names, and minimizing the number of incorrect links. The approach applies to a wide range of linkage problems. Combining existing datasets into new ones that contain analysis variables from each source makes possible analyses that would otherwise be impossible or prohibitively expensive.
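The abstract does not give the scoring formulas, so the following is only a minimal sketch of how agreement on several linking variables might be combined into one match score. It assumes the standard log-likelihood-ratio weighting from probabilistic record linkage theory, which may differ from what the paper implements. The field names, the m and u probabilities in `FIELD_PARAMS`, and the choice to give missing values zero weight are all hypothetical and for illustration only.

```python
import math

# Hypothetical parameters, not taken from the paper:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "hospital":   (0.98, 0.01),
    "date":       (0.95, 0.003),
    "surname":    (0.90, 0.001),
    "mother_age": (0.90, 0.05),
    "zip":        (0.85, 0.002),
}


def field_weight(field, a, b):
    """Log2 likelihood ratio contributed by one linking variable."""
    if a is None or b is None:
        # Assumption: a missing value is neither evidence for nor against.
        return 0.0
    m, u = FIELD_PARAMS[field]
    if a == b:
        return math.log2(m / u)            # agreement: positive weight
    return math.log2((1 - m) / (1 - u))    # disagreement: negative weight


def match_score(rec_a, rec_b):
    """Sum the per-field weights; higher scores favor a match."""
    return sum(
        field_weight(f, rec_a.get(f), rec_b.get(f)) for f in FIELD_PARAMS
    )


# Made-up records for illustration.
birth = {"hospital": "H1", "date": "1985-03-02", "surname": "LOPEZ",
         "mother_age": 27, "zip": "90210"}
claim = {"hospital": "H1", "date": "1985-03-02", "surname": None,
         "mother_age": 27, "zip": "90210"}

print(match_score(birth, claim))
```

In this sketch, agreement on a rarely coinciding field such as surname adds far more to the score than agreement on mother's age, which illustrates how evidence can be weighted by how informative it is. Exact equality is a simplification: handling name errors and variations, as the abstract mentions, would need an approximate comparison in place of `a == b`.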