Data Alert for RAND HRS, Version J (June 2010)
We released Version J of the RAND HRS dataset at the beginning of March 2010. Since then, with the help of other users, we have discovered some inconsistencies in the data. Some have been corrected; others will be addressed in future releases.
The RAND HRS data files, including the codebook, will be updated on the HRS web site in the data download area. However, for the third correction listed below (the Social Security variables), we have created a series of fix files, which can be used to update the affected variables on a copy of Version J of the RAND HRS dataset that you may have already downloaded. These fix files are smaller to download, but you will need to merge the changes into your existing file. Sample programs for such a merge are provided in SAS, Stata, and SPSS.
We are sorry for any inconvenience this may have caused. Please let us know if you have any questions by emailing RANDHRSHelp@rand.org.
The issues/corrections addressed here include:
- Different sample sizes in the RAND HRS and Tracker 2008 datasets
- Attaching value labels to Stata SE version of the RAND HRS
- Corrections to “Whether and age when started to receive Social Security” variables
Different sample sizes in the RAND HRS and Tracker 2008 datasets
The RAND HRS dataset has a total sample size of 30,548, while the Tracker 2008 dataset (V1.0, December 2009) has a total sample size of 31,022. The table below provides a summary of the differences:
IN_RANDHRS   IN_TRACKER   Frequency
---------------------------------------
No           Yes                475
No           Yes                  1
Yes          No                   2
Yes          Yes             30,546
The first row (N = 475) represents cases in the Tracker 2008 dataset, but not in the RAND HRS dataset. These are people for whom either a core interview was never obtained, or an exit interview (i.e., a proxy interview on the deceased) was conducted. Most are likely to be spouses who never responded rather than exit-interview cases, as only about 75 of the 475 ever have an exit interview.
The second row (N = 1) also represents cases in the Tracker 2008 dataset, but not in the RAND HRS dataset. This particular individual, however, is an “HRS-AHEAD Overlap” case (see the RAND HRS Documentation for an explanation of the RAOVRLAP variable). Specifically, this person (HHIDPN: 204940020) was married to 204940010 in 1992, and the household was turned over to AHEAD. However, 204940020 never responded, and was left on Tracker as an HRS case (020582020). In other words, on the RAND HRS dataset, this person can be identified by their AHEAD ID (204940020), whereas on the Tracker dataset, they are identified by their HRS ID (020582020).
The third row (N = 2) represents cases in the RAND HRS dataset, but not in the Tracker 2008 dataset. One of these individuals (HHIDPN: 204940020) is the “HRS-AHEAD Overlap” case described in the paragraph above, which appears in the RAND HRS dataset as 204940020, but in the Tracker 2008 dataset as 020582020. The other person (HHIDPN: 22965041) should probably be dropped, based on the following statement in the Tracker 2008 documentation:
“In the course of reviewing data, it was discovered that one line, HHID 022965 and PN 041, was indeed never a qualified respondent. Also is has been determined that the interview obtained from HHID 022965 and PN 040 cannot be verified and should not have been taken. The line for PN 041 has been removed from the tracker file and the wave specific variables have been recoded for PN 040 to reflect non-response.”
The 22965041 case does appear on the RAND-Enhanced Fat File for 2002 and the RAND HRS dataset, and will thus be removed in future releases. Moreover, according to HRS, the spouse’s (HHIDPN: 22965040) interview for 2008 should never have been taken. This case does appear on the RAND-Enhanced Fat File for 2008 and the RAND HRS dataset, and will thus be removed in future releases.
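The comparison summarized in the table above can be reproduced along the following lines. This is a minimal sketch, not the program used to produce the table. The Tracker file name (trk2008), the assumption that its HHID and PN variables are stored as strings, and the use of Stata 11 or later (for the merge 1:1 syntax) are all assumptions to adjust for your own files.

```stata
* Sketch only: file names and Tracker variable types are assumptions.
use hhidpn using rndhrs_j, clear       // keep only the RAND HRS identifier
tempfile rand
save `rand'

use hhid pn using trk2008, clear       // hypothetical Tracker 2008 file name
* Assumes HHID and PN are strings; build a numeric HHIDPN to match RAND HRS
generate long hhidpn = real(hhid + pn)

merge 1:1 hhidpn using `rand'
* _merge: 1 = Tracker only, 2 = RAND HRS only, 3 = in both files
tabulate _merge
```

Note that a merge on HHIDPN counts the “HRS-AHEAD Overlap” person twice, once as Tracker-only (under the HRS ID) and once as RAND-HRS-only (under the AHEAD ID), which is why that person accounts for one case in the second row and one in the third.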
Attaching value labels to Stata SE version of the RAND HRS
The value labels (e.g., for RAGENDER, 1 = “Male”, 2 = “Female”) were inadvertently left off of the Stata SE version of the RAND HRS dataset. Users who downloaded this version of the dataset should go to the HRS data download page and download the updated version of the Stata SE zip package (randJstataSE.zip).
Alternatively, labels can be added one variable at a time using Stata commands such as the following (shown here for RAGENDER):
label define gender 1 "1. Male" 2 "2. Female"
label value ragender gender
and re-saving your file.
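To confirm that a label was attached before re-saving, a check such as the following may help. It is a minimal sketch that assumes the two label commands above have already been run on the data in memory, and that your file is named rndhrs_j.dta in the working directory (adjust as needed).

```stata
* Sketch only: run after the two label commands above.
describe ragender          // the "value label" column should now show gender
label list gender          // prints the mapping defined for the label
tabulate ragender          // categories display as "1. Male" and "2. Female"
save rndhrs_j, replace     // overwrites the file; keep a backup copy first
```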
Corrections to “Whether and age when started to receive Social Security” variables
In the process of updating the following variables, we inadvertently omitted the relevant data from HRS 2008:
Respondent variables:
| RASSRECV | whether Respondent receives Social Security |
| RASSAGEM | age in months when Respondent first received SS income |
| RASSAGEB | age in years when Respondent first received SS income |
Spouse variables:
| SASSRECV | whether Spouse receives Social Security |
| SASSAGEM | age in months when Spouse first received SS income |
| SASSAGEB | age in years when Spouse first received SS income |
The problem affected RASSRECV for 918 cases, RASSAGEM and RASSAGEB for 934 cases, SASSRECV for 723 cases, and SASSAGEM and SASSAGEB for 730 cases.
The following tables list the relevant descriptive statistics for the affected variables both before and after the corrections:
Before Corrections:
Variable        N      Mean   Std Dev   Minimum    Maximum
----------------------------------------------------------
RASSAGEM    13218   733.276    72.338   109.000   1081.000
SASSAGEM     9406   738.837    65.041   229.000   1081.000
RASSAGEB    13218    61.120     6.028     9.100     90.100
SASSAGEB     9406    61.584     5.420    19.100     90.100
RASSRECV    30548     0.705     0.456     0.000      1.000
SASSRECV    22841     0.631     0.482     0.000      1.000

Value       | RASSRECV
------------+---------
0.no        |     9023
1.yes       |    21525

Value       | SASSRECV
------------+---------
.U=Unmar    |     7707
0.no        |     8424
1.yes       |    14417
After Corrections:
Variable        N      Mean   Std Dev   Minimum    Maximum
----------------------------------------------------------
RASSAGEM    13923   734.103    71.844   109.000   1081.000
SASSAGEM     9874   739.418    64.646   229.000   1081.000
RASSAGEB    13923    61.189     5.987     9.100     90.100
SASSAGEB     9874    61.632     5.387    19.100     90.100
RASSRECV    30548     0.735     0.442     0.000      1.000
SASSRECV    22841     0.663     0.473     0.000      1.000

Value       | RASSRECV
------------+---------
0.no        |     8105
1.yes       |    22443

Value       | SASSRECV
------------+---------
.U=Unmar    |     7707
0.no        |     7701
1.yes       |    15140
To update these variables, users may choose to either download the newest version of the RAND HRS dataset from the HRS data download page, or use the fix files we have provided, which are described below. If you re-download the entire file, you do not need to use the fix files.
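Whichever route you choose, it may be worth checking your updated file against the “After Corrections” figures above. The following is a minimal sketch, assuming the updated file is named rndhrs_j1.dta in the working directory. If you re-downloaded the full dataset instead, substitute that file name.

```stata
* Sketch only: the file name is an assumption; adjust to your updated file.
use rassrecv rassagem rassageb sassrecv sassagem sassageb using rndhrs_j1, clear

* Compare N, mean, std. dev., min and max with the "After Corrections" table
summarize rassagem sassagem rassageb sassageb rassrecv sassrecv

* The missing option shows the .U (unmarried) category for the spouse variable
tabulate rassrecv, missing
tabulate sassrecv, missing

* Change in the "0.no" counts between the before and after tables
display 9023 - 8105        // 918, the number of RASSRECV cases affected
display 8424 - 7701        // 723, the number of SASSRECV cases affected
```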
Fix files for download
The fix files are encrypted. To decrypt them, please use the passphrase provided in the “unlock_cd.txt” file found on the HRS data download page under RAND Contributions. Note that you will need WinZip V10 or higher to unzip the files. You can download WinZip from www.winzip.com.
There are separate zip files for SAS (rndfix_j1SAS.zip), SPSS (rndfix_j1SPSS.zip), Stata SE (rndfix_j1STATASE.zip), and Intercooled Stata (rndfix_j1STATAI.zip). Each zip file includes:
- this document (Data Alert VerJ June2010.doc)
- means and frequencies showing values before and after the corrections (rndfix_j1tables.txt)
- a sample program for updating your existing rndhrs_j file. You may need to adjust the code for the locations of files on your system. It will save your current rndhrs_j file as rndhrs_j1, then will update with the corrected variables.
- a data file with the corrected variables. These files are encrypted.
The following lists the details for each zip file:
- rndfix_j1SAS.zip: rndfix_j1.sas7bdat and SAS sample program
- rndfix_j1STATASE.zip: rndfix_j1.dta and Stata-SE sample program
- rndfix_j1STATAI.zip: rndfix_j1.dta and Stata-I sample programs (rndfix1j1.do, rndfix2j1.do, rndfix3j1.do, rndfix4j1.do, rndfix5j1.do, rndfix6j1.do, rndfix7j1.do, rndfix8j1.do, rndfix9j1.do). Note that all the *.do files use the same rndfix_j1.dta file to apply the corrections.
- rndfix_j1SPSS.zip: rndfix_j1.sav and SPSS sample program
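For orientation, the kind of merge the Stata sample program performs might look roughly like the sketch below. This is not the distributed sample program. It assumes that both files are in the working directory, that HHIDPN uniquely identifies observations in each file, and that you are using Stata 11 or later (for the merge 1:1 syntax). Use the sample program included in the zip file for the actual update.

```stata
* Sketch only, not the distributed sample program.
* Assumptions: files are in the working directory and HHIDPN is unique in each.
use rndhrs_j, clear

* update fills in missing values from the fix file;
* replace also overwrites nonmissing values that differ
merge 1:1 hhidpn using rndfix_j1, update replace

* _merge: 1 = in rndhrs_j only, 2 = in the fix file only,
*         3 = matched with no change applied, 4 = missing value filled in,
*         5 = nonmissing value overwritten
tabulate _merge
drop _merge

* Save under a new name so the original download is left untouched
save rndhrs_j1, replace
```

One caveat with this approach: under update replace, a missing value in the fix file does not overwrite a nonmissing value in your existing file, so a correction that sets a value to missing would need to be handled separately.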
