RAND HRS data file:
- I got an error message when I tried to open RAND HRS in SAS. System says: "ERROR: File cannot be opened or format not found".
- How do I merge RAND HRS data products with other HRS data products?
- How do I read a Stata-SE file using Stata Intercooled?
- How are spouses included in the RAND HRS?
- How can I tell if a respondent has died? And where can I find the date of death?
- Which interview dates and age variables should be used for each wave?
- How do I use weights in the RAND HRS Data?
- I found some households whose members (both respondent and spouse) were less than 50 years old. How did they get into the sample?
- How do I construct a new data set that is at the household level, i.e, each household has one observation/record?
- I have found some cases with inconsistency between R1AGEY_E and S1AGEY_E within the same household. What is the reason for this?
- How do I get the sub-sample for HRS only? Is there a specific code that identified the people born between 1931-1941?
RAND-enhanced fat file:
- Why don't the frequency counts on the fat files match those in the HRS codebooks for household level variables?
- I have successfully downloaded the 1998 RAND HRS fat file data and tried to look at the data in SPSS. However, I am unable to analyze the data — the prompt I receive is that the system is not responding and it seems like the computer is locked.
RAND HRS Data File
I got an error message when I tried to open RAND HRS in SAS. System says: "ERROR: File cannot be opened or format not found".
Use sasfmts.sas7bdat to create a formats catalog (formats.sas7bcat). Assign the libref "library" to the folder that contains the RAND HRS data and sasfmts.sas7bdat files, then submit the following SAS code:
LIBNAME library "[name of RAND HRS data folder]";
PROC FORMAT LIBRARY=library CNTLIN=library.sasfmts;
RUN;
If you assign a libref other than “library”, SAS will not look in it for the format catalog unless you change the search order. By default, the WORK.FORMATS and LIBRARY.FORMATS catalogs are always searched first. For example, suppose you assign the library where the formats file is located as “USERLIB”. To use the formats file properly, set the FMTSEARCH= system option in SAS immediately after the PROC FORMAT step:
OPTIONS FMTSEARCH=(USERLIB WORK LIBRARY);
This will force SAS to correctly search in your defined “USERLIB” library first.
How do I merge RAND HRS data products with other HRS data products?
To merge one RAND product with another, you can use HHIDPN. To merge with a raw HRS data product, you have two options: merge on HHID and PN, or create RAHHIDPN (a character version of HHIDPN) on the raw HRS data. Here is the code to create RAHHIDPN in SAS, Stata and SPSS.
SAS:
RAHHIDPN = HHID || PN;
Stata:
gen str9 rahhidpn = hhid + pn
SPSS:
STRING RAHHIDPN (A9).
COMPUTE RAHHIDPN = CONCAT(HHID,PN).
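As a minimal sketch of how the RAHHIDPN option might look end to end in Stata: this assumes Stata 11 or later (for the merge 1:1 syntax), a hypothetical respondent-level raw HRS file called raw_hrs_file.dta in which hhid and pn are stored as strings, and the RAND HRS file name used in the use example on this page. Stata is case-sensitive, so adjust the variable names to match your files.

```stata
* Hypothetical file name; substitute the raw HRS file you are using.
use "raw_hrs_file.dta", clear

* Build the 9-character ID (assumes hhid is str6 and pn is str3).
gen str9 rahhidpn = hhid + pn

* Assumes one record per person in both files.
merge 1:1 rahhidpn using "rndhrs_h.dta", keepusing(r1iwstat r2iwstat)

* Inspect how many records matched.
tab _merge
```

If the raw file is at the household level rather than the respondent level, a 1:1 merge on rahhidpn is not appropriate and the merge keys need to be chosen accordingly.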
How do I read a Stata-SE file using Stata Intercooled?
You can read Stata Special Edition (SE) files, such as the longitudinal RAND HRS data file or the RAND-enhanced Fat Files, with Stata Intercooled by selecting variables on the use command, so long as the total number of variables does not exceed 2047. For example,
use rahhidpn r1iwstat r2iwstat using "rndhrs_h.dta"
would select the respondent ID and 2 variables from the RAND HRS Ver H File.
How are spouses included in the RAND HRS?
The HRS samples at the household level. In a couple household, one or both members may be age-eligible for the study, but in either case BOTH individuals are interviewed and treated as respondents. So the number of respondents in the RAND HRS is the same as in the core HRS respondent-level files and includes all HRS age-eligibles AND any non-age-eligible spouses.
For example, if Tom and Judy are a couple and both agree to be interviewed in 1998, you will see two records, one for Tom and one for Judy in both the RAND HRS and the 1998 HRS core data. On the RAND HRS data, you will see whether Judy has ever smoked as R4SMOKEV on Judy's record, and as S4SMOKEV on Tom's record. Moreover, you will see whether Tom has ever smoked as R4SMOKEV on Tom's record and as S4SMOKEV on Judy's. We add the spouse variables from the spouse report as a convenience.
If you are only interested in individuals without regard to spouses, you can simply ignore the Sw___ variables, and just use the Rw___ ones.
How can I tell if a respondent has died? And where can I find the date of death?
RwIWSTAT indicates the response and mortality status of the respondent at each wave. Respondents are identified by code 1, non-respondents by codes 4-7 and 9. Non-respondents who died between the last interview and the current one are assigned a 5 in RwIWSTAT, while those who died before the previous interview are assigned a 6.
Non-response code 4 means that the respondent is alive so far as we know but did not respond. A code of 7 means that the respondent has asked to be dropped from the sample, but was alive the last time this was observed. A code of 9 means that we don't know if the individual is alive or not.
Mortality status is taken from the Tracker file. Known alive and presumed alive are both treated as indication that the individual is living.
If the last available wave is based on Early Release data, the Tracker file may not yet indicate whether an individual is alive or not. If the Tracker does not include a mortality flag for the early release wave, and exit interview data for the interview year are available, RwIWSTAT will flag those with an exit interview as deceased.
The death dates are taken directly from the Tracker file. RANYEAR, RANMONTH and RANDATE are the National Death Index (NDI) dates. RAIYEAR, RAIMONTH, and RAIDATE are death dates ascertained by HRS and populated from either the exit interview or the spouse-reported year of death. RADYEAR, RADMONTH and RADDATE combine the NDI dates with the exit interview/spouse-reported dates. If both are present, we use the NDI date. RADDATE and RANDATE are in SAS date format.
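A minimal sketch of how these variables might be used in Stata, taking wave 4 as an arbitrary example. It assumes the death date variables are stored as numeric variables under the lower-case names shown, and it uses the file name from the use example on this page.

```stata
* Load wave 4 status and the combined death date variables.
use rahhidpn r4iwstat radyear radmonth raddate using "rndhrs_h.dta", clear

* Codes 5 and 6 mark people known to have died by wave 4.
gen byte dead_w4 = inlist(r4iwstat, 5, 6)

* SAS and Stata both count dates in days since 1 January 1960,
* so a SAS date value can be displayed as a Stata daily date.
format raddate %td

* Look at the flagged cases among the first 100 records.
list rahhidpn r4iwstat radyear radmonth raddate if dead_w4 == 1 in 1/100
```

Codes 4, 7 and 9 are deliberately left out of the flag, since they do not indicate a known death.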
Which interview dates and age variables should be used for each wave?
The public use HRS data provide two interview dates for the early waves, a beginning interview date and an end interview date. In most cases the two dates are the same. On the RAND HRS data file there are three versions of interview dates and age variables.
The RwIWBEG interview date and RwAGEY_B respondent age variables reflect the beginning interview date, and the RwIWEND and RwAGEY_E variables are based on the end interview date. The RwIWMID and RwAGEY_M variables are derived as the midpoint between the beginning and ending interview dates.
For most purposes it is best to use the variables based on the end interview date, that is, RwIWEND and RwAGEY_E. For most interviews that have two dates, the interview was postponed just after starting. So most of the interview was administered at the end date.
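To see how much the choice matters in a given wave, one could count the cases where the two dates differ. The following is a sketch for wave 1 in Stata, assuming lower-case variable names and the file name from the use example on this page.

```stata
use rahhidpn r1iwbeg r1iwend r1agey_b r1agey_e using "rndhrs_h.dta", clear

* Count wave 1 interviews whose beginning and end dates differ.
count if r1iwbeg != r1iwend & !missing(r1iwbeg, r1iwend)

* Count respondents whose age differs depending on which date is used.
count if r1agey_b != r1agey_e & !missing(r1agey_b, r1agey_e)
```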
How do I use weights in the RAND HRS Data?
The weights included in the RAND HRS dataset are described in the RAND HRS Data Documentation under the following sections:
- Sampling Weight
- Household Analysis Weight
- Person-Level Analysis Weight
The weights you use, of course, are going to be driven by the types of analyses you are doing. Though statistical advice is beyond the scope of the help we can provide, we can verify that the sampling weights are only available for HRS (1992) on the RAND HRS file, and that the respondent and HH weights (RwWTRESP and RwWTHH) are taken directly from the HRS-provided weights on the Tracker file.
The HRS weight document (PDF) explains how the respondent and HH level weights are created and may help you decide which weights are appropriate to your analysis. More information on HRS weights can be found on the HRS website, under Documentation, Weights.
There are also some resources available through the HRS website that describe how to begin performing analyses in various statistical packages. Chapter 8 of the HRS user guide, Getting Started with the Health and Retirement Study (PDF), shows an example of using weights in Stata.
There are just a couple of additional issues to be aware of:
- If you plan to use the data longitudinally you should also be aware that before 1998, non-age-eligible spouses in the HRS and AHEAD cohorts are given zero weights but in 1998 and beyond these spouses are given weights if they were born in years corresponding to other sample cohorts.
- Respondents living in nursing homes are given zero weight in the individual and household weights provided in the RAND HRS (RwWTHH and RwWTRESP). HRS has also recently developed weights that give non-zero weight to institutionalized individuals, that is, those living in nursing homes. These weights, provided for waves 5 and 6 in RwWTR_NH, are zero for those not living in a nursing home.
We do not do any additional processing of the weight variables; we simply copy them onto the RAND HRS files for your convenience. The data use guides published by the HRS folks may also be useful.
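As an illustration only, and not as statistical advice, here is a sketch of applying the wave 4 respondent weight in Stata. It assumes lower-case variable names and the file name from the use example on this page, and it uses R4SMOKEV simply because it is mentioned elsewhere on this page.

```stata
use rahhidpn r4smokev r4wtresp using "rndhrs_h.dta", clear

* Weighted distribution of "ever smoked" among wave 4 respondents.
* Records with zero weight contribute nothing to the estimate.
proportion r4smokev [pweight = r4wtresp]
```

The standard errors from this command do not account for the stratification and clustering of the HRS sample. Doing that requires the survey design variables, which are not covered on this page.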
I found some households whose members (both respondent and spouse) were less than 50 years old. How did they get into the sample?
An age-eligible household is a household with at least ONE age-eligible respondent. Each HRS household is supposed to have at least one age-eligible respondent, but there are a few exceptions.
Generally we know that we can have some members in the sample because they were at some earlier point married to age-eligible persons. Similarly, others are in the sample because they are married to people who were previously married to people who were age-eligible. Note: Age-ineligible individuals should have zero respondent weights.
How do I construct a new data set that is at the household level, that is, each household has one observation/record?
If the proposed household structure is for one wave, it can be constructed with minimum effort, because it's a snapshot of one wave. To do this, select one case from each HwHHID, e.g. first or last case.
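A minimal Stata sketch of the one-wave case, using wave 4 as an arbitrary example. It assumes lower-case variable names, the file name from the use example on this page, and that you add whichever household-level variables you need to the use list.

```stata
use rahhidpn h4hhid inw4 using "rndhrs_h.dta", clear

* Keep only people who responded in wave 4.
keep if inw4 == 1

* Within each wave 4 household, keep the first record by rahhidpn.
bysort h4hhid (rahhidpn): keep if _n == 1

* h4hhid should now identify each record uniquely;
* isid stops with an error if it does not.
isid h4hhid
```

Keeping the first record is an arbitrary choice. If any of your measures are respondent-level, decide deliberately which household member's record to keep.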
However, creating the household structure for a longitudinal file is more complicated. Events such as divorce and death affect household composition. Couples may split or reunite, and individuals may marry, remarry or become widowed. You will have to consider how to handle all of these transitions.
Even if a couple household remains the same across interviews, the person responding may change. For example, suppose Jack and Jill are married. Jack responds in wave one, Jill responds in wave two, Jill dies, Jack remarries Sue and Sue responds in wave three. Any longitudinal treatment of households should take all these situations into consideration. If any of your measures are respondent-level rather than household level, you will want to be careful to assign information to the right individual.
Here are some variables that can help. HwHHID identifies the wave specific household. It stays the same from one wave to the next if the household remains the same, if one of a couple dies, and if a single person marries. HwCPL indicates if the household is a couple household. The in-wave variables (INWw) indicate whether an individual responded in a given wave. SwIWSTAT gives the response and mortality status of the current wave's spouse at each wave.
I have found some cases with inconsistency between R1AGEY_E and S1AGEY_E within the same household. What is the reason for this?
The spouse age is derived using the date of the respondent's interview. Sometimes the respondent and spouse have different interview dates, as in the cases you found, and the spouse's birthday is in between.
How do I get the sub-sample for HRS only? Is there a specific code that identified the people born between 1931-1941?
You can use the "cohort" variable to subset the HRS cases. The values of the HACOHORT variable are as follows:
Note that the HRS/AHEAD overlap cases are all considered AHEAD, although one person in each overlap couple is HRS eligible. See RAOVRLAP in the codebook for more information.
You may also want to select on birth year to limit your sample to age-eligible HRS entry cohort members. (Also see the RAHRSAMP variable.) If you don't select on birth year, and use weights, you should be aware that age-ineligible spouses have positive weights from 1998 forward, as they may be representing a different birth-year cohort based on their birthdate.
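A sketch of the birth-year selection in Stata follows. It assumes a respondent birth year variable named rabyear is on the file (check the codebook), lower-case variable names, and the file name from the use example on this page. The HACOHORT code for the HRS cohort is not reproduced here, so look it up in the codebook before selecting on it.

```stata
* rabyear (respondent birth year) is an assumed variable name.
use rahhidpn hacohort rabyear using "rndhrs_h.dta", clear

* Keep people born 1931 through 1941, the years named in the question.
keep if inrange(rabyear, 1931, 1941)

* See which entry cohorts the remaining cases belong to.
tab hacohort
```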
RAND-Enhanced Fat File
Why don't the frequency counts on the fat files match those in the HRS codebooks for household level variables?
In the RAND-enhanced Fat Files, we reorganized the data so that each observation represents one individual, and we merged the appropriate information from the various modules to each observation. This means that household level information is present for each individual respondent in couple households. So for household level variables, the counts on the RAND-enhanced Fat Files will be higher than those listed in the HRS codebooks.
I have successfully downloaded the 1998 RAND HRS fat file data and tried to look at the data in SPSS. However, I am unable to analyze the data - the prompt I receive is that the system is not responding and it seems like the computer is locked.
It sounds like you may be double-clicking on the file. We have found that this loads the entire 266MB file into memory and, as you said, halts the system.
However, you can use the batch processing (production facility) to run a syntax file (.sps). Alternatively, in the interactive mode you can run code typed in the syntax window or submit a syntax file (.sps). Additionally, you may want to keep only the variables you are analyzing in order to minimize the memory allocation and allow procedures to execute. Please see the SPSS sample program.
Send questions or comments about this webpage to RANDHRSHelp@rand.org
Last modified June 2014