RAND HRS Longitudinal File:
- I got an error message when I tried to open the RAND HRS Longitudinal File in SAS. System says: "ERROR: File cannot be opened or format not found".
- How do I merge RAND HRS data products with other HRS data products?
- How do I read a Stata-SE file using Stata Intercooled?
- How are spouses included in the RAND HRS Longitudinal File?
- How can I tell if a respondent has died? And where can I find the date of death?
- Which interview dates and age variables should be used for each wave?
- How do I use weights in the RAND HRS Longitudinal File?
- I found some households whose members (both respondent and spouse) were less than 50 years old. How did they get into the sample?
- How do I construct a new data set that is at the household level, i.e., each household has one observation/record?
- I have found some cases with inconsistency between R1AGEY_E and S1AGEY_E within the same household. What is the reason for this?
- How do I get the sub-sample for HRS only? Is there a specific code that identifies the people born between 1931 and 1941?
- I am teaching a course where students will be expected to do a research project using the RAND HRS Longitudinal File. Since I am registered with HRS, can I simply provide copies of the data to each of my students?
- Why do Employment (RwLBRF) and Retirement (RwRETEMP or RwSAYRET) variables differ?
- Are variables in RAND HRS data products adjusted for inflation?
- How can I reshape the file from wide to long in Stata?
- Where can I find information about human subjects research and protection for RAND HRS data products?
- Why are the imputed cognition variables missing for the most recent wave of data?
- I’m looking for the updated version of RAND HRS Version P but do not see any RAND files with letter versions anymore. Are these still available?
RAND HRS Fat Files:
- Why don't the frequency counts on the fat files match those in the HRS codebooks for household level variables?
- I have successfully downloaded the 1998 RAND HRS Fat File and tried to look at the data in SPSS. However, I am unable to analyze the data — the prompt I receive is that the system is not responding and it seems like the computer is locked.
RAND HRS Longitudinal File:
I got an error message when I tried to open the RAND HRS Longitudinal File in SAS. System says: "ERROR: File cannot be opened or format not found".
Use sasfmts.sas7bdat to create a formats catalog (formats.sas7bcat). Assign the libref "library" to the folder that contains both the RAND HRS Longitudinal File and sasfmts.sas7bdat, then submit the following SAS code:
LIBNAME library "[name of RAND HRS data folder]";
PROC FORMAT LIBRARY=library CNTLIN=library.sasfmts;
RUN;
If you assign a libref other than “library”, you must tell SAS where to look for the format catalog. By default, SAS searches only the WORK.FORMATS and LIBRARY.FORMATS catalogs, so you will need to change this default. For example, suppose you assign the library where the formats file is located as “USERLIB”. To use the formats file, set the FMTSEARCH= system option immediately after the PROC FORMAT step:
OPTIONS FMTSEARCH=(USERLIB WORK LIBRARY);
This will force SAS to correctly search in your defined “USERLIB” library first.
How do I merge RAND HRS data products with other HRS data products?
To merge one RAND product with another, use HHIDPN. To merge with a raw HRS data product, you have two options: merge on HHID and PN, or create RAHHIDPN (a character version of HHIDPN) on the raw HRS data. The code to create RAHHIDPN is shown below for SAS, Stata, and SPSS.
SAS:
RAHHIDPN = HHID || PN;
Stata:
gen str9 rahhidpn = hhid + pn
SPSS:
STRING RAHHIDPN (A9).
COMPUTE RAHHIDPN = CONCAT(HHID,PN).
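For readers who stage the data outside SAS, Stata, or SPSS, the same key construction can be sketched in Python. This is only an illustration under stated assumptions, not RAND's code: the function name make_rahhidpn and the sample records are hypothetical, and the sketch assumes HHID is a 6-character string and PN a 3-character string. Keeping both as strings preserves leading zeros, which a numeric conversion would drop.

```python
def make_rahhidpn(hhid: str, pn: str) -> str:
    """Concatenate HHID and PN into a 9-character person identifier."""
    # Assumption: HHID is 6 characters and PN is 3. zfill pads with
    # leading zeros in case the values were read in as shorter strings.
    return hhid.strip().zfill(6) + pn.strip().zfill(3)


# Hypothetical records standing in for a raw HRS file and a RAND file.
raw_hrs = [
    {"HHID": "000001", "PN": "010", "raw_var": 5},
    {"HHID": "000002", "PN": "020", "raw_var": 7},
    {"HHID": "000003", "PN": "010", "raw_var": 2},  # not in the RAND extract
]
rand_by_id = {
    "000001010": {"R1IWSTAT": 1},
    "000002020": {"R1IWSTAT": 4},
}

merged = []
for rec in raw_hrs:
    key = make_rahhidpn(rec["HHID"], rec["PN"])
    # Inner join: keep only people present in both sources.
    if key in rand_by_id:
        merged.append({"RAHHIDPN": key, **rec, **rand_by_id[key]})
```

Here merged holds the two people found in both sources, each carrying the raw variable and the RAND variable under one RAHHIDPN key.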
How do I read a Stata-SE file using Stata Intercooled?
You can read Stata Special Edition (SE) files such as the RAND HRS Longitudinal File or the RAND HRS Fat Files with Stata Intercooled by selecting variables on the use command, so long as the total number of variables does not exceed 2,047. For example,
use rahhidpn r1iwstat r2iwstat using "randhrs1992_2014v3.dta"
would select the respondent ID and two variables from the RAND HRS Longitudinal File 2014 (V3).
How are spouses included in the RAND HRS Longitudinal File?
The HRS samples at the household level. In a couple household, one or both members of the couple are age-eligible for the study, but in either case BOTH individuals are interviewed and treated as respondents. So, the number of respondents in the RAND HRS Longitudinal File is the same as in the Core HRS respondent-level files and includes all HRS age-eligibles AND any non-age-eligible spouses.
For example, if Tom and Judy are a couple and both agree to be interviewed in 1998, you will see two records, one for Tom and one for Judy in both the RAND HRS Longitudinal File and the 1998 HRS Core data. On the RAND HRS Longitudinal File, you will see whether Judy has ever smoked as R4SMOKEV on Judy's record, and as S4SMOKEV on Tom's record. Moreover, you will see whether Tom has ever smoked as R4SMOKEV on Tom's record and as S4SMOKEV on Judy's. We add the spouse variables from the spouse report as a convenience.
If you are only interested in individuals without regard to spouses, you can simply ignore the Sw___ variables, and just use the Rw___ ones.
How can I tell if a respondent has died? And where can I find the date of death?
RwIWSTAT indicates the response and mortality status of the respondent at each wave. Respondents are identified by code 1, non-respondents by codes 4-7 and 9. Non-respondents who died between the last interview and the current one are assigned a 5 in RwIWSTAT, while those who died before the previous interview are assigned a 6.
Non-response code 4 means that the respondent is alive so far as we know but did not respond. A code of 7 means that the respondent has asked to be dropped from the sample, but was alive the last time this was observed. A code of 9 means that we don't know if the individual is alive or not.
Mortality status is taken from the Tracker file. Known alive and presumed alive are both treated as indication that the individual is living.
If the last available wave is based on Early Release data, the Tracker file may not yet indicate whether an individual is alive or not. If the Tracker does not include a mortality flag for the early release wave, and Exit Interview data for the interview year are available, RwIWSTAT will flag those with an Exit Interview as deceased.
RADYEAR and RADMONTH are the year and month of death ascertained by HRS and are populated from either the Exit Interview or spouse-reported information. RADDATE holds the date of death as a SAS date value.
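The status codes described above can be applied programmatically. The following Python sketch is illustrative only: the label wording paraphrases this answer, and the function name and the example respondent are hypothetical. It finds the first wave in which RwIWSTAT marks a person as deceased (code 5 or 6).

```python
# Code meanings as described in the answer above.
IWSTAT_LABELS = {
    1: "responded",
    4: "did not respond, alive as far as known",
    5: "died between the last interview and this one",
    6: "died before the previous interview",
    7: "asked to be dropped, alive when last observed",
    9: "vital status unknown",
}
DECEASED_CODES = {5, 6}


def first_wave_deceased(iwstat_by_wave):
    """Return the first wave whose status code marks the person as deceased.

    iwstat_by_wave maps wave number -> RwIWSTAT code. Returns None when no
    wave carries code 5 or 6.
    """
    for wave in sorted(iwstat_by_wave):
        if iwstat_by_wave[wave] in DECEASED_CODES:
            return wave
    return None


# Hypothetical respondent: interviewed in waves 1-2, missed wave 3, then died.
example = {1: 1, 2: 1, 3: 4, 4: 5, 5: 6}
print(first_wave_deceased(example))  # prints 4
```

The wave returned this way only brackets the death between two interviews; RADYEAR and RADMONTH remain the source for the actual date.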
Which interview dates and age variables should be used for each wave?
The public use HRS data provide two interview dates for the early waves, a beginning interview date and an end interview date. In most cases the two dates are the same. On the RAND HRS Longitudinal File there are three versions of interview dates and age variables.
The RwIWBEG interview date and RwAGEY_B respondent age variables reflect the beginning interview date, and the RwIWEND and RwAGEY_E variables are based on the end interview date. The RwIWMID and RwAGEY_M variables are derived as the midpoint between the beginning and ending interview dates.
For most purposes it is best to use the variables based on the end interview date, that is, RwIWEND and RwAGEY_E. For most interviews that have two dates, the interview was postponed just after it started, so most of the interview was administered on the end date.
How do I use weights in the RAND HRS Longitudinal File?
The weights included in the RAND HRS Longitudinal File are described in the RAND HRS Longitudinal File Documentation under the following sections:
- Sampling Weight
- Household Analysis Weight
- Person-Level Analysis Weight
The weights you use, of course, are going to be driven by the types of analyses you are doing. Though statistical advice is beyond the scope of the help we can provide, we can verify that the sampling weights are only available for HRS (1992) on the RAND HRS Longitudinal File, and that the respondent and HH weights (RwWTRESP and RwWTHH) are taken directly from the HRS-provided weights on the Tracker file.
The HRS weight document (PDF) explains how the respondent and HH level weights are created and may help you decide which weights are appropriate to your analysis. More information on HRS weights can be found on the HRS website, under Documentation, Weights.
There are also some resources available through the HRS website that describe how one would begin performing analyses in various statistical packages. An HRS User Guide, Getting Started with the Health and Retirement Study (PDF), Chapter 8 shows an example of using weights in Stata.
There are just a couple of additional issues to be aware of:
- If you plan to use the data longitudinally you should also be aware that before 1998, non-age-eligible spouses in the HRS and AHEAD cohorts are given zero weights but in 1998 and beyond these spouses are given weights if they were born in years corresponding to other sample cohorts.
- Respondents living in nursing homes are given zero weight in the individual (RwWTRESP) and household (RwWTHH) weights provided in the RAND HRS Longitudinal File. There are also weights recently developed by HRS that provide non-zero weights to institutionalized individuals, that is, those living in nursing homes. These weights (RwWTR_NH) are provided for Wave 5 forward, and are set to zero for those not living in a nursing home.
We do not do any additional processing of the weight variables; we simply copy them onto the RAND HRS files for your convenience. The data use guides published by the HRS may also be useful.
I found some households whose members (both respondent and spouse) were less than 50 years old. How did they get into the sample?
An Age-Eligible Household is a household with at least ONE age-eligible respondent. Each HRS household is supposed to have at least one age-eligible respondent, but there are a few exceptions.
Generally, some members are in the sample because they were at some earlier point married to age-eligible persons. Similarly, others are in the sample because they are married to people who were previously married to age-eligible persons. Note: Age-ineligible individuals should have zero respondent weights.
How do I construct a new data set that is at the household level, that is, each household has one observation/record?
If the proposed household structure is for one wave, it can be constructed with minimal effort, because it is a snapshot of one wave. To do this, select one case from each HwHHID, e.g., the first or last case.
However, creating the household structure for a longitudinal file is more complicated. Events such as divorce and death affect the household composition. Couples may split or reunite, and household members may marry, remarry, or become widowed. You will have to consider how to handle all of these transitions.
Even if a couple household remains the same across interviews, the person responding may change. For example, suppose Jack and Jill are married. Jack responds in wave one, Jill responds in wave two, Jill dies, Jack remarries Sue and Sue responds in wave three. Any longitudinal treatment of households should take all these situations into consideration. If any of your measures are respondent-level rather than household level, you will want to be careful to assign information to the right individual.
Here are some variables that can help. HwHHID identifies the wave specific household. It stays the same from one wave to the next if the household remains the same, if one of a couple dies, and if a single person marries. HwCPL indicates if the household is a couple household. The in-wave variables (INWw) indicate whether an individual responded in a given wave. SwIWSTAT gives the response and mortality status of the current wave's spouse at each wave.
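The single-wave case described above (one case per HwHHID) can be sketched in Python as follows. This is a minimal illustration with hypothetical records and a hypothetical function name. It assumes you first restrict to people who responded in the wave (INWw equal to 1) and then keep the first record seen for each wave-specific household ID.

```python
def one_record_per_household(records, hhid_field, inw_field):
    """Keep the first in-wave record for each wave-specific household ID."""
    seen = set()
    household_level = []
    for rec in records:
        if rec[inw_field] != 1:
            continue  # person did not respond in this wave
        hh = rec[hhid_field]
        if hh not in seen:
            seen.add(hh)
            household_level.append(rec)
    return household_level


# Hypothetical wave-4 extract: the first two people share one household.
wave4 = [
    {"HHIDPN": 1010, "INW4": 1, "H4HHID": "0000010", "H4CPL": 1},
    {"HHIDPN": 1020, "INW4": 1, "H4HHID": "0000010", "H4CPL": 1},
    {"HHIDPN": 2010, "INW4": 1, "H4HHID": "0000020", "H4CPL": 0},
    {"HHIDPN": 3010, "INW4": 0, "H4HHID": "0000030", "H4CPL": 0},
]
households = one_record_per_household(wave4, "H4HHID", "INW4")
```

Which member represents the household (first, last, or a designated respondent) is an analytic choice. This sketch simply takes the first in file order, so any respondent-level measure on the retained record belongs to that person only.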
I have found some cases with inconsistency between R1AGEY_E and S1AGEY_E within the same household. What is the reason for this?
The spouse's age is derived using the date of the respondent's interview. Sometimes the respondent and spouse have different interview dates, as in the cases you found, and the spouse's birthday falls between the two dates.
How do I get the sub-sample for HRS only? Is there a specific code that identified the people born between 1931-1941?
You can use the HACOHORT ("cohort") variable to subset the HRS cases; its values are listed in the RAND HRS Longitudinal File codebook.
Note that the HRS/AHEAD overlap cases are all considered AHEAD, although one member of each overlap couple is HRS-eligible. See RAOVRLAP in the codebook for more information.
You may also want to select on birth year to limit your sample to age-eligible HRS entry cohort members. (Also see the RAHRSAMP variable.) If you don't select on birth year, and use weights, you should be aware that age-ineligible spouses have positive weights from 1998 forward, as they may be representing a different birth-year cohort based on their birthdate.
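Selecting on birth year, as suggested above, might look like the following Python sketch. It is illustrative only: it assumes the birth year is held in a variable named RABYEAR, and the function name and records are hypothetical. The 1931-1941 range comes from the question.

```python
def is_hrs_entry_cohort(birth_year, first=1931, last=1941):
    """True if the birth year falls in the original HRS cohort range."""
    return birth_year is not None and first <= birth_year <= last


# Hypothetical people; RABYEAR is assumed to hold the birth year.
people = [
    {"HHIDPN": 1010, "RABYEAR": 1936},
    {"HHIDPN": 1020, "RABYEAR": 1945},  # younger spouse, outside the range
    {"HHIDPN": 2010, "RABYEAR": None},  # birth year missing
]
hrs_cohort = [p for p in people if is_hrs_entry_cohort(p["RABYEAR"])]
```

Records with a missing birth year are excluded here; whether that is appropriate depends on the analysis.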
I am teaching a course where students will be expected to do a research project using the RAND HRS Longitudinal File. Since I am registered with HRS, can I simply provide copies of the data to each of my students?
No. Each of your students must register with HRS, and everyone (including yourself) is expected to honor the conditions of use. For more information, please see HRS’s distribution and replication policy.
Why do Employment (RwLBRF) and Retirement (RwRETEMP or RwSAYRET) variables differ?
For the most part, the labor force values for retirement, RwLBRF=4 and RwLBRF=5, line up with the retirement indicators RwRETEMP and RwSAYRET. But there are some cases in which RwRETEMP or RwSAYRET indicates retirement and RwLBRF does not. In these cases, the HRS data contain evidence from the employment survey items that Respondents are still in the labor force, whether working or unemployed looking for work. You might think of RwLBRF as capturing the totality of Respondents’ employment and retirement data for a particular wave and RwSAYRET as simply reflecting whether Respondents consider themselves retired.
Since retirement dates are derived from different source variables, as specified in the RAND HRS Longitudinal File codebook, and Respondents can report retirement dates several times across waves, there can be notable longitudinal variation in RwRETYR and RwRETMON. We provide the available dates at each wave because we cannot determine from the data which retirement date is most accurate.
Are variables in RAND HRS data products adjusted for inflation?
No, all variables in RAND HRS data products are reported in nominal dollars. This includes wage rate in the Employment section of the RAND HRS Longitudinal File and all Income and Wealth variables appearing in the RAND HRS Longitudinal File and the RAND HRS Detailed Imputations File.
How can I reshape the file from wide to long in Stata?
The easiest way to handle this issue is to first limit the data set to the variables that you’re most interested in. Then use the following syntax:
reshape long inw r@varx s@vary h@varz […any other variable of interest that contains a wave number…], i(hhidpn) j(wave)
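To make the effect of the reshape concrete, the following Python sketch mimics it on a single record. It is not a substitute for the Stata command, and the function name and sample values are hypothetical. It assumes wave-specific names consist of a prefix (R, S, H, or INW), the wave number, and a stem, and it ignores variables that do not fit that pattern.

```python
import re

# Wave-specific names: a prefix (R, S, H, or INW), the wave number, a stem.
WAVE_VAR = re.compile(r"^(INW|[RSH])(\d+)(.*)$", re.IGNORECASE)


def wide_to_long(record, id_field="hhidpn"):
    """Turn one wide record into a list of person-wave rows."""
    by_wave = {}
    for name, value in record.items():
        if name == id_field:
            continue
        match = WAVE_VAR.match(name)
        if match is None:
            continue  # not wave-specific; skipped in this sketch
        prefix, wave, stem = match.groups()
        wave = int(wave)
        row = by_wave.setdefault(wave, {id_field: record[id_field], "wave": wave})
        # Drop the wave number from the name, e.g. r1agey_e -> ragey_e.
        row[(prefix + stem).lower()] = value
    return [by_wave[w] for w in sorted(by_wave)]


wide = {"hhidpn": 1010, "inw1": 1, "inw2": 1, "r1agey_e": 55, "r2agey_e": 57}
long_rows = wide_to_long(wide)
```

The result has one row per person and wave, with the wave number moved out of the variable name and into a wave column, which is the layout the reshape command produces.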
Where can I find information about human subjects research and protection for RAND HRS data products?
All variables in RAND HRS data products are derived from public release data collected and published by our HRS colleagues at the University of Michigan. Details of human subjects protection protocols are managed by the University of Michigan’s Institutional Review Board, and all relevant documentation is available at https://hrs.isr.umich.edu/publications/biblio/9048.
Why are the imputed cognition variables missing for the most recent wave of data?
The cognition variables are taken directly from the HRS imputations of cognitive functioning. These imputations are generally published by HRS on their website between our publication of versions of the RAND HRS Longitudinal File containing early release data and final release data for the most recent year (wave) of data.
If you notice that HRS has published imputed cognition data that includes the most recent wave of data (https://hrs.isr.umich.edu/data-products/cognition-data) and wish to use the data, you may merge it onto the RAND HRS Longitudinal File using the HHID and PN variables.
I’m looking for the updated version of RAND HRS Version P but do not see any RAND files with letter versions anymore. Are these still available?
We have changed the file naming convention for the RAND HRS Longitudinal File. Previously, we used letters to denote subsequent versions (e.g., RAND HRS Version P). Now, we use the year of the latest HRS wave that is included in the longitudinal file and a version number. If you need a previous version of the data to recreate the analyses for a paper, or to provide to a journal for publication, you can find archived versions of our data at https://hrs.isr.umich.edu/data-products/access-to-public-data. Simply log in with your UserID and Password, and click on the “Data Downloads” link. There you will find the “RAND HRS Archived Data Products” in the "RAND Contributed Files" section on the right-hand side of the page.
RAND HRS Fat Files:
Why don't the frequency counts on the fat files match those in the HRS codebooks for household level variables?
In the RAND HRS Fat Files, we reorganized the data so that each observation represents one individual, and we merged the appropriate information from the various modules to each observation. This means that household level information is present for each individual respondent in couple households. So for household level variables, the counts on the RAND HRS Fat Files will be higher than those listed in the HRS codebooks.
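A small Python sketch shows why the counts differ. The records, the own_home variable, and the household key are all hypothetical. A household-level value that appears once in a household-level tabulation appears once per person in a person-level file.

```python
from collections import Counter

# Hypothetical person-level records carrying a household-level variable.
fat_file = [
    {"HHIDPN": 1010, "household": "A", "own_home": 1},
    {"HHIDPN": 1020, "household": "A", "own_home": 1},  # spouse, same household
    {"HHIDPN": 2010, "household": "B", "own_home": 5},
]

# Counting over people: the couple household is counted twice.
person_counts = Counter(rec["own_home"] for rec in fat_file)

# Counting over households: collapse to one value per household first.
household_values = {rec["household"]: rec["own_home"] for rec in fat_file}
household_counts = Counter(household_values.values())
```

Here person_counts reports two cases with value 1, while household_counts reports one, which is the pattern described above.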
I have successfully downloaded the 1998 RAND HRS Fat File and tried to look at the data in SPSS. However, I am unable to analyze the data - the prompt I receive is that the system is not responding and it seems like the computer is locked.
It sounds like you may be double-clicking on the file. We have found that this loads the entire 266 MB file into memory and, as you said, halts the system.
However, you can use the batch processing (production facility) to run a syntax file (.sps). Alternatively, in the interactive mode you can run code typed in the syntax window or submit a syntax file (.sps). Additionally, you may want to keep only the variables you are analyzing in order to minimize the memory allocation and allow procedures to execute. Please see the SPSS sample program.