Contextual Data Library

What is it?

RAND's Contextual Data Library serves as a central data repository, funded by the Computing and Data Management Cores (Core B) of the RAND Population Research Center (NICHD grant P50 HD12639-16) and the RAND Center for the Study of Aging (NIA grants 5P01-AG08291 and P20-AG12815-01).

How are the data used?

Contextual data are used in analyses to characterize a time and/or place. For example, an analysis of whether married couples divorce might include state-level data on characteristics of divorce law. Similarly, an analysis of welfare use might include variables measuring state AFDC and Medicaid benefits over time.

Where did the data come from?

Each of the data sets was prepared at RAND from publicly available data. The data sets will be updated periodically, and new data sets will be added as available.

*NOTE: Some of the variables used here were drawn from the RAND Contextual Data Library. Researchers are asked to acknowledge the above funding sources in their work.

How can I obtain the data?

The data sets currently available are stored in various formats, depending upon how they were contributed to the repository. You can download each available format separately or all formats bundled together. The files are provided in UNIX and PC formats. When possible, a group of files for a data set will include:

  • Readme file, which is a text file describing the data
  • SAS format definition
    • The SAS formats are defined so that they can be used to "look up" a value corresponding to the context (e.g., year and state) without sorting and merging. See the example of a SAS format definition below.
  • SAS data set
  • STATA data file
  • Tab-delimited file (for spreadsheet users)

Questions or comments? Send email to

Example of SAS format definition

Consider, for example, adding state population, by year and state, to your data using the SAS format definition downloaded from the Contextual Data Library. The format statements would look something like this:

proc format;
value popst
197001 ="3454557" /* 1970 ALABAMA (FIPS Code=01) */
197002 ="305328" /* 1970 ALASKA (FIPS Code=02) */
197004 ="1799531" /* 1970 ARIZONA (FIPS Code=04) */

...and so on for the 50 states and District of Columbia

197101 ="3497349" /* 1971 ALABAMA (FIPS Code=01) */
197102 ="316366" /* 1971 ALASKA (FIPS Code=02) */
197104 ="1896117" /* 1971 ARIZONA (FIPS Code=04) */

...and so on for as many years as we have.

In a SAS Data step, you can then "look up" the state population for whatever year is in the variable YEAR and whatever state FIPS code is in the variable STATE by using:

superkey=year*100 + state;
statepop=PUT (superkey, popst.);

So if YEAR=1971 and STATE=04 for Arizona, SUPERKEY would be 197104 and STATEPOP would be assigned 1896117 (as a character string, because PUT returns a character value).
Year and state are the variables that define the context, known as "key" variables, which are combined into one "superkey" for assigning the format.
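
The lookup described above can be tried end to end with a small sketch like the following. It is not the library's actual format file: it defines only three 1971 rows (FIPS codes 01, 02, and 04, with the values quoted above), and the data set names MYDATA and WITHPOP, the sample records, and the OTHER clause are assumptions added for illustration. The OTHER clause is there because, without it, a numeric key that is missing from the format is returned as the key itself, which could be mistaken for a population. INPUT converts the character result of PUT into a numeric variable.

```sas
/* Sketch only: a three-row stand-in for the full POPST format. */
proc format;
  value popst
    197101 = "3497349"   /* 1971 ALABAMA (FIPS Code=01) */
    197102 = "316366"    /* 1971 ALASKA  (FIPS Code=02) */
    197104 = "1896117"   /* 1971 ARIZONA (FIPS Code=04) */
    other  = " ";        /* assumption: blank label for keys not in the format */
run;

/* Hypothetical input data: one record per year and state FIPS code. */
data mydata;
  input year state;
  datalines;
1971 1
1971 4
1971 6
;
run;

data withpop;
  set mydata;
  /* Combine the two key variables into one superkey, e.g. 1971*100 + 4 = 197104. */
  superkey = year*100 + state;
  /* PUT returns the character label; INPUT reads it back as a number. */
  statepop = input(put(superkey, popst.), 12.);
run;
```

In this sketch the Arizona record (YEAR=1971, STATE=4) should receive a numeric STATEPOP of 1896117, while the record with STATE=6, which is not in the three-row format, should end up with a missing STATEPOP rather than a spurious value.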