Pitfalls of Panel Data: The Case of the SIPP Health Insurance Data
Survey data are inevitably imperfect. First, finite budgets imply that we sample only a (very small) fraction of the population, which induces sampling error. Second, and more pernicious, is a range of non-sampling errors. In simple cross-sectional surveys, the important non-sampling errors include non-response bias (some people in the original sample are not found or refuse to respond) and recall bias (some people forget that events occurred or misdate them). This paper, prepared for the session on "Longitudinal Data Systems," considers the corresponding problems in longitudinal (or panel) surveys. Corresponding to non-response bias, there is panel attrition: some people who answer the first interview cannot be located for, or refuse to respond to, some (or all) later interviews. Corresponding to recall bias, there is seam bias: when asked when events occurred, some people respond that their status changed immediately following their previous interview. Using the health insurance data from the Survey of Income and Program Participation (SIPP) as an example, this paper explores these two non-sampling errors in panel data. It describes the conditions under which the biases are likely to be important, proposes simple methods for identifying whether the biases are present, suggests work-arounds for imperfect data, and sketches formal parametric methods for estimation in the presence of these longitudinal non-sampling biases.