How to Solve a Problem Like Missing Data


Jun 16, 2017


This commentary originally appeared on Statistics Views on June 16, 2017.

Missing data is a challenge for statisticians, policymakers, and analysts, particularly when a robust evidence base is needed. It typically arises for one of three reasons: data collection is done improperly, the data contains mistakes, or the data simply does not exist because of non-response. The Second Longitudinal Study of Young People in England (LSYPE2), research designed to understand the compulsory education, school-to-work transition, careers, and lives of young people in the UK, suffers from the last of these.

The overall aim of the study is to produce a dataset that can serve as a resource for evidence-based policy development. A significant barrier to this aim, however, is that, on top of the more 'run of the mill' missingness (the manner in which data is missing from a sample of a population) that bedevils longitudinal studies, LSYPE2 has systematically incomplete data owing to a boycott of Key Stage 2 (KS2) testing in 2010, before the study began. Boycotts of national tests leave gaps in pupils' attainment records and, in the case of LSYPE2, threaten to undermine a large-scale (and expensive) longitudinal study with substantial policy relevance. In LSYPE2, KS2 data was missing for approximately 30 per cent of the cohort…

The remainder of this commentary is available at

Alex Sutherland is a research leader at RAND Europe and Catherine Saunders is a statistician working in the Cambridge Centre for Health Services Research. Both were involved in Missing Data in the Second Longitudinal Study of Young People in England (LSYPE2), a report for the UK Department for Education.
