The Positive Effects of Population-Based Preferential Sampling in Environmental Epidemiology

Published in: Biostatistics, 2016

Posted on RAND.org on July 06, 2016

by Matthew Cefalu, Luke Bornn

Access further information on this document at Biostatistics

This article was published outside of RAND. The full text of the article can be found at the link above.

In environmental epidemiology, exposures are not always available at subject locations and must be predicted using monitoring data. The monitor locations are often outside the control of researchers, and previous studies have shown that "preferential sampling" of monitoring locations can adversely affect exposure prediction and subsequent health effect estimation. We adopt a slightly different definition of preferential sampling than is typically seen in the literature, which we call population-based preferential sampling. Population-based preferential sampling occurs when the location of the monitors is dependent on the subject locations. We show the impact that population-based preferential sampling has on exposure prediction and health effect estimation using analytic results and a simulation study. A simple, one-parameter model is proposed to measure the degree to which monitors are preferentially sampled with respect to population density. We then discuss these concepts in the context of PM2.5 and the EPA Air Quality System monitoring sites, which are generally placed in areas of higher population density to capture the population's exposure.
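As a rough illustration of what a one-parameter measure of preferential sampling could look like, the sketch below places monitors at candidate sites with probability proportional to population density raised to a power alpha. This parameterization is an assumption made for illustration only; the abstract does not specify the authors' model, and the names sample_monitor_sites, mean_density_at, and alpha, as well as the toy density values, are hypothetical.

```python
import random


def sample_monitor_sites(density, n_monitors, alpha, seed=0):
    """Pick monitor sites with probability proportional to density**alpha.

    Assumed, illustrative parameterization (not taken from the article):
    alpha = 0 gives uniform placement, and larger alpha concentrates
    monitors in more densely populated sites.
    """
    rng = random.Random(seed)
    weights = [d ** alpha for d in density]
    # random.choices samples with replacement, so a site may host
    # more than one monitor in this toy setup.
    return rng.choices(range(len(density)), weights=weights, k=n_monitors)


def mean_density_at(sites, density):
    """Average population density over the chosen monitor sites."""
    return sum(density[i] for i in sites) / len(sites)


# Hypothetical one-dimensional region: 100 candidate sites whose
# population density rises linearly from 1 to 100.
density = [float(i + 1) for i in range(100)]

uniform_sites = sample_monitor_sites(density, 2000, alpha=0.0)
preferential_sites = sample_monitor_sites(density, 2000, alpha=1.0)

print("mean density at monitors, alpha=0:", mean_density_at(uniform_sites, density))
print("mean density at monitors, alpha=1:", mean_density_at(preferential_sites, density))
```

Under these assumptions, monitors drawn with alpha = 1 sit in noticeably denser areas on average than monitors drawn with alpha = 0, which is the kind of dependence between monitor and subject locations that the abstract calls population-based preferential sampling.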

This report is part of the RAND Corporation external publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.