Accounting for Misclassification in Electronic Health Records-derived Exposures Using Generalized Linear Finite Mixture Models
Published in: Health Services and Outcomes Research Methodology, Volume 17, Issue 2 (June 2017), pages 101-112. doi: 10.1007/s10742-016-0149-5
Posted on RAND.org on December 07, 2017
Access further information on this document at Health Services and Outcomes Research Methodology
This article was published outside of RAND. The full text of the article can be found at the link above.
Exposures derived from electronic health records (EHR) may be misclassified, leading to biased estimates of their association with outcomes of interest. An example of this problem arises in the context of cancer screening where test indication, the purpose for which a test was performed, is often unavailable. This poses a challenge to understanding the effectiveness of screening tests because estimates of screening test effectiveness are biased if some diagnostic tests are misclassified as screening. Prediction models have been developed for a variety of exposure variables that can be derived from EHR, but no previous research has investigated appropriate methods for obtaining unbiased association estimates using these predicted probabilities. The full likelihood incorporating information on both the predicted probability of exposure-class membership and the association between the exposure and outcome of interest can be expressed using a finite mixture model. When the regression model of interest is a generalized linear model (GLM), the expectation–maximization algorithm can be used to estimate the parameters using standard software for GLMs. Using simulation studies, we compared the bias and efficiency of this mixture model approach to alternative approaches including multiple imputation and dichotomization of the predicted probabilities to create a proxy for the missing predictor. The mixture model was the only approach that was unbiased across all scenarios investigated. Finally, we explored the performance of these alternatives in a study of colorectal cancer screening with colonoscopy. These findings have broad applicability in studies using EHR data where gold-standard exposures are unavailable and prediction models have been developed for estimating proxies.
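The mixture-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on simplifying assumptions that are not in the article: a logistic outcome model with a single binary exposure X and no other covariates, predicted probabilities p_i = P(X_i = 1) that are well calibrated, and simulated data. The function name `em_mixture_logistic` and all simulation settings are hypothetical. Each observation contributes p_i f(y_i | X=1) + (1 - p_i) f(y_i | X=0) to the likelihood. The E-step computes the posterior probability of exposure given the outcome, and the M-step is a weighted logistic fit, which in this covariate-free case reduces to weighted outcome proportions.

```python
import math
import random


def _clip(q, eps=1e-12):
    # Keep probabilities strictly inside (0, 1) so logs stay finite.
    return min(max(q, eps), 1.0 - eps)


def em_mixture_logistic(y, p, max_iter=500, tol=1e-10):
    """EM for logit P(Y=1 | X) = beta0 + beta1 * X with latent binary X.

    y: list of 0/1 outcomes.
    p: predicted probabilities that X = 1 (assumed not all 0 or all 1).
    Returns (beta0, beta1, observed-data log-likelihood trace).
    """
    q0, q1 = 0.4, 0.6  # starting values for P(Y=1 | X=0) and P(Y=1 | X=1)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probability of exposure given outcome and p_i.
        weights = []
        loglik = 0.0
        for yi, pi in zip(y, p):
            a = pi * (q1 if yi == 1 else 1.0 - q1)
            b = (1.0 - pi) * (q0 if yi == 1 else 1.0 - q0)
            weights.append(a / (a + b))
            loglik += math.log(a + b)
        trace.append(loglik)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M-step: weighted logistic fit; with one binary predictor the
        # model is saturated, so the fit is a pair of weighted proportions.
        sw = sum(weights)
        q1 = _clip(sum(w * yi for w, yi in zip(weights, y)) / sw)
        q0 = _clip(sum((1.0 - w) * yi for w, yi in zip(weights, y)) / (len(y) - sw))
    beta0 = math.log(q0 / (1.0 - q0))
    beta1 = math.log(q1 / (1.0 - q1)) - beta0
    return beta0, beta1, trace


# Simulated example (all settings are illustrative assumptions).
random.seed(0)
true_beta0, true_beta1 = -1.0, 1.5
p_sim, y_sim = [], []
for _ in range(2000):
    pi = random.random()                       # predicted P(X = 1)
    xi = 1 if random.random() < pi else 0      # true, unobserved exposure
    prob_y = 1.0 / (1.0 + math.exp(-(true_beta0 + true_beta1 * xi)))
    p_sim.append(pi)
    y_sim.append(1 if random.random() < prob_y else 0)

beta0_hat, beta1_hat, loglik_trace = em_mixture_logistic(y_sim, p_sim)
print("beta0_hat:", beta0_hat, "beta1_hat:", beta1_hat)
```

With additional covariates the M-step no longer has a closed form. One common approach is to stack each observation twice (once with X = 1, once with X = 0) and fit a weighted GLM with the E-step weights, which is consistent with the abstract's point that standard GLM software suffices.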