Machine Learning Methods in Systematic Reviews

Identifying Quality Improvement Intervention Evaluations

Published in: Machine Learning Methods in Systematic Reviews: Identifying Quality Improvement Intervention Evaluations / Susanne Hempel et al. Methods Research Report (Prepared by the Southern California Evidence-based Practice Center under Contract No. 290-2007-10062-I). AHRQ Publication No. 12-EHC125-EF. (Rockville, MD : Agency for Healthcare Research and Quality, Sep. 2012), 55 p

Posted on September 01, 2012

by Susanne Hempel, Kanaka Shetty, Paul G. Shekelle, Lisa V. Rubenstein, Marjorie Danz, Breanne Johnsen, Siddhartha Dalal


This article was published outside of RAND.

BACKGROUND: Electronic searches typically yield far more citations than are relevant, and reviewers spend substantial time screening titles and abstracts to identify studies potentially eligible for inclusion in a review. This is of particular relevance in complex research fields such as quality improvement. We tested a semiautomated literature screening process applied to the title and abstract screening stage of systematic reviews. A machine learning approach may allow literature reviewers to screen only a fraction of a search output and to use a predictive model that learns and then emulates the reviewers' decisions. Once trained, the model can apply the selection process to an essentially unlimited number of citations.

METHOD: Two independent literature reviewers screened 1,591 quasi-randomly selected citations to form a training dataset, which was used to predict decisions on the remaining citations in a MEDLINE search output of 9,395 citations. We explored different prediction algorithms and tested results against reference samples screened by experts in quality improvement. We derived both qualitative predictions (a relevance cutoff determined from the ROC curve) and quantitative predictions (a probability rank order of citations).

RESULTS: Agreement between the independent literature reviewers ranged from κ = 0.55 to 0.57. Across two reference samples, the machine learning approach demonstrated 90.1 percent sensitivity, 43.9 percent specificity, and 32.1 percent positive predictive value (PPV). If applied, this translates to a 36.1 percent reduction in citation screening. Predictive performance was affected by reviewer disagreements: a subgroup analysis restricted to citations on which both reviewers agreed showed a sensitivity of 98.8 percent (specificity 43.9 percent).

CONCLUSION: Machine learning approaches may assist in the title and abstract inclusion screening process in systematic reviews of complex, steadily expanding research fields such as quality improvement. Increased reviewer agreement appeared to be associated with improved predictive performance.
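The workflow described in the abstract can be sketched with a toy example. This is a minimal illustration, not the report's actual pipeline: the report explored several prediction algorithms, while the sketch below uses a simple Naive Bayes text classifier trained on labeled citation titles, then ranks unscreened citations by predicted relevance. The example titles and the confusion-matrix counts at the end are entirely hypothetical.

```python
from collections import Counter
import math

# Hypothetical labeled training citations: 1 = relevant to quality
# improvement, 0 = not relevant (these titles are invented).
train = [
    ("quality improvement intervention in hospital care", 1),
    ("randomized evaluation of a quality improvement program", 1),
    ("effect of a safety checklist intervention on outcomes", 1),
    ("molecular structure of a bacterial protein", 0),
    ("genome sequencing of a plant species", 0),
    ("imaging techniques for orthopedic surgery planning", 0),
]

def tokenize(text):
    return text.lower().split()

def fit(examples):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_proba(model, text):
    """Probability that a citation is relevant (label 1)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_post = {}
    for label in (0, 1):
        lp = math.log(class_counts[label] / total)       # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        log_post[label] = lp
    # Convert log posteriors to a normalized probability.
    m = max(log_post.values())
    exp = {k: math.exp(v - m) for k, v in log_post.items()}
    return exp[1] / (exp[0] + exp[1])

def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and PPV from screening confusion counts."""
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

model = fit(train)

# Rank unscreened citations by predicted relevance, most likely first;
# reviewers would then screen only the top-ranked fraction.
unscreened = [
    "crystallography of a membrane protein complex",
    "a quality improvement intervention for diabetes care",
]
ranked = sorted(unscreened, key=lambda t: predict_proba(model, t), reverse=True)

# Illustrative confusion counts (not the report's data).
sensitivity, specificity, ppv = screening_metrics(tp=90, fp=10, tn=40, fn=10)
```

Ranking by predicted probability corresponds to the report's quantitative prediction; applying a cutoff to that probability (chosen from an ROC curve on held-out labels) corresponds to the qualitative prediction.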

This report is part of the RAND Corporation External publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

Our mission to help improve policy and decisionmaking through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.