Machine Learning Versus Standard Techniques for Updating Searches for Systematic Reviews

A Diagnostic Accuracy Study

Published in: Annals of Internal Medicine, Volume 167, Number 3 (August 2017), pages 213-215. doi: 10.7326/L17-0124

Posted on RAND.org on August 01, 2017

by Paul G. Shekelle, Kanaka Shetty, Sydne Newberry, Margaret A. Maglione, Aneesa Motala

This article was published outside of RAND. The full text is available from Annals of Internal Medicine via the DOI above.

Background

Systematic reviews are a cornerstone of evidence-based care and a necessary foundation for care recommendations to be labeled clinical practice guidelines. However, they become outdated relatively quickly and require substantial resources to maintain relevance. One particularly time-consuming task is updating the search to identify relevant articles published since the last search. We previously tested machine-learning approaches for making screening for updating more efficient by using 2 clinical topics as examples: medications to treat low bone density and off-label use of atypical antipsychotics. We tested 2 machine-learning algorithms: a generalized linear model with convex penalties (glmnet, R Foundation for Statistical Computing) and gradient boosting machines. Although initial results were encouraging, these methods required fully indexed PubMed citations.

Objective

To report the preliminary results of our efforts to compare standard electronic search methods for updating with machine-learning methods that identify new evidence using only the title and abstract.

Methods

We used citations from an original review to generate machine-learning estimators that assigned each citation from an updated search a probability for being relevant to 1 or more research questions. These estimators were constructed using a bag-of-words approach in which citations from the original search were first processed into a set of word frequencies that were used to model likelihood of relevance. The final estimators relied on the support vector machine algorithm. To evaluate the machine-learning method, we measured its sensitivity, positive predictive value, and overall accuracy for identifying articles that would have been included and excluded by the standard approach (in which the original search was replicated from the original review's search end date to the present). Search results for the first topic, treatment of low bone density, were compared prospectively and in a blinded fashion; that is, experienced reviewers independently and concurrently reviewed citations retrieved from the entire standard search and from machine learning. Search results for the 2 other topics, treatment of gout and of osteoarthritis of the knee, were compared retrospectively in that the standard method was used for the update and the machine-learning method was used shortly thereafter to identify included articles. In all cases, we developed models and selected the final algorithm (support vector machine) using only the original search results. We then applied the final models to citations from the updated searches and calculated sensitivity, positive predictive value, and overall accuracy.
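The bag-of-words step described above can be sketched briefly. This is a minimal illustration, not the study's code: the citation fragments and tokenization rules are hypothetical, and the actual pipeline fed word-frequency vectors like these into a support vector machine rather than stopping at feature extraction.

```python
import re
from collections import Counter

def bag_of_words(citations):
    """Convert citation titles/abstracts into per-citation word-frequency vectors.

    Returns (vocabulary, vectors), where each vector is a Counter mapping a
    word to its count in that citation.
    """
    vectors = []
    vocabulary = set()
    for text in citations:
        # Lowercase and split on runs of letters; a real pipeline would also
        # remove stop words and possibly stem or weight the terms.
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(words)
        vocabulary.update(counts)
        vectors.append(counts)
    return sorted(vocabulary), vectors

# Hypothetical citation fragments for illustration only.
citations = [
    "Bisphosphonate therapy for low bone density in postmenopausal women",
    "Atypical antipsychotics: off-label use in dementia",
]
vocab, vectors = bag_of_words(citations)
print(vectors[0]["bone"])  # a single word-frequency feature for citation 0
```

In practice these sparse frequency vectors become the feature matrix on which the relevance classifier is trained against the include/exclude labels from the original review.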

Results

For all 3 topics, the machine-learning approach reduced the number of titles requiring human screening substantially, by 67% to 83%. For 2 of the topics, it missed 1 title each that was included as evidence in the update report (sensitivities of 97% and 91%, respectively). For the third topic, it identified all titles included in the update report.
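The three metrics reported above follow directly from the screening counts. The sketch below shows the standard definitions; the counts used in the example are invented for illustration and are not the study's data:

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, positive predictive value, and accuracy for citation screening.

    tp: included articles flagged relevant; fp: excluded articles flagged relevant;
    fn: included articles the screen missed; tn: excluded articles correctly set aside.
    """
    sensitivity = tp / (tp + fn)            # share of truly relevant titles caught
    ppv = tp / (tp + fp)                    # share of flagged titles that are relevant
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, ppv, accuracy

# Hypothetical update-search counts: 30 relevant articles, 1 of them missed,
# which mirrors a sensitivity near 97%.
sens, ppv, acc = screening_metrics(tp=29, fp=120, fn=1, tn=850)
print(f"sensitivity={sens:.2f} ppv={ppv:.2f} accuracy={acc:.2f}")
```

Note that in an updating workflow, sensitivity is the critical metric: a missed title is an included study that never reaches human reviewers, whereas a low positive predictive value only costs extra screening time.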

This report is part of the RAND Corporation external publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.