Predicting COVID-19 Outbreaks in Correctional Facilities Using Machine Learning

Published in: MDM Policy & Practice, Volume 9, No. 1 (January-June 2024). DOI: 10.1177/23814683231222469

Posted on Feb 7, 2024

by Giovanni Malloy, Lisa B. Puglisi, Kristofer B. Bucklen, Tyler D. Harvey, Emily Wang, Margaret L. Brandeau


The risk of infectious disease transmission, including COVID-19, is disproportionately high in correctional facilities due to close living conditions, relatively low levels of vaccination, and reduced access to testing and treatment. While much progress has been made on describing and mitigating COVID-19 and other infectious disease risk in jails and prisons, there are open questions about which data can best predict future outbreaks.


We used facility, demographic, and health data collected from 24 prison facilities in the Pennsylvania Department of Corrections from March 2020 to May 2021 to determine which data sources best predict an impending COVID-19 outbreak in a prison facility. We used machine learning methods to cluster the prisons into groups based on similar facility-level characteristics, including size, rurality, and demographics of incarcerated people. We developed logistic regression classification models to predict for each cluster, before and after vaccine availability, whether there would be no cases, an outbreak (defined as 2 or more cases), or a large outbreak (defined as 10 or more cases) in the next 1, 2, and 3 days. We compared these predictions with data on outbreaks that occurred.
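To make the outcome definitions concrete, the following is a minimal Python sketch of how a day could be labeled from the incident cases that follow it. It is an illustration only, not the authors' code: the function name `label_horizon`, the label strings, and the sample data are hypothetical. It also assumes that the thresholds apply to the total over the prediction window and that a window with exactly 1 case gets its own label, neither of which is specified here.

```python
from typing import List, Optional

OUTBREAK_MIN_CASES = 2         # "outbreak" threshold stated in the text
LARGE_OUTBREAK_MIN_CASES = 10  # "large outbreak" threshold stated in the text


def label_horizon(daily_cases: List[int], day: int, horizon: int) -> Optional[str]:
    """Label `day` by the incident cases reported in the following `horizon` days.

    Returns None when the series ends before the full window is available.
    """
    window = daily_cases[day + 1 : day + 1 + horizon]
    if len(window) < horizon:
        return None
    total = sum(window)  # assumption: thresholds apply to the window total
    if total >= LARGE_OUTBREAK_MIN_CASES:
        return "large_outbreak"
    if total >= OUTBREAK_MIN_CASES:
        return "outbreak"
    if total == 0:
        return "no_cases"
    return "single_case"  # assumption: exactly 1 case is kept as its own label


# Made-up daily incident cases for one facility (not study data).
cases = [0, 0, 1, 0, 3, 8, 0, 0]

# One label series per prediction horizon of 1, 2, and 3 days.
labels = {h: [label_horizon(cases, d, h) for d in range(len(cases))] for h in (1, 2, 3)}
```

Labels of this kind would serve as the targets for a classifier; the features and model fitting are outside the scope of this sketch.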


Facilities were divided into 8 clusters, ranging in size from 1 to 7 facilities per cluster. We trained 60 logistic regressions; 20 had test sets in which outbreaks were detected on between 35% and 65% of days. Of these, 8 logistic regressions correctly predicted the occurrence of an outbreak more than 55% of the time. The most common predictive feature was incident cases among the incarcerated population from 2 to 32 days prior. Other predictive features included the number of tests administered from 1 to 33 days prior, total population, test positivity rate, and county deaths, hospitalizations, and incident cases. Cumulative cases, vaccination rates, and race, ethnicity, or age statistics for incarcerated populations were generally not predictive.
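The two screening criteria above (a test set with 35% to 65% outbreak days, and more than 55% of predictions correct) can be sketched in a few lines of Python. This is a hedged illustration, not the study's evaluation code: the helper names are hypothetical, the data are made up, and "correctly predicted more than 55% of the time" is read here as overall accuracy, which is an assumption.

```python
from typing import List


def outbreak_share(y_true: List[int]) -> float:
    """Fraction of test-set days on which an outbreak was detected (1 = outbreak)."""
    return sum(y_true) / len(y_true)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of days where the predicted label matches the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_screen(
    y_true: List[int],
    y_pred: List[int],
    lo: float = 0.35,
    hi: float = 0.65,
    min_accuracy: float = 0.55,
) -> bool:
    """Apply both criteria: a reasonably balanced test set and accuracy above the cutoff."""
    balanced = lo <= outbreak_share(y_true) <= hi
    return balanced and accuracy(y_true, y_pred) > min_accuracy


# Made-up test sets (not study data).
balanced_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
balanced_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

skewed_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
skewed_pred = [0] * 10  # always predicting "no outbreak"

balanced_ok = passes_screen(balanced_true, balanced_pred)
skewed_ok = passes_screen(skewed_true, skewed_pred)
```

In the skewed example, a model that always predicts "no outbreak" scores 90% accuracy yet fails the screen, which is one plausible reason for restricting attention to test sets with a balanced share of outbreak days.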


County-level measures of COVID-19, facility population, and test positivity rate appear to be promising predictors of COVID-19 outbreaks in correctional facilities, suggesting that correctional facilities should monitor community transmission in addition to facility transmission to inform future outbreak response decisions. These efforts should not be limited to COVID-19 but should include any large-scale infectious disease outbreak that may involve institution-community transmission.

This report is part of the RAND external publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.