Quality of Quality Measurement
Impact of Risk Adjustment, Hospital Volume, and Hospital Performance
Published in: Anesthesiology, Volume 125, pages 1092–1102 (December 2016). doi: 10.1097/ALN.0000000000001362
Posted on RAND.org on December 31, 2020
Basing healthcare reimbursement policy on pay-for-performance is valid only to the extent that performance measurement is accurate.
Monte Carlo simulation was used to examine the accuracy of performance profiling as a function of statistical methodology, case volume, and the extent to which hospital or physician performance deviates from the average.
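As a rough illustration of this kind of simulation, the following minimal sketch generates hospitals of known quality, flags outliers, and scores the flags against the truth. It is not the authors' model: the flagging rule is a simple one-sided z-test against the average rate (a crude stand-in for nonhierarchical profiling without risk adjustment), and every parameter value (`base_rate`, `rate_ratio`, `frac_low`, `z_crit`, `n_hospitals`) is a hypothetical choice made for illustration only.

```python
import math
import random


def simulate_profiling(n_hospitals=500, cases=200, base_rate=0.10,
                       rate_ratio=2.0, frac_low=0.10, z_crit=1.96, seed=0):
    """Return (true_positive_rate, false_discovery_rate) for one simulated cohort.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    tp = fp = fn = 0
    # Standard error of an observed event rate for an average-quality hospital.
    se = math.sqrt(base_rate * (1 - base_rate) / cases)
    for _ in range(n_hospitals):
        # Assign the hospital's true quality, then draw its observed outcomes.
        is_low = rng.random() < frac_low
        p = base_rate * rate_ratio if is_low else base_rate
        events = sum(rng.random() < p for _ in range(cases))
        # Flag the hospital as a low-quality outlier if its rate is
        # significantly above the average rate.
        flagged = (events / cases - base_rate) / se > z_crit
        if flagged and is_low:
            tp += 1
        elif flagged:
            fp += 1
        elif is_low:
            fn += 1
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return tpr, fdr


if __name__ == "__main__":
    for volume in (20, 200, 2000):
        tpr, fdr = simulate_profiling(cases=volume)
        print(f"cases={volume}: TPR={tpr:.3f}, FDR={fdr:.3f}")
```

Varying `cases` and `rate_ratio` shows how detection depends on case volume and on how far a hospital deviates from the average. The numbers this sketch produces should not be expected to match the published results, which come from risk-adjusted hierarchical and nonhierarchical models.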
There is extensive variation in the true-positive rate and false discovery rate as a function of model specification, hospital quality, and hospital case volume. Hierarchical and nonhierarchical modeling are both highly accurate at very high case volumes for very low-quality hospitals. At equivalent case volumes and hospital effect sizes, the true-positive rate is higher for nonhierarchical modeling than for hierarchical modeling, but the false discovery rate is generally much lower for hierarchical modeling than for nonhierarchical modeling. At low hospital case volumes (200 cases) that are typical for many procedures, and for hospitals whose rate of death or major complications among patients undergoing isolated coronary artery bypass graft surgery is twice that of the average hospital, hierarchical modeling missed 90.6% of low-quality hospitals, whereas nonhierarchical modeling missed 65.3%. However, at low case volumes, 38.9% of hospitals classified as low-quality outliers using nonhierarchical modeling were actually average quality, compared with 5.3% using hierarchical modeling.
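For reference, the two metrics can be written in terms of classification counts, assuming the standard definitions in which TP is the number of low-quality hospitals correctly flagged, FN the number of low-quality hospitals missed, and FP the number of average-quality hospitals wrongly flagged (these symbols are introduced here and are not the authors' notation). Reading the reported miss rates as false-negative rates, the implied true-positive rates at a case volume of 200 follow by subtraction:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN} = 1 - \text{miss rate},
\qquad
\mathrm{FDR} = \frac{FP}{TP + FP}

\mathrm{TPR}_{\text{hierarchical}} = 1 - 0.906 = 0.094,
\qquad
\mathrm{TPR}_{\text{nonhierarchical}} = 1 - 0.653 = 0.347
```

Under the same reading, the 38.9% and 5.3% figures are the false discovery rates for nonhierarchical and hierarchical modeling, respectively.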
Nonhierarchical modeling frequently misclassified average-quality hospitals as low quality. Hierarchical modeling commonly misclassified low-quality hospitals as average. Assuming that the consequences of misclassifying an average-quality hospital as low quality outweigh the consequences of misclassifying a low-quality hospital as average, hierarchical modeling may be the better choice for quality measurement.