Scoring the Surgeon Scorecard


(The Health Care Blog)

A surgical team in an operating room

Photo by Tyler Olson/Fotolia

by Mark W. Friedberg

October 14, 2015

This commentary is a response to Ashish Jha's piece on ProPublica published by The Health Care Blog.

I don't disagree at all with the idea that providers should release their own performance data, to the extent that they have it. Free flow of accurate and understandable performance information is inherently good. If the ProPublica Surgeon Scorecard can create pressure for this to happen, fantastic.

But there is no tradeoff between recognizing the serious methodological problems in the Scorecard, improving the Scorecard, and encouraging providers to release their own data. All three can and should be done simultaneously.

Also, for frequenters of this blog, I think it's important to clarify a few key things about the “RAND critique” (which I authored with individuals from many institutions, all of whom deserve credit for devoting considerable unpaid time to the effort).

  1. Nowhere in the critique do we suggest that ProPublica — or anybody else for that matter — abandon efforts to generate and publicize reports that truly reflect provider performance. Far from it. If you look up the authors of our critique, you'll see that all of us have devoted substantial time and effort to furthering the science and practice of performance measurement and transparency in health care.

  2. What the critique does suggest is making methodological improvements and performing due diligence, based on current best practices in public reporting. Some of our suggested improvements are easy to make. Some are difficult and will require time and effort, but that isn't an excuse for not doing them. We also explain why these improvements are necessary: they address specific methodological steps that have not been performed in a scientifically credible manner (e.g., validating the brand-new measure that is being reported, checking the accuracy of the source data) and some suboptimal statistical choices (most notably, suppressing hospital random effect estimates in the reported “adjusted complication rates”; see our critique for the multiple reasons why this is problematic). We recommend calculating the reliability and risk of random misclassification (i.e., measurement error) in the report, and disclosing these to report users. This is a key component of transparency about the limitations of any report. To be clear though: all of the recommendations in the critique are doable, and they have been done for previous performance reports (which we cite in the critique). The only truly insurmountable barrier would be to find that the underlying Medicare claims data are so jumbled up that individual surgeons are wrongly assigned to surgeries at a very high rate. As disheartening as this would be, I think we can all agree that without reasonably accurate attribution of cases to providers, a public report cannot be useful to patients or providers, no matter how rigorous the statistical methods. And there is good reason to validate surgeon-surgery assignments carefully, given troubling prior findings about operating surgeon NPI inconsistency between Part A and Part B claims (see Dowd et al., which we cite in the critique).

  3. I for one do not doubt that ProPublica's goals in creating the Surgeon Scorecard are good, and I share them. As strange as I find ProPublica's promotional video and some other aspects of tone in ProPublica's response to our critique, I chalk this up to “Journalists are from Venus / Researchers are from Mars” — a difference in professional culture. So the problem is not the aim of the effort. The problem is the execution.

  4. This is the toughest thing to communicate clearly, and I recommend reading our critique for more detail: It is entirely possible for a performance report with poor or unknown validity and reliability (which together determine the degree to which reported data are true predictors of the care a patient will receive from a given provider) to cause harm to patients and providers, both in the short and long term. For the reasons detailed in our critique, we come to a pretty strong conclusion on this regarding the Scorecard: potential users should not consider it valid or reliable in its current form. Future versions can and should be better. To be clear though, our conclusion isn't about P values or confidence intervals (except insofar as the confidence intervals tell us something about measurement reliability). It's about the hard reality that the validity (i.e., truth) of a performance report is only as strong as its weakest methodological link. We highlight the weakest ones in the critique. We aren't asking readers to just take our word for it; we make logical arguments. And if anything isn't clear, my coauthors and I are happy to explain the more esoteric, but very important, methodological points.
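For readers who want a concrete feel for the “reliability” and “risk of random misclassification” mentioned in point 2, here is a minimal Python sketch. It is not the method used in the Scorecard or in our critique. It assumes a simple signal-to-noise definition of reliability (between-surgeon variance divided by between-surgeon variance plus binomial sampling variance), and every number in it (a 3 percent average complication rate, a 1 percentage point spread in true rates, the case counts) is hypothetical and chosen only for illustration.

```python
import random
import statistics


def reliability(between_var: float, mean_rate: float, n_cases: int) -> float:
    """Signal-to-noise reliability of one surgeon's observed rate.

    Assumes binomial sampling noise around a common mean rate.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    sampling_var = mean_rate * (1.0 - mean_rate) / n_cases
    return between_var / (between_var + sampling_var)


def simulate_misclassification(between_sd: float, mean_rate: float,
                               n_cases: int, n_surgeons: int = 2000,
                               seed: int = 0) -> float:
    """Share of simulated surgeons placed on the wrong side of the median.

    True rates are drawn from a normal distribution (clipped to [0, 1]);
    observed rates add binomial noise from a finite number of cases.
    """
    rng = random.Random(seed)
    true_rates = [min(max(rng.gauss(mean_rate, between_sd), 0.0), 1.0)
                  for _ in range(n_surgeons)]
    observed = [sum(rng.random() < p for _ in range(n_cases)) / n_cases
                for p in true_rates]
    true_cut = statistics.median(true_rates)
    obs_cut = statistics.median(observed)
    wrong = sum((t > true_cut) != (o > obs_cut)
                for t, o in zip(true_rates, observed))
    return wrong / n_surgeons


# Hypothetical inputs, for illustration only.
MEAN_RATE = 0.03    # assumed average complication rate
BETWEEN_SD = 0.01   # assumed spread of true rates across surgeons

for n in (20, 100, 500):
    r = reliability(BETWEEN_SD ** 2, MEAN_RATE, n)
    m = simulate_misclassification(BETWEEN_SD, MEAN_RATE, n)
    print(f"{n:4d} cases: reliability {r:.2f}, misclassified {m:.0%}")
```

Under these assumptions, reliability rises and misclassification falls as the number of cases per surgeon grows, which is why disclosing both quantities tells report users how much weight a given surgeon's reported rate can bear.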

If others read our critique and still want to use the Scorecard to help choose a surgeon, they should by all means do so, hopefully with our caveats in mind. But I would give them this advice: Ask your prospective surgeons for their rates of short- and long-term mortality, morbidity (including the most common and severe complications), and operative success. Ask them how they know these rates. How do they track them, if at all? Ask these questions, and use any other information at your disposal, with equal vigor, for surgeons with the lowest and highest “adjusted complication rates” on the ProPublica Surgeon Scorecard.

Mark Friedberg is a senior natural scientist at the nonprofit, nonpartisan RAND Corporation.

This commentary originally appeared on The Health Care Blog on October 13, 2015. Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.