Man looking at chalked stars on a wall, photo by anyaberkut/Getty Images

commentary

(The RAND Blog)

March 21, 2019

Assessing Confidence in “What Works” in Social Policy


by Yulia Shenderovich, Alex Sutherland, Sean Grant

Policy decisions are increasingly informed (or expected to be informed) by research evidence, such as impact evaluations. In the UK, the What Works Network was launched in 2013 “to ensure that spending and practice in public services is informed by the best available evidence.” One of the tasks of the organizations in this network is to be “knowledge brokers.” They synthesize and appraise research evidence on the effects of policies, practices and programs, and communicate their views on the value of these programs to practitioners and policymakers. In a project for the College of Policing and the Education Endowment Foundation, two What Works Centres, RAND researchers reviewed approaches for assessing and labelling evidence.

We found that the choice of assessment approach is not trivial, as there are many ways to rank research evidence (indeed, the issue of multiple standards is not unique to the assessment of policy evidence). Different tools can reach opposite conclusions about the same program or study (for examples, see Voss & Rehfuess 2013; Means et al. 2015; Losilla et al. 2018; Puttick 2018). Important information, such as how participants were allocated to a program or precisely what the program involved, is often missing from published papers. Also, much of the work that methodologists have done to develop evidence rating approaches has been in clinical medicine and is not always easily applied to behavioral and social programs, which tend to be complex.

The research team reviewed a number of tools and approaches to appraising evidence, focusing primarily on tools for experimental and quasi-experimental program evaluations. To complete the project within several months, we concentrated on the most commonly used and well-established approaches. As a result, for judging individual studies we recommended the revised tool for assessing Risk of Bias in randomized trials (RoB 2.0) and the Risk Of Bias In Non-randomized Studies of Interventions tool (ROBINS-I), because of the tools' extensive development processes and the guidance available for reviewers.

Next, we considered approaches for tying together different types of evidence. For example, suppose 20 randomized controlled trials told us that, on average, police use of body-worn cameras is effective in reducing complaints, use of force and assaults against officers during arrest. That might lead us to recommend that police should always use body cameras on duty. But what if, based on extensive qualitative research with police officers, police unions were advocating that officers should have discretion over whether cameras are turned on? How can individual studies best be brought together into recommendations that are systematic and transparent, and that include as much of the available evidence as possible?

Making such judgements is not always easy, and there are many ways to summarize and appraise bodies of evidence in health and social policy. In this project we recommended the use of Grading of Recommendations Assessment, Development and Evaluation (GRADE). GRADE was developed by an international panel and is refined in an ongoing process, with more than 100 organizations around the world using either the full GRADE process or some of its principles.

The GRADE process begins by asking an explicit question about a program or policy, including pre-specifying the intervention, population, and outcomes of interest. Next, reviewers classify their confidence in the evidence as high, moderate, low, or very low. The GRADE Evidence-to-Decision framework then helps to systematically factor in other considerations, such as the acceptability and cost of the intervention. This framework allows researchers and decision-makers to transparently report the reasoning behind a recommendation. For instance, even with limited confidence in the evidence on policy effects, a policy may be recommended based on the high priority of the problem and the low cost of the approach.

When trying to assess the overall quality of evidence and how it informs judgements, it is important to avoid one-size-fits-all approaches, because evidence is rarely “one size.” We therefore support the suggestion by methodology researchers to move away from numerical checklists (e.g. “study sample of at least 100 participants, with no more than 20 per cent lost to follow-up”) towards reviewers making informed judgements about each relevant domain in a specific study and publishing the rationales for their judgements. While such an approach may not always be feasible in a fast-paced policy environment, moving towards explicit and transparent judgements about the evidence at least pushes those making recommendations to be clearer about the basis on which a recommendation is made.

Finally, the field also emphasizes providing reviewers with detailed guidance and training in the assessment tools to ensure they are used consistently and thoughtfully. Experts also recommend focusing on “risk of bias” instead of “study quality,” because high-quality studies can still have a serious risk of bias.

Regardless of the chosen approach to assessing research, we believe making the process as systematic, transparent and explicit as possible provides users with ways to understand, question and contribute to the eventual policy recommendation, and gives policymakers and practitioners confidence in its credibility.


Yulia Shenderovich is an analyst and Alex Sutherland is a senior research leader at RAND Europe. Sean Grant is an adjunct behavioral and social scientist at the RAND Corporation.

Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.