Why don't incentives appear to be working in cases of teacher merit pay?
First, let's be clear that not all pay-for-performance (P4P) programs are the same. These programs differ greatly, from the choice of collective versus individual incentives, to the criteria by which incentives are awarded, to the inclusion of additional capacity-building elements, to the amount of the reward. Moreover, very few of these programs in the United States have been tested empirically. The research we've done at RAND and elsewhere in recent years has focused on programs incentivizing educator performance based primarily on the results of annual state tests of student performance. Though limited, this research, along with theory, suggests that several core factors may have contributed to the poor results found in recent P4P programs.
One factor is program design. Many of the programs studied, including New York City's Schoolwide Performance Bonus Program, have expected financial incentives alone to inspire improvement and have not included additional supports and resources potentially needed to bring about improvement. As others have argued in the past, motivation alone does not improve schools. Even if incentives inspire staff to improve practices or work together (in the case of collective incentives), educators may not have the capacity or resources (e.g., school leadership, social capital, knowledge, instructional materials, time) to bring about improvement.
The decision to link incentives to student test results exclusively or almost exclusively may be another design element contributing to the lack of observed results. Research and theory suggest that to achieve desired results, individuals and groups targeted by incentives must buy in to the program and its criteria. If, as we found in NYC, participants do not support the performance criteria (e.g., more than three-fourths of teachers surveyed in our NYC study felt bonus criteria relied too heavily on student test scores), the motivational power of the incentive could be greatly compromised.
A second factor is program implementation. Research indicates, for example, that individuals and groups targeted by incentives must have a high degree of understanding of the program. Yet evidence suggests that many do not. In NYC, more than one-third of teachers did not understand the targets their school needed to reach to be eligible for the bonus, the potential bonus amount, or how decisions would be made regarding distribution within the school. Poor communication can severely limit the motivational effects of incentives. If individuals don't understand the criteria, how will they know where to direct their efforts? If they don't know the amount at stake, how can they gauge whether the payoff is worth the effort?
A third significant variable is the context within which P4P programs operate. Under current policies, all schools and educators face significant pressure to perform well on the same measures that are often incentivized by P4P programs. While educators are making changes in response to these broader accountability pressures, how much additional change can we realistically expect from added financial incentives? In NYC, we found that teachers in schools not assigned to the bonus program (control) were just as likely as those from assigned schools (treatment) to report undertaking a host of efforts to help their school achieve a high Progress Report grade, including efforts to improve student attendance, seek professional development opportunities to improve their practice, and work with students to set and monitor goals. In fact, teachers often reported that accountability pressures—to achieve their school's Adequate Yearly Progress target and to receive a high Progress Report grade—were more salient than financial bonuses.
Finally, individual perceptions may also affect the outcomes of P4P programs. Principals and teachers in NYC, for example, consistently reported viewing the bonus as recognition for work they were already doing (e.g., "a pat on the back") rather than a goal for which to strive. Also, intrinsic motivators—such as seeing themselves improve and seeing their students learn new skills and knowledge—ranked much higher than financial bonuses on the list of potential motivators cited by teachers on surveys. In this context, how much added motivational value should we expect from financial bonuses?
Chicago is now putting in place its own merit pay program, and it will be fascinating to see the results. Media coverage suggests the program design may anticipate some of the concerns mentioned above, for example, by using multiple performance measures beyond just test results and by including training for principals. Yet the details of both program elements are still unknown. How much are test scores weighted? How are "quality management" and "school climate" measured? What kind of training will be provided? Program implementation, of course, cannot yet be judged. Lessons from past research suggest that communication will be important. And finally, there's context. How much added motivational value will be gained from the financial incentives compared to other accountability pressures and intrinsic motivators? That remains to be seen.
This op-ed was part of a Freakonomics Quorum: "The Debate over Teacher Merit Pay."
Julie Marsh is an adjunct researcher at the RAND Corporation, a non-profit research organization, and visiting associate professor at the Rossier School of Education at the University of Southern California.
This commentary originally appeared on Freakonomics on September 20, 2011. Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.