Neither Teacher Standards Nor Payment Programs Guarantee Higher Student Achievement
Teacher effectiveness continues to be a hot topic among U.S. lawmakers. RAND is currently working on several projects that could offer guidance as to how to measure teacher effectiveness and how to use those measures to improve teacher performance.
Two early pieces of RAND work shed light on the relationship between student performance on the one hand and two alternative variables on the other: traditional teacher qualifications and an innovative pay-for-performance program. Although this research should not be accepted as the last word on these issues, it does offer a good starting point for discussing them.
In the first study, RAND found no evidence that traditional teacher qualifications — including experience, education, and scores on licensure examinations — bear any relationship to growth in student achievement in Los Angeles public elementary schools. In response to such findings, some observers have argued for a greater reliance on alternative measures of effectiveness, such as “value-added” models, and for using such measures in pay-for-performance programs that are designed to motivate classroom teachers.
Pay-for-performance programs are premised on the notion that rewarding teachers financially for student achievement gains can spur the teachers to be more effective in the classroom, thereby improving student performance. But RAND and its collaborators at the National Center on Performance Incentives found no evidence of such an outcome in one pay-for-performance program in Nashville public schools. In Nashville, the test scores rose quite evenly whether or not the teachers had been paid on the basis of their students’ performance.
Although it evaluates just one pay-for-performance design, the Nashville study underscores the importance of putting all accountability and incentive systems to the test. The preliminary evidence from the two studies outlined below by Richard Buddin and by Daniel McCaffrey points to a crucial early lesson: It is not enough to focus on teacher qualifications or to offer teachers incentives based on student results; it is just as important to hold the accountability systems themselves accountable for measuring genuine progress.
Teacher Qualification Standards
We examined the relationship between teacher qualification standards and student achievement by cross-referencing teacher data with five years of student test scores in math and reading in the Los Angeles Unified School District. Our analysis is based on a sample of more than 300,000 students in grades 2 through 5 who were taught by more than 16,000 different teachers.
By linking individual students to their classroom teachers, the data allowed us to examine student progress from year to year and across classrooms led by different teachers. The teacher-specific information included years of experience, academic degrees obtained, and licensure test scores. These matched student/teacher data are unusual in student achievement analysis.
Our results suggest that the teacher is indeed an important determinant of student achievement; however, there is no direct connection between the traditionally assumed measures of teacher effectiveness and student achievement. There is also little evidence to suggest that the teachers who can increase student achievement are concentrated in a few high-performing schools. In fact, the teachers who are effective at raising achievement are evenly distributed throughout the district. This suggests that simply reshuffling teachers from one school to another is unlikely to produce substantial improvement in student achievement in low-performing schools.
Teacher pay is typically based on experience and education level, because these characteristics are commonly assumed to correlate with teacher effectiveness. But we found that a five-year increase in teaching experience improved student achievement very little — less than 1 percentage point. Similarly, the level of education held by a teacher had no effect on student achievement.
Licensure scores had no effect, either. Considerable resources are expended on teacher licensure exams, which restrict entry into the teaching profession. California requires new elementary teachers to pass general aptitude, subject matter, and reading instruction competency tests. But when we compared licensure test results with teacher performance in terms of student test scores, we found no relationship. We also analyzed whether failing the licensure exam before later passing it was related to student achievement and found no statistically significant link.
Teachers with less experience and education and lower licensure scores are concentrated in schools with poor average test scores. However, these differences among teachers contribute little to the differences in student achievement growth. Teachers are making comparable improvements across a broad range of schools, and the performance difference among the schools is mostly the result of student background and preparation. Socioeconomic status is a strong predictor of student success.
A limitation of our research is that licensure test scores and teacher performance data are available only for teachers who pass the licensure tests, which are designed to set minimum teaching proficiency standards. Aspiring teachers who fail the tests might indeed have worse classroom outcomes than those who pass the tests and are allowed to teach.
While it is true that some teachers are much more effective than others, our findings suggest that the traditional measures of teacher quality do not predict classroom performance. The California evidence suggests that education experts should rethink the knowledge requirements of new teachers and develop measures that more accurately predict performance. Future research should focus on identifying specific teacher attributes or practices that enhance student achievement.
The traditional compensation systems might offer too much incentive for further education that does not appear to contribute to student performance and too little incentive for the “best” teachers to deliver their best performance on a consistent basis. Even if the more-experienced, better-educated, or more-skilled teachers (as measured by licensure exams) are better able to teach, they might not be persistently motivated to apply those abilities to the greatest extent.
This research on Los Angeles schools was part of a larger research project called “Teacher Licensure Tests and Student Achievement,” which was sponsored by the Institute of Education Sciences in the U.S. Department of Education under grant number R305M040186.
It might be tempting to reward teachers for their performance rather than for their qualifications. But merely paying teachers for their performance will not necessarily lead to the desired outcomes.
The Project on Incentives in Teaching (POINT) was a three-year controlled experiment in the Metropolitan Nashville School System. About 70 percent of the district’s middle school mathematics teachers (in grades 5, 6, 7, and 8) volunteered to take part. The experiment assessed the effect of financial rewards for teachers whose students showed unusually large gains on standardized tests. Conducted between 2006 and 2009, it tested the notion that rewarding teachers for improved scores would cause their students’ scores to rise. It was up to the teachers to decide what, if anything, they needed to do to raise student performance. Thus, POINT was focused on the notion that a significant problem in American education is the absence of appropriate incentives and that correcting the incentive structure would, in and of itself, constitute an effective intervention and improve student outcomes.
By and large, the results did not confirm this hypothesis. While the general trend in middle school mathematics performance was upward over the period of the project, the students of teachers randomly assigned to the treatment group (eligible for bonuses) did not outperform the students whose teachers had been assigned to the control group (ineligible for bonuses).
In POINT, the maximum bonus an eligible teacher might earn was $15,000 — a considerable increase over base pay in this system. To receive this bonus, a teacher’s students had to perform at a level that historically had been reached by only the top 5 percent of middle school math teachers in a given year. Lesser amounts of $5,000 and $10,000 were awarded for performance at lower thresholds, corresponding to the 80th and 90th percentiles of the same historical distribution. Teachers were therefore striving to reach a fixed target rather than competing against one another. In principle, all participating teachers could have attained these thresholds.
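The bonus schedule described above can be written out as a simple threshold rule. The sketch below is illustrative only: the function name `point_bonus` is hypothetical, the pairing of $5,000 with the 80th percentile and $10,000 with the 90th is read from the order in which the amounts and thresholds are listed, and treating a score exactly at a threshold as qualifying is an assumption, not something the study description confirms.

```python
def point_bonus(percentile: float) -> int:
    """Return the bonus, in dollars, for a teacher whose students' gains fall
    at `percentile` of the historical distribution for middle school math
    teachers. Hypothetical helper; inclusive thresholds are an assumption."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")

    # Tiers are checked from highest to lowest: top 5 percent, then the
    # 90th and 80th percentiles of the same fixed historical distribution.
    tiers = [(95, 15_000), (90, 10_000), (80, 5_000)]
    for threshold, amount in tiers:
        if percentile >= threshold:
            return amount
    return 0  # below the lowest threshold: no bonus
```

Because each tier depends only on a fixed historical benchmark, one teacher's result never changes another teacher's bonus, which is why every participant could in principle have qualified.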
It is unlikely that the bonus amounts were too small to motivate teachers assigned to the treatment group. It is also unlikely that the performance bar was set too high. About half the teachers could have reached the lowest of the bonus thresholds if their students had answered 2 to 3 more questions correctly on an exam of some 55 items.
In fact, POINT paid out more than $1.27 million in bonuses over the course of the experiment. In all, 51 of the initial treatment group of 152 teachers — more than a third — received a bonus. There were more bonus winners than expected on the basis of the district’s historical performance, but this was because performance overall was rising, not because teachers in the treatment group were doing better than teachers in the control group.
The most positive effect of the incentives on test scores was detected among fifth graders during the latter two years of the experiment. But this finding is of limited policy significance, because the effect did not appear to persist beyond fifth grade. (Students whose fifth-grade teacher had been in the treatment group performed no better by the end of sixth grade than did sixth graders whose teacher the year before had been in the control group.)
The participating teachers generally favored extra pay for better teachers, in principle. But they did not endorse the notion that bonus recipients in POINT were better teachers or that failing to earn a bonus meant a teacher needed to improve. The experiment did not set off significant negative reactions of the kind that have attended the introduction of merit pay elsewhere. But neither did it yield consistent and lasting gains in test scores. Moreover, POINT appears to have had little effect on what teachers did in the classroom. It simply did not do much of anything.
While one might speculate that the middle school math teachers lacked the capacity to raise test scores, this is belied by the district-wide upward trend in scores over the period of the project. This trend is probably the result of some combination of an increasing familiarity with a test introduced in 2004 and an intense, high-profile effort to improve test scores to avoid sanctions from the federal No Child Left Behind Act of 2001.
It should be noted that POINT tested a particular model of incentive pay. Our findings do not mean that another approach would not be successful. It might be more productive, for example, to reward teachers in teams or to combine incentives with coaching or professional development. But our experience with POINT demonstrates the importance of quantifying and validating the rewards of any pay-for-performance experiment before expanding it beyond a pilot program.
From an implementation standpoint, POINT was a success. This is no trivial result, given the widespread perception that teachers are adamantly opposed to merit pay and will resist its implementation in any form. No doubt some of the ease with which POINT ran was due to the understanding that this was an experiment intended to provide evidence on whether such performance incentives would raise achievement. Even teachers skeptical of the merits of the policy saw the worth in conducting the experiment.
We believe there is an important lesson here: Teachers may be more willing to cooperate with a pay-for-performance plan whose purpose is to determine whether the policy is a sound idea than with a plan that is forced on them in the absence of such evidence and in the face of their skepticism and misgivings. In the meantime, we continue to study the effects of new teacher evaluation and compensation reforms, including those that offer alternatives to the POINT model.
Grant support for the research on the POINT program came from Vanderbilt University and the U.S. Department of Education under grant number R305A060034 entitled “National Center on Performance Incentives.”