Jun 21, 2018
The Intensive Partnerships for Effective Teaching initiative, designed and funded by the Bill & Melinda Gates Foundation, was a multiyear effort aimed at increasing students’ access to effective teaching and, as a result, improving student outcomes. It focused particularly on high school graduation and college attendance among low-income minority (LIM) students.
The foundation asked a team of researchers from the RAND Corporation and the American Institutes for Research to evaluate whether the initiative improved teaching effectiveness and student outcomes.
The team found that, despite the sites’ efforts and considerable resources, the initiative failed to achieve its goals for improved student achievement and graduation, although the sites did implement improved measures of teaching effectiveness. With minor exceptions, student achievement, LIM students’ access to effective teaching, and graduation rates in the participating districts and charter management organizations (CMOs) were not dramatically better than at similar sites that did not participate in the initiative.
This brief, based on a longer final report, summarizes the findings of the team’s evaluation and offers some possible reasons the Intensive Partnership initiative did not achieve its goals for students.
The initiative involved three school districts and four CMOs.
The seven sites that participated in the Intensive Partnership initiative agreed to develop a robust measure of teaching effectiveness, including a structured way to observe and assess classroom teaching. They were then to use the information on effectiveness in conjunction with new or revised policies designed to do the following:
As determined by the site-developed measures, almost all teachers were considered to be effective.
Each participating site designed a teacher evaluation system that included at least two factors in a composite score: (1) rubric-based ratings based on classroom observations and (2) a measure of student achievement growth.
According to the measures the sites developed and the thresholds they set for teaching effectiveness, almost all teachers were deemed effective (see Figure 1). Over time, a growing share of teachers were rated in the top effectiveness categories, and a shrinking share were rated as ineffective. By the end of the initiative, just 1 to 2 percent were classified as ineffective in most of the sites. This might reflect actual improvement in teaching effectiveness, but there is some evidence that it reflects other factors, such as increasingly generous ratings on subjective components (e.g., classroom observations).
The evaluation system raised some practical challenges that sites addressed in different ways. One was that the observations placed a burden on principals’ time, so some sites reduced the length or frequency of classroom observations or allowed other administrators to conduct them. Another was that many teachers did not receive individual scores for their contribution to student achievement because there were no standardized tests in their subjects or grade levels. Some sites handled this by assigning a school-level average score to those teachers; others adopted alternative ways to measure student growth.
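As a rough illustration of how a two-component composite score and the school-level fallback described above might fit together, consider the sketch below. It is not any site's actual formula: the equal weights, the shared rating scale, the function name `composite_score`, and the sample data are all assumptions made for illustration.

```python
from statistics import mean
from typing import Optional

# Hypothetical weights: the brief does not say how sites weighted each component.
OBSERVATION_WEIGHT = 0.5
GROWTH_WEIGHT = 0.5


def composite_score(observation: float,
                    growth: Optional[float],
                    school_growth: float) -> float:
    """Combine an observation rating with a student-growth score.

    Both inputs are assumed to be on the same scale. When a teacher has
    no individual growth score (no standardized test in the subject or
    grade), the school-level average is used in its place.
    """
    growth_used = growth if growth is not None else school_growth
    return OBSERVATION_WEIGHT * observation + GROWTH_WEIGHT * growth_used


# Illustrative data only; teacher C teaches an untested subject.
teachers = [
    {"name": "A", "observation": 3.0, "growth": 2.0},
    {"name": "B", "observation": 4.0, "growth": 3.0},
    {"name": "C", "observation": 3.5, "growth": None},
]

# School-level average of the individual growth scores that do exist.
school_growth = mean(t["growth"] for t in teachers if t["growth"] is not None)

scores = {
    t["name"]: composite_score(t["observation"], t["growth"], school_growth)
    for t in teachers
}
print(scores)
```

In this sketch, teacher C's composite depends partly on colleagues' results, which is one way to see why the school-level approach could raise the fairness concerns that some teachers voiced.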
Despite some concerns about fairness, surveys that the team administered found that the majority of teachers considered the evaluation measures, particularly the classroom-observation component, to be valid indicators of their effectiveness as teachers. Furthermore, most teachers thought that the evaluation system had helped them improve their teaching.
The initiative had little effect on the retention of effective teachers, but it did increase the rate of departure of ineffective teachers.
The sites made efforts to retain effective teachers, including offering additional compensation and career opportunities based on effectiveness. However, in the end, effective teachers were no more likely to be retained after the initiative than before it.
On the other hand, ineffective teachers were more likely than before to depart from the sites. Across the sites for which data were available, about 1 percent of teachers were dismissed for poor performance in the 2015–2016 school year. Sites dismissed few teachers at least partly because their evaluation systems identified very few poor performers; however, the likelihood that those identified as poor performers would leave the site—whether voluntarily or involuntarily—increased during the initiative.
The three districts set specific criteria based on their new evaluation systems to identify low-performing teachers who might be denied tenure, placed on improvement plans, or considered for dismissal or nonrenewal of their contracts. The CMOs (which do not offer tenure) did not establish specific criteria to identify low performers but did take teacher evaluation results into account when considering improvement plans or contract renewal.
The sites also had to deal with the potentially conflicting goals of using measures of teaching effectiveness for dismissing low-performing teachers and using them to help teachers improve. In general, they tended to favor trying to help teachers improve rather than dismissing them.
All the sites modified their recruitment and hiring policies somewhat during the initiative—for example, by facilitating hiring in hard-to-staff schools or developing partnerships with local colleges. However, the researchers found little evidence that the new policies led to the hiring of more-effective teachers. Although school leaders generally thought that hiring processes worked fairly well, the sites still had difficulty attracting effective teachers to high-need schools, and persistent teacher turnover was a particular problem for the CMOs.
Evaluation-linked professional development (PD) and support were difficult to achieve.
All the sites offered multiple types of PD, including coaching, workshops, school-based teacher collaboration, and online and video resources. However, the sites struggled to figure out how to organize this training and support to address individual teachers’ identified needs.
One possibility is that scores and feedback from the measures of teaching effectiveness might not have been detailed enough to support specific suggestions for customized PD, and existing PD systems might not have been flexible enough to provide such customization. Also, there were few existing models of evaluation-linked PD that the sites could easily adopt, and sites lacked the capacity to develop and implement new models themselves.
Most school leaders said that they suggested PD and support based on teachers’ evaluation results, but the sites generally did not require teachers to participate, monitor their participation, or examine whether participants’ teaching effectiveness improved as a result. In addition, some sites found it difficult to develop a coherent system of PD offerings.
Teachers in all the sites generally believed that the PD activities in which they participated were useful for improving student learning. Most teachers had access to some form of coaching, on which the sites often relied to individualize PD, and the percentage of teachers with access to coaching increased over time. Teachers with lower ratings were more likely than higher-rated teachers to report receiving individualized coaching or mentoring, but they were generally no more likely than higher-rated teachers to say that the support they received had helped them.
Some compensation and career-ladder policies were enacted to retain effective teachers, but they were not as extensive as envisioned, did not always follow best practices, and were not necessarily incentives that teachers cared about.
All seven participating sites implemented effectiveness-based compensation reforms, which varied in terms of timing, eligibility criteria, dollar amounts, and the proportion of teachers earning additional compensation. Teachers generally endorsed the idea of additional compensation for outstanding teaching, but (except in two of the CMOs) most reported that their sites’ compensation systems did not motivate them to improve their teaching. See Figure 2.
All seven sites also introduced specialized roles, with additional pay, open to effective teachers who accepted additional responsibility to provide instructional or curricular support to other teachers. However, none of the sites implemented career ladders, in which specialized roles come with sequential steps and growing responsibility, as the initiative’s sponsors had envisioned. The districts and CMOs took somewhat different approaches to creating specialized roles for teachers. The districts created a few positions that focused on coaching and mentoring new teachers in struggling schools, while the CMOs created more positions with a wider range of duties as needs shifted over time.
Figure 2. Teachers’ views on compensation, by site: most teachers agreed that teachers should receive additional compensation for demonstrating outstanding skills; however, fewer teachers said that their site’s compensation system motivated them to improve their teaching.
The initiative did not achieve its goals of increasing teaching effectiveness overall, improving access to effective teaching for LIM students, or boosting student outcomes.
The analysis found little evidence that teaching effectiveness improved as a result of the initiative. This was true whether teaching effectiveness was measured by the sites’ own composite measures or by an independently calculated measure. The researchers also looked for an increase in the teaching effectiveness of newly hired teachers but did not find evidence of one. As mentioned, the departure rate for the least effective teachers increased at some sites, although this success was not sufficient to noticeably improve the average teaching effectiveness of those sites.
At the beginning of the initiative, LIM students had roughly the same access to effective teaching as all students, and their access had not improved by the end of the initiative. In addition, their achievement and graduation rates appeared no different from those of their peers in similar schools that did not participate in the initiative (see table). Similarly, the analyses of test results and graduation rates for students overall showed no evidence of the initiative having a widespread positive impact in most sites and grade ranges. The exceptions were a statistically significant positive effect on high school English in the CMOs and Pittsburgh Public Schools (PPS) and a statistically significant negative effect on grade 3–8 mathematics in the CMOs.
Two caveats should be considered in interpreting these results. First, teacher-evaluation mandates with consequences were enacted in three of the four states at the same time as the initiative, so the comparison sites and the sites participating in the initiative were exposed to some of the same types of new policies. The team’s impact estimates reveal the extent to which the initiative improved student outcomes over and above these statewide efforts. Second, it is possible that the reforms simply require more time to take effect, so the research team is monitoring student outcomes for two additional years.
Table. The initiative’s goal of boosting student outcomes was not achieved (estimated impacts reported by site for grades 3–8 and high school).
NOTE: The researchers could not estimate the impact on high school mathematics because students did not take the same secondary mathematics tests. N/A = not applicable.
Statistical significance measured at p < 0.05.
The initiative had greater success implementing measures of teaching effectiveness than improving student outcomes.
A favorite saying in the educational measurement community is that one does not fatten a hog by weighing it. In the end, the sites were better at implementing measures of teaching effectiveness than at using them to improve student outcomes. The RAND/American Institutes for Research evaluation of the Intensive Partnerships for Effective Teaching initiative does not explain why the desired student outcomes were not achieved, but, informed by observations of the sites over the past seven years, the team can speculate about potential explanations:
Despite the initiative’s failure to improve student outcomes, the sites still use many of the policies, either because they found them valuable or because state law or regulation now requires them. In particular, the sites continue to incorporate systematic teacher evaluation into regular practice and have kept many new recruitment and hiring policies.