Research indicates that teachers affect student achievement more than any other school-related factor. However, value-added modeling, which measures teacher performance based on student test results, has revealed a high degree of variation among teachers. Policymakers have responded with a strong focus on reducing variability as a way to improve teaching.
One expression of this policy focus has been the adoption of structured observation protocols for assessing how teachers deliver lessons to their students. These protocols also offer the opportunity to give teachers valuable feedback on how their teaching practices could be improved. One such protocol, the Classroom Assessment Scoring System–Secondary (CLASS-S), defines ten dimensions of effective teaching that attempt to capture three areas of lesson delivery:
- lesson content
- classroom organization and student engagement
- quality of student support and feedback provided by the teacher.
Scores are collected across multiple classrooms or sections of students, during multiple observation sessions, and by multiple raters. The aim is to improve reliability and produce scores that best capture each teacher's performance.
Past studies of these protocols have not fully separated the situational sources of variation that arise when teachers are rated. When a rater observes a teacher instructing a particular group of students on a particular day, that unique event can introduce variation at several levels at once: the section observed, the occasion, and the rater.
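To see why a single observation confounds these layers, consider a toy simulation (not the studies' actual data or model) in which each observed score mixes a stable teacher effect with situational effects from the section, the day, and the rater; the effect sizes here are made up for illustration:

```python
import random

random.seed(0)

def observe(teacher_effect):
    """One rated session: teacher effect plus situational noise."""
    section = random.gauss(0, 0.5)  # which group of students was observed
    day = random.gauss(0, 0.5)      # day-to-day fluctuation in the lesson
    rater = random.gauss(0, 0.5)    # rater severity or leniency
    return teacher_effect + section + day + rater

teacher = 1.0  # the "intrinsic" score we would like to recover

one_shot = observe(teacher)  # a single unique event: noisy
averaged = sum(observe(teacher) for _ in range(200)) / 200

# Averaging over many sections, days, and raters shrinks the
# situational noise, so the mean approaches the intrinsic score.
print(round(averaged, 1))
```

A single `observe` call can land far from 1.0, while the average over many sessions settles near it, which is the logic behind collecting scores across sections, sessions, and raters.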
A key question about the use of protocols like CLASS-S is whether this situational variance affects the “intrinsic” scores that are supposed to reflect teachers' consistent performance.
My colleagues and I were able to filter out situational variation in two recent studies—one of more than 900 middle school teachers (PDF), another of roughly 80 middle and high school algebra teachers (PDF)—both of which employed a Bayesian hierarchical model. This approach allowed us to estimate intrinsic scores for teachers that better capture their performance.
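The studies used a full Bayesian hierarchical model; a much simpler cousin of that idea, partial pooling, can be sketched in a few lines. Here a teacher's noisy observed mean is pulled toward the population mean, with the pull determined by how much of the variance is situational (all numbers below are invented for illustration):

```python
def shrink(obs, grand_mean, var_teacher, var_situational):
    """Precision-weighted compromise between a teacher's own mean
    and the population mean (a standard partial-pooling formula)."""
    n = len(obs)
    w = var_teacher / (var_teacher + var_situational / n)
    return w * (sum(obs) / n) + (1 - w) * grand_mean

# Three noisy observed sessions for one hypothetical teacher.
obs = [3.9, 5.2, 4.6]

est = shrink(obs, grand_mean=4.0, var_teacher=0.3, var_situational=0.9)
print(round(est, 2))  # 4.28: between the raw mean (4.57) and 4.0
```

The more situational noise relative to true between-teacher variance, the harder the estimate is pulled toward the grand mean; a full hierarchical model estimates those variance components from the data rather than assuming them.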
When we looked at the ten dimensions measured by CLASS-S, we found that teachers tended to score well on all ten or poorly on all ten. These scores appeared to be driven by a single underlying factor, and that factor explains a large portion of the variance among teachers.
Future research might also examine how approaches to lesson preparation and other outside-of-the-classroom activities influence teachers' value-added scores.
Of course, even as experts continue to refine value-added modeling and measures of effective teaching become more reliable, policymakers must remember that value-added estimates enable relative judgments but are not absolute indicators of effectiveness—they're one piece of a very complex puzzle.
Terrance Dean Savitsky is an associate statistician at the nonprofit, nonpartisan RAND Corporation. He will present at the Society for Research on Educational Effectiveness (SREE) Spring 2013 Conference in Washington, D.C. on Thursday, March 7.
Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.