Measuring teacher effectiveness has long been a complex—and often divisive—issue in the education policy world. Nonetheless, valid measures of effectiveness remain necessary to help policymakers make informed decisions about compensation, tenure, retention, and professional development for educators.
Over the last few years, more than 30 states have dramatically reformed how they evaluate teachers and principals. Most of these new evaluation systems combine objective evidence of student learning (such as value-added estimates) with other measures, such as ratings from classroom observations and student satisfaction surveys.
What remains uncertain, however, is how successfully school districts can use these data to construct a single, combined measure of effective teaching, and whether such a measure can distinguish among teachers and principals of varying quality.
The appeal of a combined measure is that it is easier to interpret than several separate indicators and can simplify communication with the public. Combined measures have been used to promote accountability and evaluate performance in numerous sectors, including health care, higher education, and local and national government.
That said, using a combined measure could invite simplistic or misleading policy conclusions if the measure is misinterpreted or poorly constructed. Similarly, a combined measure could be misused if the process of constructing it is not transparent or not based on sound principles.
There is limited empirical research to guide policymakers on how best to develop combined measures of teacher effectiveness, but a new RAND report sheds some light on this challenge.
Researchers examined data from the Measures of Effective Teaching Project to estimate how much unique information each indicator of effective teaching contains and to understand the tradeoffs among different ways of combining the indicators. They found that while the indicators share some common aspects of effective teaching, each one also contains a fairly large amount of unique information. This has important policy implications: the more unique information each indicator carries, the more it matters how the indicators are combined into a single measure.
Weighting schemes, which determine how much each indicator counts toward the combined measure, play an important role. Depending on the goals of the combined measure, some weighting schemes are better than others at identifying which teachers will achieve those goals. If the only goal of the combined measure is to predict a teacher's student achievement gains, then the student achievement gain measure should receive over 80% of the weight. However, a singular focus on achievement gains could lead teachers to concentrate narrowly on this aspect of teaching and neglect other valued outcomes. If the goal is for students and teachers to meet a broader set of objectives, then the indicators should be combined with roughly equal weights. The research also found that an equally weighted composite is more stable from year to year.
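To make the weighting idea concrete, here is a minimal sketch in Python. It assumes each indicator is first standardized across teachers (converted to z-scores) and then combined as a weighted sum. The indicator names, the four teachers' scores, and the exact weights are hypothetical illustrations, not data or methods taken from the RAND report or the Measures of Effective Teaching Project; only the "over 80%" figure for a gains-focused scheme comes from the text above.

```python
from statistics import mean, pstdev

# Hypothetical indicator scores for four teachers (illustrative only, not MET data).
# Each indicator is on its own scale, so the sketch standardizes before combining.
INDICATORS = {
    "achievement_gain": [0.10, -0.05, 0.02, 0.20],  # e.g., a value-added estimate
    "observation":      [2.8, 3.1, 2.5, 2.9],       # e.g., a classroom rubric rating
    "student_survey":   [3.9, 4.2, 3.5, 3.6],       # e.g., a survey scale score
}

def standardize(values):
    """Convert raw scores to z-scores (mean 0, SD 1) across teachers."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def composite(indicators, weights):
    """Weighted sum of standardized indicators; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n_teachers = len(next(iter(indicators.values())))
    return [sum(weights[name] * z[name][i] for name in weights)
            for i in range(n_teachers)]

def ranking(scores):
    """Teacher indices ordered from highest to lowest composite score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Scheme 1 (hypothetical): most weight on achievement gains, as the text suggests
# when predicting gains is the only goal; 85% is an illustrative choice.
gains_heavy = {"achievement_gain": 0.85, "observation": 0.075, "student_survey": 0.075}

# Scheme 2 (hypothetical): roughly equal weights for a broader set of objectives.
equal = {name: 1 / 3 for name in INDICATORS}

for label, w in [("gains-heavy", gains_heavy), ("equal", equal)]:
    print(label, ranking(composite(INDICATORS, w)))
```

With these made-up numbers the two schemes order the four teachers differently, which illustrates the point in the text: when indicators carry a lot of unique information, the choice of weights changes who looks most effective. Standardizing first keeps an indicator with a wide raw scale from dominating the composite simply because of its units.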
Creating a combined measure of teaching effectiveness has obvious appeal, since no single method provides a complete picture of a teacher's effectiveness. Although this undertaking involves significant challenges and tradeoffs, research can help guide the way.
Kata Mihaly is an associate economist at the nonprofit, nonpartisan RAND Corporation.
Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.