Measuring Teacher Effectiveness

Understanding Common, Uncommon, and Combined Methods

Research has validated the widespread belief that effective teaching matters. But what does effective teaching look like? And how can we measure it? Education practitioners, policymakers, and researchers have suggested a wide range of methods. Many of these have been incorporated into teacher feedback and evaluation systems.

  • Three methods are used most widely in the United States.

    The three most widely used measures in the United States are structured classroom observations, teacher contributions to student achievement growth, and student perceptions of teacher effectiveness and classroom instructional climate. According to the National Council on Teacher Quality, as of 2019, 44 states require classroom observations, 33 require measures of student achievement growth, and seven require student surveys as components of teacher feedback and evaluation systems. An additional 24 states permit the use of student surveys in teacher evaluation.

  • There are many other possible measures that require more research attention.

    Teaching effectiveness can also be inferred from tests of teachers’ knowledge or skills; teachers’ participation in professional development, committees, or mentoring; instructional artifacts, including lesson plans and assignments; teacher self-reporting, including instructional logs; and input provided by parents, peers, or administrators.

    However, researchers have not examined these measures as thoroughly as they have test-based and observation methods. As a result, these measures are incorporated into teacher evaluation systems less often.

  • How to combine measures may be as important as which measures to combine.

    Education agencies use a wide range of models to combine multiple measures into a single teacher effectiveness appraisal. However, research on combining measures is still evolving, and there is no clear consensus on how best to combine them into an accurate overall rating of teaching effectiveness. Many states and districts use weighted averages to combine information from multiple sources into a single score. Others apply a discrete decision rule to each measure and then specify an overall decision rule based on those separate decisions. Combining measures does not always produce more-precise estimates of teacher quality or improve decisionmaking, and measurement error from different sources does not “cancel out” when measures are combined. Researchers are exploring how different approaches to combining measures affect the accuracy, consistency, and fairness of evaluation scores.
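
    The two families of combination models described above can be sketched in a few lines of Python. This is only an illustration, not any state's or district's actual model: the weights, the cutoff, the rating labels, the common 1-to-4 scale, and all of the names (TeacherScores, weighted_composite, decision_rule_rating) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeacherScores:
    """Scores on three measures, assumed here to share a 1-4 scale."""
    observation: float  # structured classroom observation
    growth: float       # contribution to student achievement growth
    survey: float       # student perception survey


# Hypothetical weights and cutoff, chosen only for illustration.
WEIGHTS = {"observation": 0.50, "growth": 0.35, "survey": 0.15}
CUTOFF = 2.5


def weighted_composite(scores, weights=WEIGHTS):
    """Weighted-average model: collapse all measures into one score."""
    total_weight = sum(weights.values())
    weighted_sum = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


def decision_rule_rating(scores, cutoff=CUTOFF, min_passed=2):
    """Decision-rule model: judge each measure separately against a cutoff,
    then apply an overall rule to the count of measures that passed."""
    passed = sum(getattr(scores, name) >= cutoff for name in WEIGHTS)
    return "effective" if passed >= min_passed else "needs support"


# A teacher with strong observation scores but weaker growth and survey results.
example = TeacherScores(observation=3.4, growth=1.8, survey=2.0)
composite = weighted_composite(example)
rating = decision_rule_rating(example)
```

    For the example teacher, the weighted composite is about 2.63, which clears the illustrative 2.5 cutoff, while the decision rule returns "needs support" because only one of the three measures passes on its own. The same underlying scores can therefore lead to different overall ratings depending on the combination model.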

  • Multiple measures provide a more accurate picture of teaching effectiveness than any single measure.

    Deciding among methods for measuring teacher effectiveness requires trade-offs in accuracy, precision, implementation challenges, and scalability. Each method provides a different perspective on teaching. For example, test scores tell us something about the outcomes attained by students in a teacher’s class but do not tell us whether students enjoy being in class or whether teachers’ instructional practices are aligned with district guidelines. Student surveys and classroom observations, respectively, can shed light on those topics. No single method provides a complete picture of a teacher’s effectiveness; fair, accurate, and actionable appraisals of teaching quality depend on information from multiple, complementary sources.