The Use and Misuse of Test Scores in Reform Debate

Daniel Koretz

Research Summary | Published 1994

Whenever education is the topic, test scores are almost always part of the debate. Over the last two decades, test scores have been cited, variously, as sure and certain signs of U.S. educational decline or as indicators that particular types of reform are "working." In a recent article, Daniel Koretz drew upon a body of his own and others' research to sort out how the reform debate has used test scores, what has really happened to them and why, and what the answers imply for future debate. According to Koretz, the record shows "characteristically simplistic use of test score data in the public debate" and the unfortunate tendency to believe that test scores alone really measure the effectiveness of education.

Achievement-test scores have had a curious history in popular perception: Perhaps because bad news is news, the impressions persist that test scores started dropping in the 1960s and kept on falling, that the scores of minority students have not improved, and that education policy and practice are primarily to blame. A look at test score data from the past three decades should modify at least the first two impressions:

Scores on achievement tests did decline in the 1960s and 1970s. Some of that decline reflected changes in the demographic mix of test takers, primarily Scholastic Aptitude Test (SAT) takers. Nevertheless, test score data from many sources show a substantial and remarkably pervasive drop in private as well as public schools, and in Canada as well as the United States.

Scores stopped declining between roughly 1974 and 1980. But what has happened since then is not entirely clear: Scores have improved, but actual achievement may have improved less than some scores imply. First, many of the reforms of the 1980s were test-based, which put strong pressure on schools to raise scores. One result was "teaching to the test," which inflates scores but may not signal lasting student progress. Second, the data are not consistent: In particular, many state and local testing programs have shown a much more substantial upturn than the National Assessment of Educational Progress (NAEP), which is less susceptible to teaching to the test.

Minority students, particularly African Americans, have made both relative and absolute gains in test scores. For example, NAEP data show that between 1973 and 1986, mathematics scores of African American 17-year-olds rose enough to narrow the gap with whites by 28 percent.
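
To make the gap-narrowing arithmetic concrete, the short sketch below works through the calculation with hypothetical scale scores; the numbers are invented for illustration and are not the actual NAEP results:

    # How a "gap narrowed by 28 percent" figure is computed.
    # All scale scores below are hypothetical, not actual NAEP results.

    white_1973, black_1973 = 304.0, 270.0   # hypothetical 1973 mean scores
    white_1986, black_1986 = 308.0, 283.5   # hypothetical 1986 mean scores

    gap_1973 = white_1973 - black_1973      # 34.0 points
    gap_1986 = white_1986 - black_1986      # 24.5 points

    narrowing = (gap_1973 - gap_1986) / gap_1973
    print(f"Gap narrowed by {narrowing:.0%}")   # prints: Gap narrowed by 28%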

Looking for Explanations

In public debate, education policy and practice got most of the blame for the decline. However, Koretz points out, trend patterns suggest that something else was at work in both the decline and subsequent reversal. For one thing, the decline was very consistent and pervasive: What kind of changes in educational policy and practice might have occurred simultaneously in the nation's decentralized public school systems—not to mention in private schools and schools in Canada—that would have affected students of many kinds and ages in all tested subject areas?

Another consideration is timing. Most critics focused on tests administered in high school (such as the SAT), where the test-score decline lasted longest. They saw the decline as an indictment of educational standards and practices during the 1960s and 1970s. Some even claimed that reforms begun in the Reagan era caused the reversal.

Such conclusions ignore two facts: First, the high school students whose SAT scores ended the decline in the early 1980s must have done some of their learning before the 11th grade, so reforms that began only in the Reagan years could hardly have had much effect that quickly. Second, a broader look at testing shows that the decline ended earlier for students in lower grades. The end of the decline followed students born around 1962 and 1963 as they progressed through the grades toward high school. In other words, it was a cohort effect: a pattern of performance tied to people born in specific years. It cannot be explained by something that happened to children across all grades at a particular time.
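
The cohort pattern is easier to see in a grade-by-year table. The sketch below builds one from synthetic data (all numbers are invented for illustration, including the five-point deficit assigned to the 1962-63 birth cohorts): a cohort effect travels diagonally through such a table, hitting each grade in a different year, whereas a school-wide change at a particular time would show up in every grade in the same year.

    # Synthetic illustration of a cohort effect: a hypothetical low-scoring
    # group born in 1962-63 lowers the average of whatever grade it occupies.
    # All numbers are invented; this is not real test data.

    BASELINE = 100.0    # hypothetical average score
    COHORT_DIP = -5.0   # hypothetical deficit for the 1962-63 birth cohorts

    def expected_score(grade: int, year: int) -> float:
        """Average score for a grade in a given year under a pure cohort effect."""
        birth_year = year - (grade + 5)     # rough age-to-grade mapping
        return BASELINE + (COHORT_DIP if birth_year in (1962, 1963) else 0.0)

    years = range(1970, 1982)
    print("grade " + " ".join(f"{y:>5d}" for y in years))
    for grade in (4, 8, 11):
        print(f"{grade:5d} " + " ".join(f"{expected_score(grade, y):5.0f}" for y in years))

In the resulting table, the dip appears in grade 4 around 1971-72, in grade 8 around 1975-76, and in grade 11 around 1978-79, mirroring the pattern Koretz describes: the decline ends earliest in the lowest grades and last in high school.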

The cohort pattern suggests that whatever the effect of changes in schooling, broad changes in society also strongly affected the trends. It is hard to pin down exactly what the social and educational causes were and how strong their effects had been. However, it appears that numerous factors made small contributions to the trends, while many of the commonly suggested factors played no role at all. For example, changes in total television watching cannot explain the achievement trends; changes in family composition may have accounted for a moderate share of both the decline and the upturn; and Title I/Chapter 1 might have contributed to the relative gains of African American and Hispanic students, though only modestly and only for children in the lower grades.

What Does This Mean for Future Debate?

Research has not been able to pinpoint the effects of the noneducational influences. Nevertheless, people have misused test-score data in the debate to give education a "bad rap." Koretz lists three broad, overlapping kinds of misuse that should be avoided in honest, future debate:

  1. Simplistic interpretations of performance trends: Trends should not be taken at face value; they are shaped by factors such as demographic changes among test takers and the score inflation produced by test-based accountability.
  2. Unsupported "evaluations" of schooling: Simple aggregate scores are not a sufficient basis for evaluating education unless they provide enough information to rule out noneducational influences on performance. Most test-score databases do not offer that kind of information.
  3. A reductionist view of education: Koretz notes that, "trite" as it may be, it is true that education is a "complex mix of successes and failures . . . what works in one context or for one group of students may fail for another." Unfortunately, that truism is often ignored. For example, in the early 1980s, when people were understandably concerned about falling aggregate test scores, they demanded wholesale changes in policy without first asking which policies most needed changing or which students and schools most needed new ones.

Koretz cautions against the temptation to misuse data in these ways as public pressure mounts to improve educational achievement.

Citation

RAND Style Manual
Koretz, Daniel, The Use and Misuse of Test Scores in Reform Debate, RAND Corporation, RB-8008, 1994. As of October 11, 2024: https://www.rand.org/pubs/research_briefs/RB8008.html
Chicago Manual of Style
Koretz, Daniel, The Use and Misuse of Test Scores in Reform Debate. Santa Monica, CA: RAND Corporation, 1994. https://www.rand.org/pubs/research_briefs/RB8008.html.

This publication is part of the RAND research brief series. Research briefs present policy-oriented summaries of individual published, peer-reviewed documents or of a body of published work.
