In a recent article, RAND's Daniel Koretz drew upon a body of his own and others' research to sort out how the reform debate has used test scores, what has really happened to them and why, and what the answers imply for future debate.
The record shows "a characteristically simplistic use of test score data," Koretz notes, and an unfortunate tendency to believe that test scores alone really measure the effectiveness of education.
Scores on achievement tests did decline in the 1960s and 1970s. Some of that decline reflected changes in the demographic mix of test takers, particularly in the case of the Scholastic Aptitude Test (SAT). Nevertheless, test score data from many sources show a substantial and remarkably pervasive drop in private as well as public schools, and in Canada as well as the United States.
Scores stopped declining somewhere between 1974 and 1980, depending on the grade level. The end of the decline appeared in the upper elementary grades in the mid-1970s and reached the high school level around 1980, as those cohorts of students moved through their school careers. Ironically, the public outcry about declining test scores began in earnest in the early 1980s, after the decline had already ended.
Many scores have increased since then, but actual achievement may have improved less than some scores imply, for two reasons. First, many of the reforms of the 1980s were based on tests, a state of affairs that put strong pressure on schools to raise scores. One result was "teaching to the test," which can inflate scores without creating commensurate student learning. Second, the data are not consistent: In particular, many state and local testing programs have shown a much more substantial upturn than the National Assessment of Educational Progress (NAEP), which is less susceptible to teaching to the test.
Minority students, particularly African Americans, have made both relative and absolute gains in test scores. For example, between 1973 and 1986, mathematics scores of 17-year-old African Americans rose enough to narrow the gap with whites by 28 percent, according to the NAEP.
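To make the arithmetic behind a statement like "narrowed the gap by 28 percent" concrete, here is a minimal sketch. The function name and the point values are hypothetical and chosen only so the numbers work out; they are not NAEP figures.

```python
def gap_narrowing(gap_start: float, gap_end: float) -> float:
    """Share of the initial score gap that closed between two test years."""
    if gap_start <= 0:
        raise ValueError("initial gap must be positive")
    return (gap_start - gap_end) / gap_start


# Illustrative values only (NOT actual NAEP data): a 50-point gap
# that shrinks to 36 points has narrowed by 14 of its original 50 points.
share = gap_narrowing(50.0, 36.0)
print(round(100 * share))  # prints 28
```

The measure is relative to the starting gap, so it says nothing by itself about whether the narrowing came from one group's scores rising, the other's falling, or both.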
Another element in the pattern is timing. Most critics focused on tests administered in high school (such as the SAT), where test-score decline lasted longest. They saw the decline as an indictment of educational standards and practices during the 1960s and 1970s. Some even claimed that reforms begun in the Reagan era caused the reversal.
Such conclusions ignore two facts: First, the high school students whose SAT scores ended the decline in the early 1980s must have done some of their learning before the 11th grade--so reforms that only began in the Reagan years could hardly have had much effect that quickly. Second, a look at wider testing shows that the decline ended earlier for students in lower grades. The low point followed students born around 1962 and 1963 as they progressed through school. In other words, the trend in scores was a cohort effect--a pattern of performance for people born in specific years. It cannot be explained by something that happened to children across grades in schools at a particular time.
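The difference between a cohort effect and an event that hits all grades at once can be sketched in a few lines. The helper names are hypothetical, and the school-entry rule (children start grade 1 in the fall of the year they turn 6) is a simplifying assumption, not something taken from Koretz's article.

```python
def spring_year_in_grade(birth_year: int, grade: int) -> int:
    """Spring calendar year in which a birth cohort finishes a given grade.

    Assumes entry to grade 1 in the fall of the year the child turns 6.
    """
    return birth_year + 6 + grade


def cohort_low_points(trough_birth_year: int, grades: list[int]) -> dict[int, int]:
    """Cohort effect: the low point travels with one birth cohort."""
    return {g: spring_year_in_grade(trough_birth_year, g) for g in grades}


def period_low_points(event_year: int, grades: list[int]) -> dict[int, int]:
    """Period effect: every grade hits its low point in the same year."""
    return {g: event_year for g in grades}


grades = [5, 8, 12]
print(cohort_low_points(1962, grades))  # {5: 1973, 8: 1976, 12: 1980}
print(period_low_points(1980, grades))  # {5: 1980, 8: 1980, 12: 1980}
```

Under these assumptions, a low point tied to children born in 1962 shows up in the upper elementary grades in the early-to-mid 1970s and in grade 12 around 1980, which is the staggered pattern described above. A school-wide change at a single date would instead produce the same turning year in every grade.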
The cohort pattern suggests that whatever the effect of changes in school, broad changes in society also strongly affected the trends. Among these societal factors might be changes in the ethnic composition of the school-age population and in family characteristics.
Even if misperceptions about trends in achievement are corrected, test scores will remain central to the debate. Koretz lists three common misuses of test scores that should be avoided:
"What Happened to Test Scores, and Why?"
Daniel Koretz, RAND/RP-278 (Reprinted from Educational Measurement: Issues and Practice, Winter 1992, pp. 7-11), 1994, Bibliog., free.