The Use and Misuse of Test Scores in Reform Debate
Whenever education is the topic, test scores are almost always part of the
debate. Over the last two decades, test scores have been cited, variously, as
sure and certain signs of U.S. educational decline or as indicators that
particular types of reform are "working." In a recent article, Daniel Koretz
drew upon a body of his own and others' research to sort out how the reform
debate has used test scores, what has really happened to them and why, and what
the answers imply for future debate. According to Koretz, the record shows
"characteristically simplistic use of test score data in the public debate" and
the unfortunate tendency to believe that test scores alone really measure the
effectiveness of education.
Popular perceptions and actual trends in test scores
Achievement-test scores have had a curious history in popular perception:
Perhaps because bad news is news, the impressions persist that test
scores started dropping in the 1960s and kept on falling, that the scores of
minority students have not improved, and that education policy and practice are
primarily to blame. A look at test score data from the past three decades
should modify at least the first two impressions:
Scores on achievement tests did decline in the 1960s and 1970s. Some of
that decline reflected changes in the demographic mix of test takers, primarily
Scholastic Aptitude Test (SAT) takers. Nevertheless, test score data from many
sources show a substantial and remarkably pervasive drop in private as well as
public schools, and in Canada as well as the United States.
Scores stopped declining between about 1974 and 1980. But what has
happened since then is not entirely clear: Scores have improved, but
actual achievement may have improved less than some scores imply.
First, many of the reforms of the 1980s were test based, which put strong
pressure on schools to raise scores. One result was "teaching to the test,"
which inflates scores but may not signal lasting student progress. Second, the
data are not consistent: In particular, many state and local testing programs
have shown a much more substantial upturn than the National Assessment of
Educational Progress (NAEP), which is less susceptible to teaching to the test.
Minority students, particularly African Americans, have made both relative and
absolute gains in test scores. For example, between 1973 and 1986,
mathematics scores of African American 17-year-olds rose enough to narrow the
gap with whites by 28 percent according to the NAEP.
Looking for explanations
In public debate, education policy and practice got most of the blame for the
decline. However, Koretz points out, trend patterns suggest that something
else was at work in both the decline and subsequent reversal. For one thing,
the decline was very consistent and pervasive: What kind of changes in
educational policy and practice might have occurred simultaneously in
the nation's decentralized public school systems--not to mention in private
schools and schools in Canada--that would have affected students of many kinds
and ages in all tested subject areas?
Another thing is timing. Most critics focused on tests administered in high
school (such as the SAT), where the
test-score decline lasted longest. They saw the decline as an indictment of
educational standards and practices during the 1960s and 1970s. Some even
claimed that reforms begun in the Reagan era caused the reversal.
Such conclusions ignore two facts: First, the high school students whose SAT
scores ended the decline in the early 1980s must have done some of their
learning before the 11th grade--so reforms that only began in the Reagan years
could hardly have had much effect that quickly. Second, a look at wider
testing shows that the decline ended earlier for students in lower grades. The
effect followed students born around 1962 and 1963 as they progressed through
grades toward high school. In other words, it was a cohort effect--a
pattern of performance for people born in specific years. It cannot be
explained by something that happened to children across grades in
schools at a particular time.
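The cohort reasoning above can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions, not a calculation from Koretz's article: it assumes children begin first grade at age 6 and advance one grade per year, and the names ENTRY_AGE, school_year_end, and turning_points are hypothetical. It shows that a turnaround tied to one birth cohort would appear in different calendar years at different grades, unlike a change that reached all grades in the same year.

```python
# Hypothetical illustration of a cohort effect.
# Assumption (not from the brief): children start grade 1 at age 6
# and advance exactly one grade per year.
ENTRY_AGE = 6


def school_year_end(birth_year: int, grade: int) -> int:
    """Spring calendar year in which a birth cohort finishes a grade."""
    return birth_year + ENTRY_AGE + grade


def turning_points(birth_year: int, grades: list[int]) -> dict[int, int]:
    """Map each grade to the year a cohort-linked change would show up there."""
    return {grade: school_year_end(birth_year, grade) for grade in grades}


if __name__ == "__main__":
    # Follow the 1962 birth cohort through several tested grades.
    for grade, year in turning_points(1962, [6, 8, 10, 12]).items():
        print(f"grade {grade}: spring {year}")
```

Under these assumptions, the 1962 cohort finishes grade 6 in 1974 and grade 12 in 1980, so a change that travels with that cohort surfaces years earlier in the lower grades than in high school.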
The cohort pattern suggests that whatever the effect of changes in
schooling, broad changes in society also strongly affected the trends. It is
hard to pin down exactly what the social and educational causes were and how
strong an effect they had. However, it appears that numerous factors made
small contributions to the trends but that many of the commonly suggested
factors did not, in fact, play a role. For example, changes in total
television watching cannot explain the achievement trends, changes in family
composition may have accounted for a moderate share of both the decline and the
upturn, and Title I/Chapter 1 might have contributed to the relative gains of
African American and Hispanic students--but only modestly and for children in
lower grades.
What does this mean for future debate?
Research has not been able to pinpoint the effects of the noneducational
influences. Nevertheless, people have misused test-score data in the debate to
give education a "bad rap." Koretz lists three broad, overlapping kinds of
misuse that should be avoided in honest future debate:
- Simplistic interpretations of performance trends: These trends should
not be taken at face value without regard for the various factors that
influence them, such as demographic changes in test takers or inflation of
scores caused by test-based accountability.
- Unsupported "evaluations" of schooling: Simple aggregate scores are
not a sufficient basis for evaluating education--unless they provide enough
information to rule out noneducational influences on performance. Most
test-score databases do not offer that kind of information.
- A reductionist view of education: Koretz notes that it may be "trite"
but it is true that education is a "complex mix of successes and failures . . .
what works in one context or for one group of students may fail for another."
Unfortunately, that truism is often ignored. For example, in the early 1980s,
when people were reasonably concerned about falling aggregate test scores, they
asked for wholesale changes in policies, without first asking which policies
most needed changing or which students or schools most needed new policies.
RAND policy briefs summarize research that has been more fully documented elsewhere. This policy
brief describes work sponsored by RAND's Institute on Education and Training with funds from a
grant by the Lilly Endowment Inc. Full results were originally published in Daniel Koretz, "What
Happened to Test Scores, and Why?" Educational Measurement: Issues and Practice, Winter 1992, pp.
7-11. With permission from the National Council on Measurement in Education, the article was
reprinted as RAND RP-278 with the same title.
RAND is a nonprofit institution that seeks to improve public policy through
research and analysis. RAND's publications do not necessarily reflect the opinions or policies
of its research sponsors.
1700 Main Street, P.O. Box 2138, Santa Monica, California 90407-2138 * Telephone 310-393-0411 * FAX 310-393-4818
2100 M St., N.W., Washington, D.C. 20037-1270 * Telephone 202-296-5000 * FAX 202-296-7960
RB-8008
Copyright © 1994 RAND
All rights reserved.
Permission is given to duplicate this on-line document for personal use only,
as long as it is unaltered and complete. Copies may not be duplicated for
commercial purposes.
Published 1994 by RAND