This commentary appeared in Education Week on February 20, 2002.
Now that President Bush has signed the "No Child Left Behind" Act of 2001, states will soon be implementing reading and mathematics tests for all students in grades 3-8 and imposing tough sanctions on schools where students do poorly. Will the strict accountability provisions included in the law promote student achievement and improve poorly performing schools? Researchers who study test-based accountability know that the new state systems are likely to produce some less desirable results. And they know some ways that states can make their systems work better.
What are the likely results? Although there are still many unanswered questions about high-stakes testing and accountability, there is a body of evidence drawn from Vermont, Florida, Kentucky, Texas, California, and other states about what will happen as states implement the new, tougher testing policies.
First, we can expect average scores on these accountability tests to rise each year for the first three or four years. Teachers and administrators at both low- and high-scoring schools will shift their instruction in ways that result in score increases. States that implemented test-based accountability have all seen their scores rise, and in some cases the increases have been dramatic.
Second, we know that to some extent these large gains will not be indicative of real gains in the knowledge and skills the tests were designed to measure, a phenomenon known as "score inflation." There is extensive evidence that students' scores on high-stakes tests rise faster than their scores on other standardized tests given at the same time and measuring the same subjects. It does not appear that students actually know as much as we think they do based only on the high-stakes test scores. Thus, a likely result of accountability is that the test scores themselves will be less accurate than they were prior to the addition of high stakes.
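The comparison behind score-inflation findings comes down to simple arithmetic. Express each test's gain in standard-deviation units, because the high-stakes test and the comparison ("audit") test use different scales, and then subtract. The sketch below is illustrative only. The function names and every number in it are hypothetical. A positive gap is consistent with inflation but does not prove it, since the two tests can also differ in content.

```python
def standardized_gain(mean_before, mean_after, sd):
    """Change in a test's mean score, expressed in standard-deviation units."""
    return (mean_after - mean_before) / sd


def inflation_gap(high_stakes, audit):
    """Standardized gain on the high-stakes test minus the standardized gain on
    an audit test of the same subject. A large positive gap is consistent with
    (but does not prove) score inflation."""
    return standardized_gain(*high_stakes) - standardized_gain(*audit)


# Hypothetical figures for each test: (mean before, mean after, standard deviation).
state_test = (250.0, 265.0, 30.0)  # 15 points on a 30-point SD = 0.5 SD gain
audit_test = (240.0, 244.0, 40.0)  # 4 points on a 40-point SD = 0.1 SD gain
print(f"Gap: {inflation_gap(state_test, audit_test):.2f} SD")  # Gap: 0.40 SD
```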
Third, we are likely to see an increase in emphasis on tested subjects and a decrease in emphasis on subjects that are not tested. When students and schools are held accountable only for reading and mathematics, class time is taken away from other subjects, such as writing, social studies, and art. Similarly, in the subjects that are tested, we should expect a decrease in emphasis on skills and content that are not covered by the tests. For example, if states adopt multiple-choice tests (which are the most economical alternative), less attention may be paid to the elements of reading and mathematics that do not lend themselves to multiple-choice testing.
Fourth, there is likely to be an increase in undesirable test-related behaviors, such as narrowly focused test-preparation activities that take time away from normal instruction, and even cheating.
Fifth, we can expect large annual fluctuations in many schools' scores. Some schools that make the greatest gains one year will see these gains evaporate the next year. Schools whose teachers earn large bonuses one year may have stagnant scores the next, as occurred in California. This volatility in school scores comes from a variety of factors, including student mobility, measurement error, and other transitory conditions that affect test scores.
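A small simulation shows how much of this volatility can come from sampling alone. The sketch assumes a school whose true achievement never changes and that tests a fresh, randomly drawn group of students each year. The cohort sizes, the 40-point spread among students, and the function names are hypothetical, and the model ignores mobility and other real-world factors. Under these assumptions the standard deviation of a school's year-to-year change is about sqrt(2) * 40 / sqrt(n), so a school testing 25 students should swing roughly four times as much as one testing 400.

```python
import math
import random
import statistics


def simulate_change_sd(n_students, years=200, true_mean=300.0, student_sd=40.0, seed=1):
    """Simulate a school whose true average never changes and return the standard
    deviation of its year-to-year changes in average score. Each year a new cohort
    of n_students is drawn at random; student_sd stands in for both real differences
    among students and measurement error. Every change is therefore pure noise."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(true_mean, student_sd) for _ in range(n_students))
        for _ in range(years)
    ]
    changes = [later - earlier for earlier, later in zip(averages, averages[1:])]
    return statistics.stdev(changes)


def expected_change_sd(n_students, student_sd=40.0):
    """With independent cohorts, SD(change) = sqrt(2) * student_sd / sqrt(n)."""
    return math.sqrt(2) * student_sd / math.sqrt(n_students)


for n in (25, 400):
    sim, exp = simulate_change_sd(n), expected_change_sd(n)
    print(f"{n} students: simulated {sim:.1f}, expected {exp:.1f} points")
```

The same logic explains the California pattern: a school that lands at the top one year partly by chance is likely to fall back the next.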
Sixth, the sanctions imposed on low-performing schools will not ensure that students in those schools are not "left behind." The record of success for the specific sanctions imposed by the law, including staff reassignment and school takeover, is mixed. There is no guarantee that students in low-performing schools will be helped by these policies, and some risk that they will be harmed.
What should states do? A number of steps should be taken to maximize the benefits and minimize the harm done by test-based accountability. The following recommendations are not exhaustive, but they address the major concerns we've raised.
As a first step, states need to monitor the extent of score inflation. The amount of inflation is likely to depend on the specific features of each state's testing program (for example, whether the same test items are used year after year). States are required to participate in the National Assessment of Educational Progress (NAEP) testing in grades 4 and 8 every other year, which provides a starting point for examining score inflation. States need to establish a plan for studying the NAEP results and interpreting them at the state level, and they need to consider supplementing NAEP with alternative measures in other subjects and at other grade levels.
States should consider expanding "what counts" in the state accountability system to include more than just reading and math. This could be done by testing other subjects; the overall testing burden could be limited by varying subjects and grade levels over time, and by using sampling approaches that do not require every student to take every test or answer every question. States should also include measures of what content is taught and how it is taught. This information reveals otherwise-hidden shifts in practice while sending signals that other subjects are important.
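One such sampling approach, often called matrix sampling, can be sketched in a few lines. In this hypothetical design, each test booklet covers two of five subjects, and booklets are handed out in rotation, so no student takes every test but every subject is taken by some students. The subject list, the class of 30, and the function name are assumptions for illustration. Operational designs (NAEP itself uses a form of matrix sampling) are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations


def assign_booklets(students, subjects, subjects_per_booklet=2):
    """Build every booklet combining subjects_per_booklet subjects, then hand the
    booklets out in rotation so each student takes only a few of the tests."""
    booklets = list(combinations(subjects, subjects_per_booklet))
    return {student: booklets[i % len(booklets)] for i, student in enumerate(students)}


students = [f"student{i:02d}" for i in range(30)]
subjects = ["reading", "math", "science", "social studies", "writing"]
assignment = assign_booklets(students, subjects)

# Count how many students take each subject's test.
coverage = Counter(subject for booklet in assignment.values() for subject in booklet)
for subject in subjects:
    print(subject, coverage[subject])
```

With five subjects there are ten two-subject booklets, each handed to three of the 30 students, so each subject ends up with 12 test takers while each student sits for only two tests.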
As a basis for doing more sensitive analyses, states need to create student-information systems that enable them to link the test scores of individual students over time. Such data will enable states to track individual student progress, whether a student remains in the same school or transfers. This type of data is especially important for understanding what happens to students in low-performing schools.
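Linking scores over time depends on a stable identifier for each student. The sketch below assumes each yearly record carries a hypothetical statewide student ID and groups the records into one history per student, which also makes transfers visible. The record layout, IDs, school names, and scores are invented for illustration.

```python
from collections import defaultdict

# Hypothetical yearly records: (year, student_id, school, scale_score).
records = [
    (2002, "S001", "Lincoln", 310),
    (2002, "S002", "Lincoln", 285),
    (2003, "S001", "Lincoln", 325),
    (2003, "S002", "Jefferson", 300),  # S002 changed schools
]


def link_by_student(records):
    """Group yearly test records into one chronological history per student."""
    histories = defaultdict(list)
    for year, student_id, school, score in sorted(records):
        histories[student_id].append((year, school, score))
    return dict(histories)


def transfers(histories):
    """Return the IDs of students whose school changed between tested years."""
    return sorted(
        student_id
        for student_id, history in histories.items()
        if any(a[1] != b[1] for a, b in zip(history, history[1:]))
    )


histories = link_by_student(records)
print(transfers(histories))  # ['S002']
```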
To help ensure that rewards and sanctions reflect real changes in student achievement, states should base rewards and sanctions on changes in two-year averages of scores, rather than on single-year changes. Another promising alternative to year-to-year comparisons of school-average scores is to adopt value-added approaches in which students are compared against their own prior scores.
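Hypothetical numbers show the difference between single-year changes and averaging over two years, along with the basic idea behind value-added measures. All scores, student IDs, and function names below are invented. The gain function is only the simplest version of comparing students against their own prior scores; value-added models used in practice typically add statistical adjustments, so treat it as a sketch of the idea rather than a recommended model.

```python
from statistics import fmean


def single_year_change(yearly_means):
    """Change from last year's school average to this year's."""
    return yearly_means[-1] - yearly_means[-2]


def two_year_average_change(yearly_means):
    """Mean of the two most recent years minus the mean of the two years
    before that (needs at least four years of data)."""
    return fmean(yearly_means[-2:]) - fmean(yearly_means[-4:-2])


def average_gain(prior_scores, current_scores):
    """Mean of each student's own gain; students without a prior score are
    skipped. Raises statistics.StatisticsError if no student can be matched."""
    gains = [current_scores[s] - prior_scores[s] for s in current_scores if s in prior_scores]
    return fmean(gains)


# Hypothetical school averages for four consecutive years: +18, -16, +14.
school = [300, 318, 302, 316]
print(single_year_change(school))       # 14
print(two_year_average_change(school))  # 0.0

prior = {"S001": 310, "S002": 285, "S003": 290}
current = {"S001": 325, "S002": 300, "S004": 280}  # S004 has no prior score
print(average_gain(prior, current))     # 15.0
```

Judged year by year, this school would look like it improved sharply in two of the last three years; judged on two-year averages, it has not changed at all.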
Finally, states should monitor the progress and practices of schools that are subject to interventions, including staff reassignment or takeover, to ensure that these changes are resulting in better learning environments for children. Although the new federal law has many attractive features, it contains inadequate provisions for review and improvement. To ensure that no child is left behind and to make test-based accountability work better, we need to study what states do and how well they succeed. Fifty states will be struggling with these requirements, and they have been given very little guidance about how to proceed.
One of the good features of the new law is the requirement that states promote instructional methods that are scientifically based: methods that have been evaluated and have evidence of success. We believe this same emphasis on research should be applied to the law itself. Test-based accountability will work better if we acknowledge how little we know about it, if the federal government devotes appropriate resources to studying it, and if states make ongoing efforts to improve it.
Brian Stecher is a senior social scientist in the education program at RAND in Santa Monica, Calif. He is also a member of the technical-design group advising the California Department of Education on the development of that state's accountability system. Laura Hamilton is a behavioral scientist at RAND and a co-director of the RAND/Spencer postdoctoral program in education policy.