Standardized Tests Can Be Smarter

commentary

(U.S. News & World Report)


by Laura S. Hamilton and Brian M. Stecher

November 2, 2015

Since the Department of Education recently released a Testing Action Plan that lays out “Principles for Fewer and Smarter Assessments,” much of the media discussion has focused on its recommendation that statewide testing take up no more than 2 percent of instructional time. While this recommendation would bring some relief to students whose school schedules are interrupted by weeks of testing, policymakers should not focus only on the amount of testing. It's not just “How much testing?” that matters, but also “How useful is the testing?” and “What is the purpose of the testing?”

After all, standardized tests can do many things: tell policymakers and families how well students are doing overall; play a role in state and district accountability systems; contribute to teacher evaluations; and inform decisions about student course placement. Some tests serve still other purposes, such as helping teachers adapt day-to-day instruction to each student's needs based on that student's results. But no single annual assessment can serve all of these purposes.

Research by the RAND Corporation and others has shown that the content and format of high-stakes tests influence what is taught as well as how the subject is taught. Therefore, creating a test with high-quality items that measure the full range of college- and career-ready skills, rather than just those skills that are easy to measure quickly and cheaply, is essential. As state, district and school leaders think about how to respond to the testing guidelines, we offer several recommendations to increase the likelihood that assessments will be aimed at improving teaching and learning.

1. Testing time. One of the most common complaints about testing is the time it takes, presumably at the expense of other educational opportunities. That is one reason multiple-choice testing has become so prevalent: such tests can cover a large body of material relatively efficiently. But although multiple-choice tests can be administered quickly, they can shift instruction toward the kinds of lower-level skills the questions typically measure. In fact, a short test that emphasizes low-level skills can do more harm to student learning opportunities than a longer, higher-quality test that requires students to engage in complex problem-solving.

Furthermore, time spent on test preparation does not have to detract from instruction if the test items have educational benefit. For example, Advanced Placement course instruction is often explicitly test-focused but involves activities, such as having students evaluate evidence and synthesize material from various sources to produce a compelling argument, that call on the types of higher-order skills that many college- and career-ready standards emphasize.

One way to reduce testing time is to eliminate the requirement for annual testing in grades three through eight, and instead test in carefully selected grades. The crucial drawback of this approach is that it prevents the calculation of measures of growth from one year to the next. Similarly, testing systems that use matrix sampling (a technique in which all test content is covered by different students answering different, randomly selected questions) accommodate shorter testing times without sacrificing content coverage, but generally eliminate the possibility of creating individual student-level scores.

2. Teaching to the test. Teaching to the test has also been cited as a concern regarding the current testing environment. One feature that can reduce unintended teaching to the test is a lack of predictability in what types of questions are administered and what specific content is assessed. Some consistency in tests over time is necessary to ensure that the meanings of scores don't change, but there should be at least some material that teachers cannot easily predict from one administration of the test to the next.

3. High school testing. Much of the policy discussion has focused on testing in grades three through eight, but high school testing creates some unique challenges and should not be neglected. Policymakers should consider the collective impact and value of all of the tests taken by a college-bound high school student, including state tests, college admissions tests and AP exams, and explore opportunities to consolidate, as some states have done by adopting the ACT or SAT as their statewide test. This not only reduces testing demands, but also provides free admissions testing to students who otherwise might not be able to afford it.

4. Test purpose. A desire to reduce testing time should not lead to the use of a single test for multiple purposes without evidence that the test is suitable for each use. It may be tempting to use scores from the annual state test for purposes it wasn't designed to support, such as course placement, but if the test was not developed or validated for that use, it might not be appropriate. Similarly, before a new assessment is adopted, educators should gather evidence of its quality and validity for the intended use, and should also be aware that using tests for high-stakes decisions (such as student promotion or teacher evaluation) can undermine the validity of scores for those uses.

With the recent emphasis on accountability, testing demands have grown, and capping the amount of time students spend testing is a reasonable response to unchecked growth. However, a better response would be to systematically review testing programs, focusing on the tests that provide the most value, whether by measuring skills and knowledge or by guiding instruction. It is important to note that a more balanced approach to testing needs to consider not only externally mandated tests but also additional tests selected or administered by districts, schools and teachers. A comprehensive approach to making testing smarter should consider all of these types of tests, and district and school leaders should work with teachers to devise a coherent, sensible testing schedule that promotes rather than hinders improved educational opportunities and outcomes.


Laura S. Hamilton is a senior behavioral scientist and Brian M. Stecher is a senior social scientist at the nonprofit, nonpartisan RAND Corporation.

This commentary originally appeared on U.S. News & World Report on November 2, 2015. Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.