Two adjacent ability levels of the Metropolitan Achievement Test (MAT) were administered to 801 third graders in San Jose, with the order of testing randomly counterbalanced across students and a brief interval between testings. Analysis of the results suggests that overall measurement error for individuals is substantially larger than the test publisher reports, and that directional bias, of a magnitude likely to obscure educational growth, appears in aggregate comparisons. If a program of significant scope or importance is to rely on test scores, evaluators may need to undertake extensive and rigorous metric validation of standardized achievement tests (SATs) and, if required, reconstruct them before evaluation. Two less stringent remedies are: (1) administer fewer test levels, each covering a wider range of ability, and (2) administer a careful pretest to locate the test level on which a student would be most likely to score in the middle of the range. 27 pp. Ref.