Evaluations of federal programs designed to improve student achievement generally depend on data gathered by the states for school accountability purposes rather than on data designed specifically for program evaluation. Moreover, these data are typically available at the school level, not the student level. This article first discusses issues related to the quality of school-level data collected as part of state accountability systems, including the reliability and validity of school-level test scores as a measure of the value added by schools to student learning. It then outlines various ways in which school-level data can be usefully analyzed and illustrates the challenges inherent in doing so, including the challenge of aggregating data across states to estimate an overall program effect. The final section discusses the implications of these arguments for measuring changes in school performance and linking those changes to a specific program. Ultimately, our ability to measure changes in outcomes and attribute them to a program depends on three factors: (a) identifying a set of activities attributable to the program, (b) measuring the quality of implementation of these activities, and (c) obtaining a valid and reliable measure of the desired outcome. The article makes clear that none of these is easy to come by.