Using Classroom Artifacts to Measure Instructional Practices in Middle School Mathematics

A Two-State Field Test

Published in: CSE Report, no. 662 (Los Angeles, CA : CRESST, National Center for Research on Evaluation, Standards, and Student Testing, UCLA Center for the Study of Evaluation, Dec. 2005), 53 p

Posted on December 01, 2005

by Brian M. Stecher, Alice C. Wood, Marylou Gilbert, Hilda Borko, Karin L. Kuffner, Suzanne C. Arnold, Elizabeth H. Dorman

The purpose of this research is to determine whether we can use classroom artifacts as the basis for making valid judgments about the presence of reform-oriented teaching practices in middle-school mathematics classes. Our approach compares ratings based on collections of artifacts assembled by teachers according to our directions (the "Scoop Notebook") with judgments based on direct classroom observation of these teachers, direct observation supplemented by artifacts, and transcripts of discourse recorded during classroom observations. Eleven dimensions of reform-oriented practice were identified for use in this investigation, and each was rated on a dimension-specific five-point scale. Data to answer questions about the reliability and validity of judgments based on the Scoop Notebook are drawn from a field study of 36 middle-school mathematics teachers in two states conducted in Spring 2003. Notebooks were rated independently on each of 11 dimensions by at least three raters who had no prior knowledge of the classroom. In addition, each teacher was observed on two or three occasions during the Scoop period by a researcher who rated each lesson on the same 11 dimensions. At a later time, the observer also reviewed the Scoop Notebook and assigned a "gold standard" rating reflecting all the information available from the Notebook and the classroom observations. For a subset of classrooms, the observed lessons were audiotaped and transcribed, and one researcher with no prior knowledge of the classrooms assigned ratings on the basis of an analysis of the lesson transcripts. Results indicate that the notebooks could be rated with acceptable reliability and that the notebook scores provided a reasonable estimate of the scores obtained by direct observation and by observation supplemented with the review of artifacts. Notebook scores also differentiated between teachers known to be using reform curricula and those known to be using traditional curricula. However, the reliability and validity were not high enough to justify using the Scoop Notebook for making judgments about individual teachers.

The following are appended: (1) Reflection Questions in the Scoop Notebook; and (2) Dimensions of Reform Practice Used for Observation Ratings. (Contains 17 tables.)

