In this study, the authors investigated 3 factors that may contribute to the large variation in student performance across open-ended measures. These factors are content domain, format (whether the task required only pencil and paper or involved hands-on manipulation of equipment), and level of inquiry (whether the task guided the student toward the solution or required the student to develop a solution strategy). A group of 6 similar investigations of acids and bases was developed from a common shell that controlled for format and level of inquiry. Students completed 2 of these tasks as well as tasks drawn from other content areas and a multiple-choice test of science. Results did not bear out the hypothesis that tasks that were similar to each other in content, level of inquiry, and format would correlate more highly with each other than with measures that differed on these dimensions. Post hoc analyses of the tasks revealed unanticipated differences in developers' interpretation of the shell that may have affected student performance. Implications for large-scale use of performance measures are discussed.