Gender differences on the NELS:88 multiple-choice and constructed-response science tests were explored through a combination of statistical analyses and interviews. Performance gaps between males and females varied across formats (multiple-choice versus constructed-response) and across items within a format. Differences were largest for items that involved visual content and called on application of knowledge commonly acquired through extracurricular activities. Large-scale surveys such as NELS:88 are widely used by researchers to study the effects of various student and school characteristics on achievement. The results of this investigation reveal the value of studying the validity of the outcome measure and suggest that conclusions about group differences and about correlates of achievement depend heavily on specific features of the items that make up the test.