Evaluation of the Strength Aptitude Test and Other Fitness Tests to Qualify Air Force Recruits for Physically Demanding Specialties

by Sean Robson, Stephanie Pezard, Maria C. Lytell, Carra S. Sims, John E. Boon, Jr., Jason Michel Etchegaray, Michael W. Robbins, David Schulker, Jerry M. Sollinger, Jason H. Campbell, Anthony Atler, Stephan B. Seabrook, Deborah L. Gebhardt, Todd A. Baker, Erica K. Volpe, Kathryn A. Linnenkohl


RAND Health Quarterly, 2019; 8(3):8


The Air Force uses the Strength Aptitude Test (SAT) to determine whether recruits meet the fitness levels needed to perform the duties of various Air Force specialties with physical strength requirements. However, the SAT was developed in the early 1980s and has not been revalidated since then. In the interim, the duties associated with many Air Force Specialty Code classifications may have changed, and new ones have been added. These changes require a reevaluation of the SAT's utility and effectiveness for qualifying recruits into these specialties. This article summarizes a series of studies that evaluate the status and validity of the SAT: several that RAND completed independently and one conducted in conjunction with the Human Resources Research Organization (HumRRO), which provided the additional data necessary to develop courses of action the Air Force can follow to ensure that airmen can meet job-related physical requirements.

For more information, see RAND RR-1789-AF at https://www.rand.org/pubs/research_reports/RR1789.html

Full Text

The Air Force wants to ensure that its recruits have the physical capability to perform the tasks of their duty positions, which can vary depending upon the specific demands of the position. To do so, the Air Force tests recruits' physical abilities as part of the induction process at the Military Entrance Processing Station (MEPS). Since the early 1980s, the Air Force has used the Strength Aptitude Test (SAT) to make this determination. The SAT is a weight-lifting test performed on an incremental lifting machine similar to equipment found in fitness centers. The test requires recruits to lift progressively heavier weights until they either fail a lift or meet the weight requirement for their specific specialty.
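The stopping rule described above can be sketched in a few lines of Python. This is an illustration only: the starting weight, the 10-pound increment, and the function name are assumptions made for the example and are not taken from this article.

```python
def run_incremental_lift(max_liftable_lb, required_lb, start_lb=40, step_lb=10):
    """Simulate an incremental lift test for one recruit.

    start_lb and step_lb are hypothetical values chosen for illustration.
    Returns (heaviest weight lifted, whether the requirement was met).
    """
    heaviest_lifted = 0
    weight = start_lb
    while True:
        if weight > max_liftable_lb:
            # The recruit fails this lift, so the test ends.
            break
        heaviest_lifted = weight
        if heaviest_lifted >= required_lb:
            # The requirement for the specialty is met, so the test ends.
            break
        weight += step_lb
    return heaviest_lifted, heaviest_lifted >= required_lb
```

Under these assumed settings, a recruit who can lift at most 55 pounds and is tested against a 70-pound requirement stops after the 50-pound lift and does not qualify.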

But the composition of the Air Force has changed over time, as have the duties associated with the various occupational specialties. These changes require a reevaluation of the SAT's utility and effectiveness for qualifying recruits into these specialties. The Air Force asked RAND Project AIR FORCE to first evaluate potential benefits of the SAT and then develop and validate physical performance tests and standards to ensure airmen can perform the physically demanding tasks associated with selected enlisted Air Force Specialty Codes (AFSCs). To achieve these objectives, RAND conducted a series of studies between 2010 and 2015. These studies provide an initial evaluation of the SAT followed by job analyses and multiple validation efforts to determine whether the SAT and related fitness tests effectively indicate recruits' capabilities to perform physically demanding tasks required by AFSCs. Collectively, these studies provide the Air Force with scientifically based courses of action for implementing changes to ensure airmen can meet job-related physical requirements. This article summarizes the studies RAND has completed independently and one study conducted in conjunction with Human Resources Research Organization (HumRRO), which provided the additional data necessary to develop some courses of action for the Air Force to follow. A general outline for establishing test standards is presented in Figure 1.

Figure 1. Process for Establishing Test Standards


How Do Managers View the SAT?

RAND administered a survey to Career Field Managers (CFMs) to understand how they viewed the value of the SAT as an entry test and whether it should be continued. CFMs establish training, education, and related standards for the career fields they manage. Therefore, understanding their perspective is an important step in evaluating the potential advantages and disadvantages of the SAT. CFMs provided feedback in several areas, including the types of physical abilities required by the specialties they manage; whether the SAT requirements should be raised, lowered, or held constant; and benefits and challenges if the SAT were discontinued.

The survey responses indicated that the majority of CFMs are satisfied with current SAT requirements for the AFSCs they manage. Furthermore, CFMs identified more drawbacks than benefits if the SAT were eliminated. Although the CFMs perceived the SAT to play an important role in qualifying recruits for the AFSCs they manage, we concluded that further research should address the validity of the SAT and evaluate the extent to which the SAT effectively predicts an individual's capability to perform the physically demanding tasks required by assigned AFSCs.

Does the SAT Predict Performance or Injuries?

RAND explored Air Force data from Enlisted Performance Ratings (EPRs) and work-related injuries to ascertain whether the SAT predicted either performance or susceptibility to injury. In our initial analyses, we found that available measures are largely insufficient for conducting the statistical tests needed to evaluate the relationships between SAT scores, performance, and injuries. For example, ratings on both the SAT and EPRs tend to cluster at the high end of the scale, which makes it difficult to identify any potential relationship. Also, changes over time in how the enlisted population is organized complicate the analysis, because some specialties are merged with others. Furthermore, SAT requirements (the minimum required for a given specialty) for some specialties have changed over time, and an individual's physical fitness can also vary over time, as evidenced by changes in SAT scores observed between the MEPS and week zero of Basic Military Training (BMT) and between week zero and week eight. With respect to injuries, the data contain very few recorded injuries relative to the size of the population. We used injury data collected by the Air Force Safety Center, which may not capture less serious types of injuries for a variety of reasons. One such reason is policy guidance requiring base safety officials to conduct an investigation for injuries reported to the Air Force Safety Center (Copley et al., 2010), which may act as a disincentive to reporting less serious injuries. Given the limitations of existing data to evaluate the ability of the SAT to predict important job-related outcomes, we recommended a more comprehensive approach for identifying job-related physical requirements and potential physical fitness tests that could be used at the MEPS to determine the physical readiness of recruits to perform physically demanding job tasks associated with their assigned AFSC.

How Can Tests Be Linked to Physical Performance?

Given recent policy changes that open all assignments to women and the fact that the validity of the SAT has not been rigorously assessed since it was first developed, RAND, in conjunction with HumRRO, developed a methodology to deal with limitations of previous studies to examine the SAT's validity. Validation involves accumulating relevant evidence to provide a sound scientific basis for how tests, standards, training requirements, and related personnel decisions are applied. Although several strategies and sources of evidence can be used to establish validity, we evaluated the predictive validity of the SAT by conducting a concurrent, criterion-related validation study. This type of study helps to determine whether higher scores on the SAT are associated with higher physical task performance. In addition to the SAT, we also evaluated other physical tests to determine whether they would have higher validity or could be combined with the SAT to improve decisions about the level of fitness recruits need to perform physically demanding tasks of a given AFSC. The study was designed to answer the following four questions:

  • What are the physical requirements to perform in different AFSCs?
  • How can physical performance on job-relevant tasks be measured?
  • Which physical fitness tests, including the SAT, indicate a recruit's capability to meet job-relevant physical demands?
  • Do the fitness tests predict physical performance equally well for different subgroups (e.g., men and women)?

The approach to answering these four questions consisted of the following tasks, executed jointly by RAND and HumRRO, primarily by HumRRO, or primarily by RAND:

  • Task 1: Identify specific tasks of selected AFSCs to identify the physical requirements to perform in different AFSCs. This task was executed jointly by RAND and HumRRO.
  • Task 2: Develop task simulations that approximate the types of physically demanding tasks performed across AFSCs. These task simulations measure physical performance across four movement patterns required to perform physically demanding tasks across AFSCs: (a) lifting and carrying, (b) lifting and holding, (c) climbing, and (d) pushing and pulling. This task was executed by HumRRO.
  • Task 3: Evaluate the predictive validity of physical fitness tests (for both men and women) to identify which tests can be used to indicate a recruit's capability to meet job-relevant physical demands. This task was executed primarily by RAND.

To accomplish the first task, RAND and HumRRO first analyzed the SAT data to identify career fields for analysis. That analysis showed that 38 percent of AFSCs require an SAT score of 40 pounds, and 26 percent require 70 pounds. Almost all of the men entering the Air Force lift 60 pounds or more, and about 87 percent of the women also do so. Furthermore, analysis of the scores suggests that the SAT begins to make a sizable difference for women at about 70 pounds, with almost all of the men and about 70 percent of women meeting this requirement. Taking these data into consideration, along with data suggesting that physical training during BMT can increase physical strength, we placed emphasis on the physical demands of AFSCs requiring a 70-pound SAT score or higher.
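The kind of tabulation behind these figures, the share of each group that meets a given weight requirement, can be sketched as follows. The function names and any scores passed in are hypothetical; the sketch shows only the arithmetic, not the data or code used in the study.

```python
def share_meeting_cutoff(scores_lb, cutoff_lb):
    """Return the fraction of scores at or above the cutoff."""
    if not scores_lb:
        raise ValueError("scores_lb must not be empty")
    return sum(1 for s in scores_lb if s >= cutoff_lb) / len(scores_lb)


def pass_rates_by_group(scores_by_group, cutoffs_lb):
    """Return {group: {cutoff: pass rate}} for each group and cutoff."""
    return {
        group: {c: share_meeting_cutoff(scores, c) for c in cutoffs_lb}
        for group, scores in scores_by_group.items()
    }
```

Comparing the resulting rates across cutoffs shows where a requirement starts to separate groups, which is the reasoning the paragraph above applies to the 70-pound level.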

RAND and HumRRO interviewed CFMs and subject-matter experts for AFSCs requiring an SAT score of 70 pounds or higher, asking them to identify the ten most physically demanding tasks and the level of that demand. Through these interviews, the physical demands representative of these AFSCs were identified to form the foundation for developing physical performance measures. Fitness tests were then evaluated by HumRRO to determine which ones could predict physical performance. Specifically, HumRRO identified nine fitness tests, including the SAT, for further analysis, and RAND conducted a series of analyses to develop several possible combinations of these tests (i.e., options) to strengthen the prediction of physical task performance. The tests, in addition to the SAT, are Arm Endurance, Arm Lift, Handgrip, Plank Test, Push-Ups, Sit-Ups, Standing Broad Jump, and Step Test. Each option had advantages and disadvantages. Some would require the purchase of relatively expensive equipment, some would have a greater adverse effect on job opportunities for women, and some would offer no gains in validity. RAND assessed the following five options:

  • Option 1: SAT is the only test used (baseline)
  • Option 2: SAT plus any single test
  • Option 3: SAT plus as many other tests as needed
  • Option 4: SAT plus any single inexpensive test
  • Option 5: SAT plus all inexpensive tests.

The results of the analysis indicate that adding the Arm Endurance test to the SAT adds the most validity of any test. The Arm Endurance test measures the ability of the muscles of the upper body to exert force repeatedly or continuously over a moderate time period. Thus, this test measures anaerobic power and muscular endurance. The test is conducted with a stationary arm ergometer, which resembles bicycle pedals but has handgrips instead of pedals. The individual "pedals" the ergometer with his or her hands for a minute and is scored on the number of revolutions achieved. Using the test would require the purchase of an additional piece of equipment but would not require much additional space in the MEPS. Adding the test also reduces some of the potential problems of test bias, and it provides a sufficient increase in predictive validity to justify the additional costs of equipment and of administering and scoring the tests.
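To make the idea of "adding validity" concrete, the sketch below compares the correlation between one test and task performance with the multiple correlation obtained when a second test is added. It is a minimal illustration under stated assumptions, not the analysis performed in the study: the function names are hypothetical, a linear model with two predictors is assumed, and the article does not say how its percentage gains were computed, so the relative-gain definition here is also an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def multiple_r_two_predictors(x1, x2, y):
    """Multiple correlation of y with x1 and x2 (linear model)."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    # Standard two-predictor formula; requires x1 and x2 not perfectly correlated.
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


def relative_validity_gain(x1, x2, y):
    """Assumed definition: (combined R - |r of x1 alone|) / |r of x1 alone|."""
    base = abs(pearson(x1, y))
    combined = multiple_r_two_predictors(x1, x2, y)
    return (combined - base) / base
```

Here `x1` would hold scores on the baseline test, `x2` scores on the candidate test, and `y` measured performance on the task simulations. A second test adds little when it is highly correlated with the first, which is one reason gains differ so much across the options.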


Although analyses consistently found support for the predictive validity of the SAT and the related fitness tests evaluated in the study, some significant limitations should be further addressed during a verification period before full implementation of any new tests or standards. Specifically, HumRRO explored options for recommending updated SAT standards for each AFSC; however, these efforts were unsuccessful because of limitations in the data collected as part of the study. More specifically, HumRRO was unable to identify an acceptable algorithm to cluster AFSCs into meaningful groups (e.g., low vs. high physical demand) using the survey data collected by RAND. Alternative strategies to establish SAT cut scores were considered but could not be executed because they would require additional data from the Air Force specifying minimally acceptable job performance in each AFSC. Such data would allow the Air Force to establish a direct linkage between SAT standards and effective job performance; however, the Air Force has not yet collected this type of data. In consideration of these data limitations, RAND provides several courses of action, all of which require maintaining the current standards until additional data can be collected to establish the SAT scores associated with minimally acceptable performance within each AFSC.

Courses of Action (COAs) the Air Force Could Pursue

The research done for this study indicates that the SAT remains a valid measure of a recruit's ability to perform the physical duties of his or her Air Force specialty. However, augmenting the SAT with additional physical test(s) could increase the validity of the testing done at the MEPS. Alternatively, the Air Force could continue administering only the SAT at the MEPS and shift the final determination of physical capabilities to perform the duties of a given AFSC to training (rather than entrance) standards. For each of the COAs, RAND considered several factors, including resource requirements (e.g., costs), how well fitness test scores correlate with performance (i.e., validity), and potential gender test bias. Gender test bias can occur in several ways and, depending on the nature of the bias, test scores may not be a good indicator of a particular subgroup's performance. In the context of physical fitness testing, the presence of test bias could mean a greater proportion of one subgroup (e.g., women) is classified into a specialty for which members cannot perform the physical tasks to an acceptable level. The four COAs we analyzed are as follows:

  • COA #1. Adopt the physical test battery at the MEPS that maximizes validity. The combination of tests that meets this objective includes the SAT, Arm Endurance, Push-Ups, and Handgrip.
  • COA #2. Adopt a physical test battery at the MEPS that maximizes validity with no additional equipment costs; combines Standing Broad Jump with SAT.
  • COA #3. Adopt a physical test battery at the MEPS that maximizes validity with limited additional costs; combines SAT with Arm Endurance test.
  • COA #4. Retain the SAT as the only physical test at the MEPS.

The analysis of the four courses of action appears in Table 1.

Table 1. Advantages and Disadvantages for Each COA

COA #1. Adopt the physical test battery at the MEPS that maximizes validity. The combination of tests that meets this objective includes the SAT, Arm Endurance, Push-Ups, and Handgrip.
Advantages:
  • Maximizes potential to ensure recruit has the ability to perform physically demanding tasks
  • Provides the most comprehensive assessment of physical fitness, to include combinations of tests measuring muscular strength and muscular endurance
  • No gender test bias indicated
Disadvantages:
  • Requires additional resources and costs for Handgrip and Arm Endurance
  • May have time and space implications for MEPS
  • Return on investment diminishes for each additional test
  • Evidence on how to combine test scores is limited

COA #2. Adopt a physical test battery at the MEPS that maximizes validity with no additional equipment costs. Combines Standing Broad Jump with SAT.
Advantages:
  • Increases validity beyond the SAT with a test that requires no additional costs and minimal resources to administer
Disadvantages:
  • Gains in validity over the SAT (+4%) minimal and likely do not justify cost and additional resources to administer
  • Adding in all other no-cost tests still offers limited validity gains over the SAT (+7%)
  • Test may overpredict female performance and underpredict male performance on tasks (potential gender test bias)

COA #3. Adopt a physical test battery at the MEPS that maximizes validity with limited additional costs. Combines SAT with Arm Endurance test.
Advantages:
  • Balances cost and validity gains
  • Validity increases significantly beyond the SAT (+22%)
  • Involves fewer tests
  • Reduces gender test bias compared with using SAT alone
Disadvantages:
  • Slightly less validity gain than COA #1
  • Increases costs somewhat for equipment, maintenance

COA #4. Retain the SAT as the only physical test at the MEPS.
Advantages:
  • Requires only the SAT test and takes advantage of the relatively strong correlation with physical task performance
  • Requires minimal changes at MEPS
Disadvantages:
  • Slightly less validity gain than other COAs
  • Potential gender test bias
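The over- and underprediction mentioned in Table 1 can be illustrated with a simple differential-prediction check: fit one regression line to everyone's scores, then look at the average prediction error within each subgroup. This is a common way to examine test bias and is offered here only as a sketch; the article does not describe the exact procedure used in the study, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def mean_residual_by_group(scores, performance, groups):
    """Average (actual - predicted) performance per group, using one pooled line.

    A negative value means the pooled line overpredicts that group's
    performance; a positive value means it underpredicts.
    """
    slope, intercept = fit_line(scores, performance)
    totals = {}
    for s, p, g in zip(scores, performance, groups):
        residual = p - (intercept + slope * s)
        total, count = totals.get(g, (0.0, 0))
        totals[g] = (total + residual, count + 1)
    return {g: total / count for g, (total, count) in totals.items()}
```

Mean residuals near zero for every group are consistent with no test bias of this kind; a consistently negative mean for one group is the pattern described as overprediction in the table.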

Implementing a COA

Given the study limitations and potential effect on each AFSC, RAND recommends maintaining the SAT requirements currently in place while following an implementation plan to verify any COA selected by the Air Force. Specifically, we recommend the following steps:

  1. Integrate job analysis physical demand survey items into Occupational Analysis Division's routine surveys of each AFSC. Responses to survey items should be evaluated for differences across subgroups (e.g., location, gender). Periodically verify the accuracy of responses (e.g., weight of equipment) by referencing official documents on the dimensions and weights of equipment, and by directly observing and weighing equipment during site visits.
  2. Provide CFMs and other senior leaders in each AFSC with the SAT requirements and summary job analysis data for the AFSCs they manage.
  3. Collect feedback and address questions or concerns from CFMs and other senior leaders regarding job analysis survey results.
  4. Begin administering any new test(s) (e.g., Arm Endurance) at the MEPS to gather data on new Air Force recruits.
  5. Collect data on physical performance of recruits assigned to each AFSC.
  6. Use the test data collected from the MEPS and the physical performance data to verify the accuracy of the SAT requirement and to identify other test scores (i.e., requirements) associated with minimally effective task performance for each AFSC.
  7. Calibrate and adjust requirements based on feedback and data collected.
  8. Establish a system for regular monitoring and updating of test requirements.

RAND recommends that CFMs, Training Pipeline Managers, and Training Cadre review the results from the job analysis survey to identify critical physical tasks that can serve as a foundation for physical standards in technical training (i.e., used in physical task simulations). RAND also recommends implementing a feedback system to monitor whether trainees are meeting these standards. If a certain percentage of trainees (e.g., greater than 5 percent) cannot meet standards, that should trigger a review of the SAT standards for that AFSC. If the SAT requirement is found to be acceptable, an additional physical demands study conducted by the Air Force Fitness Testing and Standards Unit should be initiated. This study should examine the physical requirements of the AFSC and consider whether additional physical ability screening is required during the recruitment phase.
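The monitoring rule in this paragraph amounts to a threshold check on the share of trainees who fail to meet standards. The sketch below uses the 5 percent example from the text as a default; the function name, the data layout, and the AFSC labels in the usage note are hypothetical.

```python
def afscs_needing_review(results_by_afsc, max_fail_rate=0.05):
    """Return the AFSCs whose trainee failure rate exceeds max_fail_rate.

    results_by_afsc maps an AFSC label to a list of booleans, one per
    trainee, where True means the trainee met the physical standards.
    """
    flagged = []
    for afsc, outcomes in results_by_afsc.items():
        if not outcomes:
            # No trainee data yet, so nothing to evaluate for this AFSC.
            continue
        fail_rate = outcomes.count(False) / len(outcomes)
        if fail_rate > max_fail_rate:
            flagged.append(afsc)
    return sorted(flagged)
```

For example, with 100 trainees in a hypothetical "AFSC-A" of whom 6 fail, the failure rate is 6 percent and the AFSC is flagged; with 5 failures it is not, because the rule in the text is "greater than 5 percent."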

Regardless of the COA chosen, the Air Force may wish to consider whether concentrating physical testing resources on the most demanding occupations would make the most efficient use of those resources. As described in this article, only a subset of AFSCs have physical requirements; therefore, focusing efforts on those AFSCs with the greatest physical demands should result in more fidelity and greater efficiency in the overall process. Finally, the COAs described all include development of a system to ensure that the Air Force continues to update physical requirements as Air Force jobs themselves change, which is key to maintaining the validity of those requirements and, hence, to ensuring that the requirements are beneficial.


Copley, G. B., B. R. Burnham, M. J. Shim, and P. A. Kemp, American Journal of Preventive Medicine, Vol. 38 (No. 1), 2010, pp. S117–S125.

The research reported here was commissioned by the Air Force's Force Management Policy Directorate (AF/A1P) and conducted by the Manpower, Personnel, and Training Program within RAND Project AIR FORCE.

RAND Health Quarterly is produced by the RAND Corporation. ISSN 2162-8254.