Evaluating an Operator Physical Fitness Test Prototype for Tactical Air Control Party and Air Liaison Officers

A Preliminary Analysis of Test Implementation

by Sean Robson, Tracy C. Krueger, Jennifer L. Cerully, Stephanie Pezard, Laura Raaen, Nahom M. Beyene

RAND Health Quarterly, 2019; 8(3):7


The U.S. Air Force asked the RAND Corporation to assist in its development and validation of gender-neutral tests and standards for battlefield airmen (BA) specialties. The Air Force has conducted an extensive validation study of the occupational relevance of physical fitness tests and standards. Following the fitness test validation study, one enlisted specialty (Tactical Air Control Party [TACP]) and one officer BA specialty (Air Liaison Officer [ALO]) moved forward with an implementation plan to further evaluate a set of recommended tests and continuation standards. This study describes RAND's assistance to the Air Force on two fronts: (1) conducting a preliminary evaluation of potential issues and concerns that might influence implementation effectiveness and (2) developing a framework for evaluating the implementation of occupationally relevant and specific tests and standards. This work provides the foundation for ongoing review and evaluation of Air Force fitness tests and standards, which are designed to ensure that airmen are capable of performing critical physical tasks associated with their assigned specialties.

In January 2013, then–Chairman of the Joint Chiefs of Staff Martin Dempsey and then–Secretary of Defense Leon Panetta issued a memorandum rescinding the 1994 Direct Ground Combat Definition and Assignment Rule, which excluded women from assignment to units and positions whose primary mission is to engage in direct combat on the ground. In the memorandum, Panetta and Dempsey mandated that "[c]urrently closed units and positions will be opened by each relevant Service … after the development and implementation of validated, gender-neutral occupational standards and the required notification to Congress" (Chairman of the Joint Chiefs of Staff, 2013). To comply with this mandate, the U.S. Air Force, with the assistance of the RAND Corporation, established a process in fiscal year (FY) 2012 to identify and validate gender-neutral tests, standards, and physical requirements. This initial effort was followed by additional technical support from RAND in FY 2014 and 2015 for the Air Force's implementation of an extensive criterion-related validation study based on scientific principles. This study led to the recommendation of a ten-component Operator–Prototype Test Battery (O-PTB) for the enlisted Tactical Air Control Party (TACP) specialty and the Air Liaison Officer (ALO) specialty.1 These tests are listed in Table 1.

Table 1. Physical Fitness Tests in the O-PTB

  1. Grip Strength
  2. Medicine Ball Toss (Backwards, SideArm, Overhead)
  3. Three-Cone Drill
  4. Repetition Maximum (RM) Trap Bar Deadlift
  5. Pull-Up Test
  6. Lunges, 50-lb Sandbag
  7. Extended Cross Knee Crunch
  8. Farmer's Carry
  9. Ergometer Row Test (1,000 meters)
  10. Run (1.5 miles)a

a The 1.5-mile run test was not administered as part of the Air Force Exercise Science Unit's (AF-ESU's) implementation trips because the Air Force already conducts this test regularly as part of the Tier I Air Force–wide fitness test.

The AF-ESU is now leading efforts to implement these tests. As part of this effort, the AF-ESU developed an implementation, verification, and training (IVT) plan to address several questions such as:

  • Who will administer the tests?
  • How long will it take to administer and take the tests?
  • What is the likelihood of a test-taker sustaining an injury while taking the tests?
  • How many current TACPs and ALOs would be expected to pass the proposed test standards?
  • How much improvement can be expected in test performance as TACPs and ALOs become more familiar with the tests?
  • How well does performance on the test battery differentiate between successful and less successful TACPs and ALOs?
  • What concerns do the different stakeholders have about the tests and standards?

Purpose and Approach of the Research

Although the AF-ESU prioritized these questions, other short- and long-term issues and concerns should also be identified and may need to be addressed later. Consequently, the Air Force Director of Military Force Management Policy, Deputy Chief of Staff for Manpower, Personnel and Services, asked RAND to support this effort by designing a preliminary evaluation of the implementation of the physical tests and standards being adopted by the TACP and ALO career fields. Our evaluation emphasizes one of the main IVT questions: the concerns of different stakeholders. After reviewing initial results from this evaluation, we developed a broad evaluation framework to identify other issues and concerns that may emerge in relation to the implementation of physical tests and standards.

Our approach to achieve these two objectives consisted of four steps:

  1. Identify relevant stakeholders.
  2. Design evaluation instruments to address stakeholder reactions.
  3. Collect and analyze the data.
  4. Develop an evaluation framework for future evaluation efforts.

Step One: Identify Relevant Stakeholders

The Air Force plans to sequence the implementation of tests and standards in three waves that correspond to three primary job roles for TACPs and ALOs. Specifically, as tests and standards are adopted, they will first be implemented for TACP and ALO operators, then for technical training students, and finally for recruits. The Air Force is currently in its first wave of implementation (TACP and ALO operators). Therefore, the most immediate priority for our preliminary evaluation was to identify the stakeholder groups involved in these initial implementation efforts. We identified three primary stakeholder groups:

  • TACP and ALO operators, who will be required to take the tests and meet the specified standards
  • physical training leaders (PTLs), who will be responsible for administering and scoring the tests and providing training to other test administrators
  • career field managers (CFMs), who will be responsible for addressing gaps in readiness levels for their specialty and for overseeing whether resource needs are being met across the career field.

Step Two: Design Evaluation Instruments to Address Stakeholder Reactions

Next, we identified several topics that could affect the successful implementation of the O-PTB. Taking into consideration the specific topics most relevant to each stakeholder group and prior research on test-taker reactions, we identified the following primary topics for each stakeholder group:

  • operator perspective (TACPs and ALOs)
    • consistency of test administration
    • knowledge of test performance relative to test standards
    • injury concerns related to the tests
    • experienced levels of frustration in taking the tests
    • global evaluations of the O-PTB, including perceived utility, validity, and fairness
  • PTL perspective
    • quality of training provided to administer and score tests
    • global evaluations of the O-PTB, including perceived utility, validity, and fairness
  • CFM perspective
    • current and future plans for test implementation
    • perceived benefits of implementing a new test battery
    • perceived challenges or drawbacks of implementing a new test battery
    • concerns with the specific recommended test battery
    • specific barriers and concerns for test implementation such as time to administer, potential for injury, cost, fairness, and utility.

To address these topics, we used a mixed-methods approach. Specifically, we developed evaluation surveys for TACP and ALO operators and for PTLs and conducted semistructured interviews with CFMs.

Step Three: Collect and Analyze the Data

Evaluation surveys were administered to TACPs, ALOs, and PTLs during the AF-ESU implementation trips to different installations. For each implementation trip, the AF-ESU trained the PTLs on how to properly administer and score each test in the O-PTB and then administered tests to a sample of TACPs and ALOs available at the time of the trip.

Evaluation surveys were completed by a total of 198 operators (TACPs and ALOs) and 135 PTLs representing units from 12 installations. We also conducted semistructured interviews with CFMs from the TACP and ALO specialties and from other BA specialties, including Pararescue (PJ), Combat Rescue Officer (CRO), Special Tactics Officer (STO), and Special Operations Weather Team (SOWT).

Step Four: Develop an Evaluation Framework for Future Evaluation Efforts

Our initial evaluation focuses on topics relevant to the Air Force's immediate implementation priorities. However, we recognize that a broader evaluation perspective can help identify concerns or issues that may emerge as priorities shift over time (e.g., a focus on tests and standards for recruits). Consequently, we present a more comprehensive framework that organizes a range of possible topics according to a structure often used in the military to identify requirements and potential gaps for a given set of strategic objectives.2 Broadly, the objectives of the evaluation framework are to (1) raise awareness of potential challenges and concerns for relevant stakeholders during the implementation and adoption of the new physical tests and standards and (2) promote the development of systematic data collection to monitor progress over time.

Results of Evaluation Surveys and Semistructured Interviews

In the following sections, we present an overview of results from each of the stakeholder groups. We begin by discussing results from the evaluation surveys from the TACP and ALO operators and then PTLs. We conclude with a summary of themes identified during the discussions held with CFMs.

TACP and ALO Survey Responses

Consistency of Test Administration

The pattern of TACP and ALO responses suggested that each test was administered consistently. Specifically, over 90 percent of respondents agreed or strongly agreed that each test was consistently administered. Furthermore, only two tests, Pull-Ups and Extended Cross Knee Crunch, had less than 95 percent agreeing or strongly agreeing that these tests were administered consistently (94 and 92 percent, respectively).

Knowledge of Performance Relative to Standard

TACPs and ALOs indicated that they knew how well they performed on each test relative to the required standard established as part of the Air Force validation study. Only for the Extended Cross Knee Crunch did fewer than 90 percent of respondents agree or strongly agree, with 9 percent of respondents indicating that they neither agreed nor disagreed with the statement.

Injury Concerns for Each Test

Overall, TACP and ALO respondents noted few injury concerns for most of the tests, and no new injuries were reported in preparation for or during administration of the tests. However, 20 percent of respondents indicated an injury concern for the Trap Bar Deadlift. Additional analyses revealed no patterns in Trap Bar Deadlift responses across subgroups, but respondents' open-ended comments suggest that their concerns centered primarily on the ability to use proper form and technique when performing the lift. Some respondents offered specific suggestions to address potential injury concerns, including providing sufficient training on proper form, providing opportunities to practice, and using a weight belt when lifting.

Frustration Experienced

The tests that produced the most frustration were the Extended Cross Knee Crunch (41 percent of respondents) and the Medicine Ball Toss (21 percent); both produced significantly more frustration than the other tests in the battery. We conducted additional analyses of these two tests to determine whether any significant patterns emerged by installation or background characteristics.

Although responses did not differ significantly by installation, comparisons by demographic variables revealed that TACPs and ALOs who were heavier, taller, and more experienced were significantly less likely to report frustration with the Medicine Ball Toss than lighter, shorter, and less experienced TACPs and ALOs. A few open-ended comments suggesting that the Medicine Ball Toss appears to favor taller individuals further support this finding. Other common open-ended comments indicated that the Medicine Ball Toss required substantial technique and practice to perform well, with some respondents stating that the test measures skill and technique more than strength and power.

Global Evaluations of the Test Battery

Several items addressed operators' perceptions of the test battery's utility and of how well it measures abilities important for their job (i.e., face validity). Overall, these perceptions suggest that TACPs and ALOs will receive the test battery positively: 84 percent of respondents indicated that knowing how well they performed on the test battery will help them improve their job-related physical capabilities, and 74 percent indicated that the test battery measures the abilities required of a TACP.

Although most responses reflect favorable perceptions, 18 percent of respondents disagreed or strongly disagreed that the test battery would be fair for all TACPs regardless of rank, age, stature, gender, or race/ethnicity. Respondents' open-ended comments suggested that the tests and standards might not be relevant to all TACPs, given that some assignments require little or no physical effort to perform job tasks.

PTL Survey Responses

PTL Training

We also developed survey items for the PTLs who received test administrator training from the AF-ESU on how to properly administer and score each test. Overall, PTL responses suggest that the training was effective.

Global Evaluations of the Test Battery

PTLs also provided responses to general questions about the prototype test battery. Overall, these responses were also favorable. Ninety-three percent of respondents agreed or strongly agreed that each test was administered to all operators in the same way. Eighty percent of PTLs indicated that the test battery would be fair to all TACPs regardless of rank, age, stature, gender, or race/ethnicity. Some of those who indicated concerns about fairness provided open-ended comments suggesting that fitness requirements may not be relevant across the entire career lifecycle of a TACP. That is, TACPs may be assigned to a position in which fitness is less important, such as a staff position. Other open-ended comments suggest that some of the frustration observed in taking specific tests (e.g., Extended Cross Knee Crunch) may be related to lack of familiarity in taking a new test and a desire to perform well.

Perceived Utility of the Test Battery

The final set of PTL survey items focused on the overall utility of the test battery. Responses were very positive: 97 percent agreed or strongly agreed that the test battery is a better measure of operational capabilities than the Physical Ability and Stamina Test (PAST).3 Furthermore, 94 percent agreed or strongly agreed that operators could really show their physical abilities through this test battery, and 92 percent indicated that TACPs would find their test results useful for improving their job-related physical capabilities.

CFM Discussions

To better understand the BA CFMs' concerns about test implementation, we held three separate meetings with individual CFMs and small groups of CFMs and career field representatives from (1) TACP and ALO; (2) CRO, PJ, and STO; and (3) SOWT. The Combat Control CFM was unavailable for these meetings. Although only the TACP and ALO specialties have moved forward with implementing a new operator test battery, we included other BA CFMs to gain a more comprehensive understanding of the issues, concerns, and barriers that could influence the successful implementation of an updated physical test battery. Because current and planned efforts to implement an updated test battery differ across specialties, we group the CFM feedback for each topic area by TACP/ALO CFMs and other BA CFMs. Questions addressed the following topics:

  • current and future plans for test implementation
  • perceived benefits of implementing a new test battery
  • perceived challenges or drawbacks of implementing a new test battery
  • concerns with the specific recommended test battery
  • specific barriers and concerns for test implementation such as time to administer, potential for injury, cost, fairness, and utility.

Overall, the CFMs agreed that the proposed O-PTB offers potential benefits; however, other BA CFMs raised concerns about the need for ten tests, the time required to administer them, and the potential cost of purchasing and maintaining the test equipment (see Table 2).

Table 2. Summary of CFMs' Perspectives on the Implementation of Tests and Standards

Plans for test implementation
  • Focus is on implementation for current operators
  • Future efforts will consider implementation for technical training students and future recruits
  • General interest and will monitor how well tests and standards are implemented for TACP/ALO
  • No concrete plans to change physical fitness (PF) tests and standards
Perceived benefits
  • New O-PTB is more comprehensive and addresses deficiencies in the PAST by measuring job-related agility
  • New O-PTB addresses deficiency in the PAST by measuring muscular power
Perceived challenges
  • Tests and standards may not be equally relevant for TACPs/ALOs at all organizational levels (e.g., staff position)
  • Fear of the unknown (e.g., potential career consequences if operators do not meet standards)
  • Administration is perceived to take more time and to be more difficult to manage than the test battery currently used for their specialties
  • Additional equipment, which will require more money to purchase and maintain
Concerns with recommended test battery
  • Concern that commanders may overemphasize or underemphasize role of fitness for specialties; emphasized importance of balanced integration of fitness-related workouts and testing
  • Insufficient communication on how final ten PF tests were selected from 39 considered in the validation study
  • Questioned value added with proposed tests and standards
Other comments
  • Communication often gets lost over time and not all operators recall steps taken to develop recommended tests and standards
  • Recognize the value in making improvements, but do not agree that O-PTB is the correct solution


We conducted a preliminary evaluation of potential issues and concerns that may influence implementation effectiveness for TACP and ALO operators and developed a broader framework for monitoring and evaluating the implementation of an occupationally specific PF test battery for TACPs and ALOs. A summary of the main findings and recommendations is provided in Table 3.

Table 3. Summary of Findings and Recommendations

  1. Finding: Overall, TACPs and ALOs indicated consistently strong, positive support for the O-PTB.
     Recommendation: Communicate results broadly throughout the TACP and ALO community; disseminate results to other BA leaders.
  2. Finding: TACPs and ALOs generally felt that each test was administered to all operators in the same way and that they knew how they performed relative to the standard.
     Recommendation: Test administrators were being observed and had just received training on how to administer the tests; therefore, conduct follow-up evaluations to ensure that consistent administration continues.
  3. Finding: TACPs and ALOs indicated concern about the potential for injury for the Trap Bar Deadlift (20 percent), and PTLs expressed concern that operators could be injured while taking the test battery (12 percent). No new injuries were reported in preparation for or during administration of the tests.
     Recommendation: Consider (1) further training on proper form and technique, (2) increasing the opportunities to practice the tests and receive feedback, and/or (3) modifying test administration instructions.
  4. Finding: TACPs and ALOs were most frustrated by the Extended Cross Knee Crunch (41 percent) and the Medicine Ball Toss (21 percent), and 15 percent of PTLs indicated that operators seemed frustrated by the tests.
     Recommendation: Consider (1) further training on proper form and technique, (2) increasing the opportunities to practice the tests and receive feedback, and/or (3) modifying test administration instructions.
  5. Finding: Some TACPs and ALOs (18 percent) and PTLs (12 percent) did not feel that the test battery would be fair for all TACPs regardless of rank, age, stature, gender, or race/ethnicity. TACP and ALO CFMs echoed this sentiment by expressing concern about the community's lack of awareness of the scientific validation process used to determine the tests included in the battery.
     Recommendation: Deliver additional communication about the history of the test development process, how tests were selected, and how they link to job- and mission-related requirements.
  6. Finding: TACPs and ALOs (71 percent) and PTLs (78 percent) felt that it was important that test administrators be other TACPs; in contrast, CFMs emphasized that the test administrator could be anyone.
     Recommendation: Consider the advantages and disadvantages of various test administrator characteristics.
  7. Finding: Other BA CFMs (i.e., not TACP or ALO) expressed strong concerns over logistical issues regarding test administration (e.g., time, equipment cost).
     Recommendation: Examine the average time required to administer the prototype test battery and the cost to purchase all equipment for a squadron of 100.
  8. Finding: Other BA CFMs (i.e., not TACP or ALO) recognized deficiencies in the PAST and expressed interest in addressing these shortcomings through a more collaborative effort.
     Recommendation: Increase the frequency of communication among CFMs, commanders, strength and conditioning coaches, and the AF-ESU. Consider trade-offs between scientific validity and other career field needs, including feasibility, cost, and perceived utility.

Overall, the results from TACPs, ALOs, and PTLs were very positive, with relatively few concerns identified among a minority of the TACP and ALO operators who participated in this phase of test implementation. Most indicated that each test was administered correctly and that the test battery measured important job-related physical abilities. Many open-ended comments further supported these findings by comparing the O-PTB favorably with the PAST, describing it as more comprehensive and more representative of operational tasks. Only a few tests caused frustration or raised concerns about potential injury among some respondents. Three tests in particular warrant close monitoring and further review to identify opportunities to address concerns about injury and frustration: the Extended Cross Knee Crunch, the Trap Bar Deadlift, and the Medicine Ball Toss. For each of these tests, we recommend additional training, practice, and feedback on proper form and technique, which should help address these concerns.

Analysis of open-ended comments in combination with CFM feedback suggests that additional structured communication could help address a range of concerns (e.g., career repercussions if operators fail a test) and further educate operators on the purpose of the test battery and the steps taken to select the final ten tests and standards. Consequently, we recommend developing additional communication channels, such as short, informative pamphlets that answer frequently asked questions. For example, operators could benefit from additional education on the role of each test in assessing their physical capability to perform the critical physical tasks of their specialty. Operators could also benefit from additional information on the implementation plan, timeline, and potential changes to the protocols for specific tests (e.g., Farmer's Carry).

Although the results were very positive, our analysis represents only a snapshot of perceptions at a given point in time. As the implementation proceeds for TACP and ALO operators, we recommend following up with future evaluations to determine if perceptions have changed. Future evaluations should be conducted annually for the first two years of implementation and then every three to five years thereafter. Tracking individual operators over time using pre- and post-test research designs will allow for more sophisticated analyses for evaluating the effectiveness of different interventions (e.g., additional training on form/technique) to address concerns. Additional interviews, focus groups, and surveys of stakeholders can also help to determine whether new issues or concerns have emerged.

Finally, we recommend expanding these evaluation efforts to consider additional stakeholder groups and topics. To assist in identifying other stakeholder groups and topics, we developed a framework guided by the DOTMLPF-P structure. This framework could be used to raise awareness of possible issues that may influence the successful implementation of PF tests and standards and to guide evaluation efforts for determining how well implementation objectives are being met.


Chairman of the Joint Chiefs of Staff, "Women in the Service Implementation Plan," Washington, D.C., January 9, 2013.


  • 1 Because recruits and trainees already have a set of tests and standards in place for TACP and ALO, the Air Force prioritized the implementation of tests and standards for existing TACP and ALO operators. Future plans may consider updating the tests and standards used for recruits and trainees as well as other battlefield airmen (BA) specialties.
  • 2 Doctrine, Organization, Training, Materiel, Leadership and Education, Personnel, Facilities, and Policy (DOTMLPF-P) is defined in the Joint Capabilities Integration and Development System (JCIDS).
  • 3 We use the PAST as a point of comparison because many of its test components (e.g., pull-ups, push-ups, sit-ups, timed runs) are commonly used across the BA specialties at different career stages, including for recruits, trainees, and operators. However, the standards (e.g., times, distances), the order of test administration, and/or the specific tests (e.g., inclusion of swimming tests) vary by career stage and BA specialty.

The research described in this article was cosponsored by the Air Force Director of Military Force Management Policy, Deputy Chief of Staff for Manpower, Personnel and Services, the Vice Commander in Air Education and Training Command, the Vice Commander in Air Force Special Operations Command, and the Directorate of Air and Space Operations and conducted by the Manpower, Personnel, and Training Program within RAND Project AIR FORCE.

RAND Health Quarterly is produced by the RAND Corporation. ISSN 2162-8254.