- Could contemporary ML agents be trained to effectively exhibit intelligent mission-planning behaviors without requiring training data on billions of possible combinations of situations?
- Could machine agents learn strategies against surface-to-air missiles (SAMs) using combinations of striker, jamming, and decoy aircraft? Jammers need to get close enough to SAMs to affect them but remain far enough away that they do not get shot down. Decoys need to distract a SAM from a striker at the right time.
- Could sufficiently generalizable representations be built to capture the richness of the planning problem? Would the lessons learned generalize across changes in threat location, type, and number?
U.S. air superiority, a cornerstone of U.S. deterrence efforts, is being challenged by competitors—most notably, China. The spread of machine learning (ML) is only enhancing that threat. One potential approach to combat this challenge is to more effectively use automation to enable new approaches to mission planning.
The authors of this report demonstrate a proof-of-concept artificial intelligence (AI) prototype to help develop and evaluate new concepts of operations for the air domain. The prototype platform integrates open-source deep learning frameworks, contemporary algorithms, and the Advanced Framework for Simulation, Integration, and Modeling, a U.S. Department of Defense–standard combat simulation tool. The goal is to exploit AI systems' ability to learn through replay at scale, generalize from experience, and improve over repetitions to accelerate and enrich operational concept development.
In this report, the authors discuss collaborative behavior orchestrated by AI agents in highly simplified versions of suppression of enemy air defenses missions. The initial findings highlight both the potential of reinforcement learning (RL) to tackle complex, collaborative air mission planning problems, and some significant challenges facing this approach.
RL can tackle complex planning problems, but the approach still faces limitations and challenges
- Pure RL algorithms can be inefficient and prone to learning collapse.
- Proximal policy optimization is a recent step in the right direction for addressing the learning collapse issue: It has built-in constraints preventing the network parameters from changing too much in each iteration.
- ML agents are capable of learning cooperative strategies. In simulations, the strike aircraft synergized with jammer or decoy effects on a SAM.
- Trained algorithms should be able to deal with changes in mission parameters (number and locations of assets) fairly easily.
- Few real-world data exist on successful and unsuccessful missions. Compared with the volumes of data used to train contemporary ML systems, very few real missions have been flown against air defenses, and virtually all of them were successful.
- For analyses involving the use of large simulations in place of large datasets, the required computational burden will continue to be a significant challenge. The scaling of computational power and time required to train realistic sets of capabilities (dozens of platforms) against realistic threats (dozens of SAMs) remains unclear.
- Developing trust in AI algorithms will require more-exhaustive testing and fundamental advances in algorithm verifiability, safety, and boundary assurances.
- Future work on automated mission planning should focus on developing robust multiagent algorithms. Reward functions in RL problems can drastically change AI behavior, often in unexpected ways. Care must be taken in designing such functions to accurately capture risk and intent.
- Although simulation environments are crucial in data-scarce problems, simulations should be tuned to balance speed (lower computational requirements) against accuracy (real-world transferability).
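
The finding above on proximal policy optimization refers to a built-in constraint on how far each update can move the agent. A minimal sketch of the widely used clipped-surrogate form of that constraint follows. It is illustrative only: the summary does not say which variant the prototype used, and the function names, the clip range `epsilon = 0.2`, and the sample numbers are assumptions, not values from the report. In this form, the limit is applied to the ratio of new to old action probabilities rather than to the network weights directly.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for a single sample.

    ratio     : new_policy_prob / old_policy_prob for the action taken
    advantage : estimate of how much better the action was than expected
    epsilon   : clip range (0.2 is a common default, assumed here)
    """
    # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the clipped and unclipped terms, so the
    # objective offers no extra gain for moving the policy outside the range.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Average the per-sample objective over a batch (to be maximized)."""
    terms = [clipped_surrogate(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical batch: ratios drift away from 1.0 as the policy changes.
    ratios = [1.0, 1.5, 0.5]
    advantages = [2.0, 1.0, -1.0]
    print(mean_clipped_surrogate(ratios, advantages))
```

Once the ratio leaves the clip range in the direction that would improve the objective, the objective stops increasing, so a single batch provides no incentive for a large policy change. This is one way to reduce the risk of the learning collapse noted in the findings.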
Table of Contents
2-D Problem State Vector Normalization
Containerization and ML Infrastructure
Managing Agent-Simulation Interaction in the 2-D Problem
Overview of Learning Algorithms