Risk Assessment of Reinforcement Learning AI Systems

Looking Beyond the Technology

Published Jul 2, 2024

by Kyle Bunch, Alexander C. Hou, Ryan Haberman, Marissa Herron, Anthony Jacques, Gary J. Briggs


This report presents some of the challenges that the U.S. Department of Defense (DoD) may face in fielding an artificial intelligence (AI) technology called reinforcement learning (RL) in DoD applications. RL has been credited with expanding the decisionmaking ability of machines beyond that of humans in playing complex games of strategy. The fact that RL-enabled systems can beat world experts in these games raises the question of whether such systems could outperform humans in DoD applications. Especially relevant are "broad" applications having large, complex processes with multiple steps leading to few but consequential decisions for a military commander. Timely alternatives could lead to decisive advantages in such situations. What is not clear, however, is what technical risks such a system would introduce (i.e., technical failure leading to mission failure) or what risks the force structure would incur in absorbing such technology. This report represents a first step toward understanding the risks associated with employing RL-enabled systems for operational-level command and control.

Key Findings

  • DoD is likely limited in the use and development of RL because of the lack of specialized skill sets in this field and the difficulty in retaining personnel with such skills once obtained, given the highly competitive and lucrative nature of the field.
  • The high data demands that scale with the size of the RL application may outstrip DoD’s ability to train applications beyond ones that are narrower in scope.
  • Issues arising from the black-box nature of RL decisionmaking and the reluctance of humans to trust such unintuitive systems may limit the size of applications to those that encompass processes currently performed by humans. Larger processes that humans cannot reasonably evaluate are likely to face issues with trust.
  • RL has many additional challenges that expand as applications are broadened, including the growth of training sets and simulation models and the complexity of precisely defining RL training. While many solutions in the literature target individual challenge areas, no solution was found that addresses all the challenges likely to exist in a broad DoD application.

Recommendations


  • DoD should explore ways to attract, train, and retain a workforce with the skill sets needed in AI.
  • DoD should develop ways to access and generate high-quality data relevant to DoD problems and required in training for RL algorithms.
  • Before being able to leverage the advantages of RL, DoD should better understand the limits of RL's application and how it provides an advantage over existing technology.
  • DoD should consider leveraging incremental advances possible with narrow AI applied to smaller problems rather than initially pursuing potential advantages offered by broad AI applied to more-complex problems. Such an approach may offer less risk while offering a means to bootstrap training for broader AI.

This research was sponsored by the Office of the Under Secretary of Defense for Research and Engineering and conducted within the Acquisition and Technology Policy Center of the RAND National Security Research Division (NSRD).

This report is part of the RAND research report series. RAND reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND reports undergo rigorous peer review to ensure high standards for research quality and objectivity.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.