Understanding the Limits of Artificial Intelligence for Warfighters

Volume 2, Distributional Shift in Cybersecurity Datasets

Joshua Steier, Erik Van Hegewald, Anthony Jacques, Gavin S. Hartnett, Lance Menthe

Research, published Jan 3, 2024

The Department of the Air Force has become increasingly interested in the potential for artificial intelligence (AI) to revolutionize different aspects of warfighting. For this project, the U.S. Air Force asked RAND Project AIR FORCE to consider broadly what AI cannot do—to understand the limits of AI for warfighting applications. This report presents a discussion of the application of AI systems to perform two common cybersecurity tasks—detecting network intrusions and identifying malware—and the effect of distributional shift on those tasks, a phenomenon that can significantly limit AI effectiveness. Distributional shift occurs when the data that an AI system encounters after it is deployed differ appreciably from the data on which it was trained and tested.

This report describes the importance of distributional shift, how it can and does significantly limit AI effectiveness in detecting network intrusions and identifying malware, how to test for and quantify its effects, and how those effects could be mitigated. This work is aimed primarily at larger organizations, such as headquarters facilities, that have the bandwidth and computing power to implement AI-enabled cybersecurity systems and to update their systems regularly.

This report is the second in a five-volume series addressing how AI could be employed to assist warfighters in four distinct areas: cybersecurity, predictive maintenance, wargames, and mission planning. This volume is intended for a technical audience; the series as a whole is designed for those who are interested in warfighting and AI applications more generally.

Key Findings

  • Cybersecurity datasets suffer from distributional shift, especially in standard network intrusion detection and malware classification.
  • Distributional shift can be characterized in multiple ways, and the ease of detection depends on the dataset.
  • Although data quality is important in training machine-learning algorithms, the recency of the data is also significant.
  • When data must be recent to be useful, less data is available for training, which in turn bounds AI performance.

Recommendations

  • Dataset segmentation tests should be performed for any AI-based cybersecurity system to assess the likely significance of distributional shift on performance over time. These tests can be used to estimate a data decay rate, which in turn can be used to yield an estimate of the likely shelf life of an AI system before it must be completely retrained.
  • It is also recommended that well-known statistical tests, such as the Kolmogorov-Smirnov test, be performed on the dataset as an additional measure to detect or confirm distributional shift.
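
The two recommendations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the report: the function names (`ks_statistic`, `ks_critical`, `fit_decay_rate`, `shelf_life`) are hypothetical, the exponential form of the decay is an assumption (this summary does not say how the data decay rate is defined), and the per-segment accuracies are made-up numbers standing in for the output of a real dataset segmentation test. The Kolmogorov-Smirnov part compares one feature in the training data with the same feature in later data, using the standard large-sample critical value.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value: reject 'same distribution' at level
    alpha when the KS statistic exceeds this threshold."""
    c = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c * math.sqrt((n + m) / (n * m))


def fit_decay_rate(ages, accuracies):
    """Least-squares fit of log(accuracy) = log(a0) - rate * age.
    ASSUMPTION: accuracy decays exponentially with the age of the model."""
    logs = [math.log(acc) for acc in accuracies]
    n = len(ages)
    mean_t = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in ages)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, logs))
    slope = sxy / sxx
    a0 = math.exp(mean_y - slope * mean_t)
    return a0, -slope


def shelf_life(a0, rate, min_accuracy):
    """Age at which the fitted accuracy falls to min_accuracy."""
    if rate <= 0:
        return math.inf  # no measurable decay in the fitted curve
    return max(0.0, math.log(a0 / min_accuracy) / rate)


if __name__ == "__main__":
    # Synthetic feature values: the "later" sample has a shifted mean.
    rng = random.Random(0)
    train = [rng.gauss(0.0, 1.0) for _ in range(500)]
    later = [rng.gauss(0.5, 1.0) for _ in range(500)]
    d = ks_statistic(train, later)
    print("KS statistic:", round(d, 3))
    print("shift flagged:", d > ks_critical(len(train), len(later)))

    # Illustrative accuracies on segments 0..4 months after training.
    ages = [0, 1, 2, 3, 4]
    accuracies = [0.95, 0.91, 0.88, 0.83, 0.80]
    a0, rate = fit_decay_rate(ages, accuracies)
    print("decay rate per month:", round(rate, 4))
    print("months until accuracy < 0.70:", round(shelf_life(a0, rate, 0.70), 1))
```

In practice, the segment accuracies would come from training on an early slice of the dataset and scoring the model on successively later slices. A library routine such as SciPy's `scipy.stats.ks_2samp` could replace the hand-written statistic and would also return a p-value.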

Document Details

  • Availability: Available
  • Year: 2024
  • Print Format: Paperback
  • Paperback Pages: 35
  • Paperback Price: $23.00
  • Paperback ISBN/EAN: 9781977412799
  • DOI: https://doi.org/10.7249/RRA1722-2
  • Document Number: RR-A1722-2

Citation

RAND Style Manual
Steier, Joshua, Erik Van Hegewald, Anthony Jacques, Gavin S. Hartnett, and Lance Menthe, Understanding the Limits of Artificial Intelligence for Warfighters: Volume 2, Distributional Shift in Cybersecurity Datasets, RAND Corporation, RR-A1722-2, 2024. As of October 11, 2024: https://www.rand.org/pubs/research_reports/RRA1722-2.html
Chicago Manual of Style
Steier, Joshua, Erik Van Hegewald, Anthony Jacques, Gavin S. Hartnett, and Lance Menthe, Understanding the Limits of Artificial Intelligence for Warfighters: Volume 2, Distributional Shift in Cybersecurity Datasets. Santa Monica, CA: RAND Corporation, 2024. https://www.rand.org/pubs/research_reports/RRA1722-2.html. Also available in print form.

Research conducted by

This research was prepared for the Department of the Air Force and conducted within the Force Modernization and Employment Program of RAND Project AIR FORCE.

This publication is part of the RAND research report series. Research reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND research reports undergo rigorous peer review to ensure high standards for research quality and objectivity.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.