Securing AI Model Weights

Preventing Theft and Misuse of Frontier Models

Sella Nevo, Dan Lahav, Ajay Karpur, Yogev Bar-On, Henry Alexander Bradley, Jeff Alstott

Published May 30, 2024

As frontier artificial intelligence (AI) models — that is, models that match or exceed the capabilities of the most advanced models at the time of their development — become more capable, protecting them from theft and misuse will become more important. The authors of this report explore what it would take to protect model weights — the learnable parameters that encode the core intelligence of an AI — from theft by a variety of potential attackers.

Specifically, the authors (1) identify 38 meaningfully distinct attack vectors, (2) explore a variety of potential attacker operational capacities, from opportunistic (often financially driven) criminals to highly resourced nation-state operations, (3) estimate the feasibility of each attack vector being executed by different categories of attackers, and (4) define five security levels and recommend preliminary benchmark security systems that roughly achieve the security levels.

This report can help security teams in frontier AI organizations update their threat models and inform their security plans, and it can help policymakers who work with AI organizations better understand how to engage on security-related topics.

This document was revised in June 2024 to add acknowledgments, correct formatting, and make an addition to Appendix A.

Key Findings

  • AI organizations face a diverse set of threats, across many meaningfully distinct attack vectors and a wide range of attacker capacities.
  • There is rough agreement among cybersecurity and national security experts on how to protect digital systems and information from less-capable actors, but there is a wide diversity of views on what is needed to defend against more-capable actors, such as top cyber-capable nation-states.
  • The security of frontier AI model weights cannot be ensured by implementing a small number of "silver bullet" security measures. A comprehensive approach is needed, including significant investment in infrastructure and many different security measures addressing different potential risks.
  • There are many opportunities for significantly improving the security of model weights at frontier labs in the short term.
  • Securing model weights against the most capable actors will require significantly more investment over the coming years.

Recommendations

  • Developers of AI models should have a clear plan for securing models that are considered to have dangerous capabilities.
  • Organizations developing frontier models should use the threat landscape analysis and security level benchmarks detailed in the report to help assess which security vulnerabilities they are already addressing and focus on those they have yet to address.
  • Develop a security plan for a comprehensive threat model focused on preventing unauthorized access and theft of the model's weights.
  • Centralize all copies of weights to a limited number of access-controlled and monitored systems (an illustrative sketch of such gated, audited access follows this list).
  • Reduce the number of people authorized to access the weights.
  • Harden interfaces for model access against weight exfiltration.
  • Implement insider threat programs.
  • Invest in defense-in-depth (multiple layers of security controls that provide redundancy in case some controls fail).
  • Engage advanced third-party red-teaming that reasonably simulates relevant threat actors.
  • Incorporate confidential computing to secure the weights during use and reduce the attack surface.
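
The measures above are stated at the level of policy; the report does not prescribe specific implementations. As a purely illustrative sketch (the file path, principal names, and policy below are hypothetical, not drawn from the report), the following Python fragment shows one way a centralized weight store might gate and audit every read of a weights file, in the spirit of the recommendations to centralize copies, limit who is authorized, and monitor access:

    # Hypothetical illustration only: a minimal gate around reads of a
    # centralized weights file, enforcing an allowlist and writing an
    # audit-log entry for every access attempt. Real deployments would
    # layer hardware-backed and network controls on top of (or instead
    # of) application-level checks like this one.
    import hashlib
    import json
    import logging
    from datetime import datetime, timezone
    from pathlib import Path

    logging.basicConfig(filename="weight_access_audit.log", level=logging.INFO)

    AUTHORIZED_PRINCIPALS = {"inference-service-prod"}         # keep this set small
    WEIGHTS_PATH = Path("/secure/weights/model.safetensors")   # single monitored copy


    def read_weights(principal: str, reason: str) -> bytes:
        """Return weight bytes only for allowlisted principals; audit every attempt."""
        allowed = principal in AUTHORIZED_PRINCIPALS
        logging.info(json.dumps({
            "time": datetime.now(timezone.utc).isoformat(),
            "principal": principal,
            "reason": reason,
            "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError(f"{principal} is not authorized to read model weights")
        data = WEIGHTS_PATH.read_bytes()
        # Record a content hash so any copy found elsewhere can be traced to a logged read.
        logging.info(json.dumps({"sha256": hashlib.sha256(data).hexdigest()}))
        return data

Application-level checks like this are easy to bypass for a capable insider or intruder; the report's other recommendations (defense-in-depth, insider threat programs, confidential computing) point to complementary controls enforced in hardware, on the network, and in organizational process. The sketch is included only to make the access-control and auditing idea concrete.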

Document Details

  • Availability: Available
  • Year: 2024
  • Print Format: Paperback
  • Paperback Pages: 128
  • Paperback Price: $42.00
  • Paperback ISBN/EAN: 1-9774-1337-4
  • DOI: https://doi.org/10.7249/RRA2849-1
  • Document Number: RR-A2849-1

Citation

RAND Style Manual
Nevo, Sella, Dan Lahav, Ajay Karpur, Yogev Bar-On, Henry Alexander Bradley, and Jeff Alstott, Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models, RAND Corporation, RR-A2849-1, 2024. As of October 13, 2024: https://www.rand.org/pubs/research_reports/RRA2849-1.html
Chicago Manual of Style
Nevo, Sella, Dan Lahav, Ajay Karpur, Yogev Bar-On, Henry Alexander Bradley, and Jeff Alstott, Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models. Santa Monica, CA: RAND Corporation, 2024. https://www.rand.org/pubs/research_reports/RRA2849-1.html. Also available in print form.

Funding for this research was provided by gifts from RAND supporters. The research was conducted by the Meselson Center within RAND Global and Emerging Risks.

This publication is part of the RAND research report series. Research reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND research reports undergo rigorous peer review to ensure high standards for research quality and objectivity.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.