RAND Study Highlights Importance of Securing AI Model Weights; Provides Playbook for Frontier AI Labs to Benchmark Security Measures

For Release

Thursday
May 30, 2024

Amid the rapid advancement of artificial intelligence (AI) and the potential risks it poses to national security, a new RAND study explores how best to secure frontier AI models from malicious actors.

Whereas most studies have focused on the security of AI systems more broadly, this study focuses on the potential theft and misuse of foundation AI model weights (the learned parameters produced by training a model on massive data sets) and details how promising security measures can be adapted specifically to protect them.

Specifically, it highlights several measures that frontier AI labs should prioritize now to safeguard model weights: centralizing all copies of the weights on a limited number of access-controlled and monitored systems; reducing the number of people authorized to access them; hardening interfaces against weight exfiltration; engaging third-party red teams; investing in defense-in-depth for redundancy; implementing insider-threat programs; and incorporating confidential computing to secure the weights and reduce the attack surface. None of these measures is widely implemented, but all are feasible to achieve within a year, according to the report.
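To make the first two of these measures concrete, a minimal, purely illustrative sketch follows; the user names, host names, and helper function are hypothetical and are not drawn from the report.

```python
# Hypothetical sketch: restrict weight access to a small allowlist of authorized
# people and centralized, monitored systems, and log every access attempt.
import logging
from datetime import datetime, timezone

AUTHORIZED_USERS = {"alice", "bob"}      # deliberately small, audited set of people
APPROVED_HOSTS = {"weights-store-01"}    # centralized, access-controlled, monitored systems

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("weight-access")

def request_weight_access(user: str, host: str) -> bool:
    """Grant access only to authorized users on approved hosts, and log every attempt."""
    granted = user in AUTHORIZED_USERS and host in APPROVED_HOSTS
    log.info(
        "weight-access user=%s host=%s granted=%s at=%s",
        user, host, granted, datetime.now(timezone.utc).isoformat(),
    )
    return granted

if __name__ == "__main__":
    print(request_weight_access("alice", "weights-store-01"))   # True
    print(request_weight_access("mallory", "gpu-dev-laptop"))   # False
```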

“Until recently, AI security was primarily a commercial concern, but as the technology becomes more capable, it's increasingly important to ensure these technologies don't end up in the hands of bad actors that could exploit them,” said Sella Nevo, director of RAND's Meselson Center and one of the report's authors. “Not only does this study offer a first-of-its-kind playbook for AI companies to defend against the most sophisticated attacks, it also strives to facilitate meaningful engagement between policymakers, AI developers, and other stakeholders on risk management strategies and the broader impact of AI security.”

Additionally, the study provides a framework for assessing the feasibility of different attacks based on the resources and expertise available to various types of attackers, and it proposes a list of security benchmarks for fortifying AI systems. It pinpoints 38 distinct attack vectors across nine categories, from mundane threats such as basic social engineering schemes to severe (and rare) scenarios such as a military takeover. It also defines five operational categories of attacker capability, ranging from low-budget amateur individuals to highly resourced nation-states, and assesses how feasible it would be for each group to successfully execute the identified attack vectors.

For example, an amateur attacker might have less than a 20 percent chance of discovering and exploiting an existing vulnerability in a model's machine learning stack, while the most sophisticated, highly resourced cyber nation-state would have a more than 80 percent chance of pulling off the same attack.
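A hypothetical sketch of how such a feasibility assessment might be represented appears below; the attacker categories and the 0.2 and 0.8 values simply echo the example above rather than the report's actual estimates.

```python
# Illustrative feasibility matrix for a single attack vector, keyed by attacker category.
ATTACK_VECTOR = "find and exploit an existing vulnerability in the ML stack"

FEASIBILITY = {
    "amateur individual": 0.2,                   # "less than 20 percent" in the example above
    "highly resourced cyber nation-state": 0.8,  # "more than 80 percent" in the example above
}

def attackers_at_or_above(threshold: float) -> list[str]:
    """Return the attacker categories whose estimated success chance meets the threshold."""
    return [category for category, p in FEASIBILITY.items() if p >= threshold]

if __name__ == "__main__":
    print(f"Attack vector: {ATTACK_VECTOR}")
    print("Plausible attackers above 50 percent:", attackers_at_or_above(0.5))
```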

The study further proposes five security-level benchmarks to guide organizations in fortifying their AI systems against potential threats, with each benchmark containing a set of security measures designed to thwart attacks from a specific attacker category. The security levels range from basic, off-the-shelf security measures effective only against amateur attacks to advanced network isolation strategies designed to thwart the most capable threat actors.
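The benchmark structure could be modeled along the following lines; the level names, attacker labels, and example measures are placeholders rather than the report's actual benchmark contents.

```python
# Illustrative sketch: each security level bundles measures intended to thwart
# a particular attacker category (placeholder names and measures).
SECURITY_LEVELS = {
    "SL1": {
        "thwarts": "amateur attackers",
        "example_measures": ["off-the-shelf endpoint security", "basic access controls"],
    },
    "SL5": {
        "thwarts": "the most capable nation-state attackers",
        "example_measures": ["advanced network isolation", "hardware-backed confidential computing"],
    },
}

def required_level(attacker: str) -> str:
    """Return the lowest level whose scope mentions the named attacker (toy lookup)."""
    for level, spec in SECURITY_LEVELS.items():
        if attacker in spec["thwarts"]:
            return level
    return "SL5"  # default to the strictest level when the attacker is not recognized

if __name__ == "__main__":
    print(required_level("amateur attackers"))        # SL1
    print(required_level("nation-state attackers"))   # SL5
```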

It is one of the first studies to detail AI-specific benchmarks that organizations can use to assess their current security posture and determine whether additional investments are needed to reach a threshold appropriate for their security goals. It also provides the first set of publicly available recommendations for safeguarding against attacks from the most highly capable nation-states, including security measures that aren't yet feasible but will likely be needed in the future.

Designed to serve as guidelines, these benchmarks should not be seen as requirements or as part of a compliance regime. However, the authors note that many of the measures are similar to voluntary policies that AI companies have already implemented to reduce the risk of harm from their models.

“Security, especially in an area as dynamic as AI, is never perfect,” said Nevo. “Even well-resourced organizations are susceptible to breaches, which further highlights the need for continuous vigilance and adaptation in AI security strategies, as well as the need for continued research on the topic.”

“While this report breaks new ground by informing crucial aspects of leading AI companies' security and is especially helpful in assisting security teams and policymakers as they think about how to defend model weights against compromise by highly sophisticated attackers, there is more work to be done on this issue, and we look forward to working with RAND to build out this core body of knowledge,” said Dan Lahav, cofounder and CEO of Pattern Labs and a report coauthor.

Other authors of the study, “Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models,” are Ajay Karpur, Yogev Bar-On, Henry Bradley, and Jeff Alstott. This research was conducted by the Meselson Center, part of RAND's new Global and Emerging Risks Division, and funding was provided by gifts from RAND supporters.

About RAND

RAND is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous.