Exploring red teaming to identify new and emerging risks from AI foundation models

Marie-Laure Hicks, Ella Guest, Jess Whittlestone, Jacob Ohrvik-Stott, Sana Zakaria, Cecilia Ang, Chryssa Politi, Imogen Wade, Salil Gunashekar

Research | Published Oct 31, 2023

On 12 September 2023, RAND Europe and the Centre for Long-Term Resilience organised a virtual workshop to inform UK government thinking on policy levers to identify risks from artificial intelligence foundation models in the lead-up to the AI Safety Summit in November 2023. The workshop focused on the use of red teaming for risk identification and on the opportunities, challenges and trade-offs that may arise in using this method.

The workshop brought together a range of participants from across academia and public sector research organisations, non-governmental organisations and charities, the private sector, the legal profession and government. The workshop consisted of interactive discussions among the participants in plenary and in smaller breakout groups. The views and ideas discussed at the workshop have been summarised in this short report to stimulate further debate and thinking as policy around this topical issue develops in the coming months.

Key Findings

The discussion focused on the following themes associated with using red teaming to identify risks from AI foundation models:

  • The term 'red teaming' is loosely used across the global AI community. A crucial first step is to develop a clear and shared taxonomy, along with shared norms and good practice around red teaming, for example, regarding who to involve, how to implement it and how to share findings.
  • Red teaming is one specific tool that is part of the wider risk identification, assessment and management toolbox. It is not a governance mechanism in itself.
  • Red teaming is useful in certain cases, in particular for medium-term risks and the assessment of known risks. Key limitations include difficulty in identifying unknown or chronic risks.
  • The socio-technical aspect of red teaming – who does it and in what context – must be actively considered. Embedding a diversity of perspectives, with deep understanding of the risks, the domain, and the actors or adversaries, is essential to improve a red team's effectiveness.
  • Specific methods such as red teaming should not be the focal point of mandated risk-management activities. If mandates are put in place, they should instead focus on holistic approaches and risk-management frameworks.

Citation

RAND Style Manual
Hicks, Marie-Laure, Ella Guest, Jess Whittlestone, Jacob Ohrvik-Stott, Sana Zakaria, Cecilia Ang, Chryssa Politi, Imogen Wade, and Salil Gunashekar, Exploring red teaming to identify new and emerging risks from AI foundation models, RAND Corporation, CF-A3031-1, 2023. As of September 4, 2024: https://www.rand.org/pubs/conf_proceedings/CFA3031-1.html
Chicago Manual of Style
Hicks, Marie-Laure, Ella Guest, Jess Whittlestone, Jacob Ohrvik-Stott, Sana Zakaria, Cecilia Ang, Chryssa Politi, Imogen Wade, and Salil Gunashekar, Exploring red teaming to identify new and emerging risks from AI foundation models. Santa Monica, CA: RAND Corporation, 2023. https://www.rand.org/pubs/conf_proceedings/CFA3031-1.html.

Research conducted by

This work was funded by the RAND Corporation and conducted by the Centre for Long-Term Resilience and RAND Europe.

This publication is part of the RAND conference proceeding series. Conference proceedings present a collection of papers delivered at a conference or a summary of the conference.
