Agenda: Frontier Model Evaluation Science Day

All times are Eastern Standard Time. Agenda is subject to change.

9–9:30 a.m.
Arrival and Breakfast

Opening Session

9:30–9:45 a.m.
Brief Opening Session
9:45–10:45 a.m.
Speed Meeting Rounds

Breakout Sessions

The breakout sessions will be organized into thematic tracks: Chem-Bio, Loss of Control, Risk-Agnostic methods, and Coordination & Collaboration.

Additional rooms will be available for presentations, side discussions, and attendee-led sessions.

Classified breakout meeting space will be available all day at the Secret and Top Secret levels.

11 a.m.–noon
Breakout Session 1
Noon–12:45 p.m.
Lunch
12:45–1:45 p.m.
Breakout Session 2
1:45–2:45 p.m.
Breakout Session 3

Chem-Bio track sessions will cover:

  • Lessons learned from completed model evals [LLMs and biological design tools (BDTs)]
  • Needs and priorities for the next round of model evals
  • Wet lab and lab automation evals

Loss of Control track sessions will cover:

  • Autonomous replication and adaptation
  • Deception
  • Misuse risks of autonomous capabilities

Risk-Agnostic methods track sessions will cover:

  • Evaluation methods: from red teaming to automated benchmarking
  • Task design and elicitation techniques (e.g. prompt engineering)
  • Ensuring evaluation robustness and validity (e.g. testing a model’s full capabilities)

Coordination & Collaboration track sessions will cover:

  • Policy timelines and deliverables
  • Dangerous capability ‘redlines’ and risk thresholds
  • Addressing information siloing
  • Responsible capability scaling

At the end of each session, facilitators will collect two to three key takeaways from the discussion, to be summarized in the closing session.

Closing Session

3–3:30 p.m.
Reviewing the day, next steps

The event will be held under the Chatham House Rule.