Agenda: Frontier Model Evaluation Science Day
All times are Eastern Standard Time. Agenda is subject to change.
- 9–9:30 a.m.: Arrival and Breakfast
- 9:30–9:45 a.m.: Brief Opening Session
- 9:45–10:45 a.m.: Speed Meeting Rounds
- 11 a.m.–noon: Breakout Session 1
- Noon–12:45 p.m.: Lunch
- 12:45–1:45 p.m.: Breakout Session 2
- 1:45–2:45 p.m.: Breakout Session 3
Opening Session
Breakout Sessions
The breakout sessions will be organized into four thematic tracks: Chem-Bio, Loss of Control, Risk-Agnostic Methods, and Coordination & Collaboration.
Additional rooms will be available for presentations, side discussions, and attendee-led sessions.
Classified breakout meeting space will be available all day at the Secret and Top-Secret levels.
Chem-Bio track sessions will cover:
- Lessons learned from completed model evals of large language models (LLMs) and biological design tools (BDTs)
- Needs and priorities for the next round of model evals
- Wet lab and lab automation evals
Loss of Control track sessions will cover:
- Autonomous replication and adaptation
- Deception
- Misuse risks of autonomous capabilities
Risk-Agnostic Methods track sessions will cover:
- Evaluation methods: from red teaming to automated benchmarking
- Task design and elicitation techniques (e.g. prompt engineering)
- Ensuring evaluation robustness and validity (e.g., testing a model’s full capabilities)
Coordination & Collaboration track sessions will cover:
- Policy timelines and deliverables
- Dangerous capability ‘red lines’ and risk thresholds
- Addressing information siloing
- Responsible capability scaling
At the end of each session, facilitators will collect two to three key takeaways from the discussion, to be summarized in the closing session.
Closing Session
- 3–3:30 p.m.: Reviewing the day, next steps
The event will be held under Chatham House Rule.