Will a multi-agent system have its time horizon evaluated by METR before August 2026?
Jul 31
37% chance

METR's time horizon evaluation: https://metr.org/time-horizons/

Some existing multi-agent systems: GPT-5.2 Pro, Grok 4 Heavy, Gemini 3 Deep Think.

This market doesn't count "regular" models being able to spawn subagents. For example, if the reported evaluated model is Claude Opus 4.6, but the evaluation was made within Claude Code where Claude Opus 4.6 could spawn some Claude Sonnet 4.6 subagents, this does not count for the purpose of this market.


GPT-5.2-Pro is actually available via API right now, which I think should simplify the evaluation process quite a lot.
