
Background
ARC‑AGI was introduced in 2019 as a grid‑based reasoning benchmark (“v1”) designed to test whether AI systems can infer novel rules from a few examples rather than rely on pattern memorization. Open‑source solvers plateaued near 53% accuracy, while a high‑compute run of OpenAI’s o3‑preview model achieved roughly 75–88%, indicating that v1 was largely saturated.
To raise the bar, the ARC Prize Foundation unveiled the harder, human‑validated “ARC‑AGI‑2” (v2) on 24 March 2025 and opened a Kaggle contest capped at about US$0.42 of compute per task. The headline rule remains: the first fully open‑source system to reach ≥85% on the private v2 set wins the $1 million Grand Prize.
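For readers curious where the per-task figure comes from, here is a minimal back-of-envelope sketch. Both inputs are assumptions based on published ARC Prize 2025 contest parameters, not values stated in this question: roughly $50 of Kaggle compute per submission run against a 120-task evaluation set.

```python
# Hedged sketch: derive the ~$0.42/task figure from assumed contest parameters.
# Both constants are assumptions, not values taken from this market's text.
ASSUMED_COMPUTE_BUDGET_USD = 50   # approximate Kaggle compute per submission run
ASSUMED_NUM_EVAL_TASKS = 120      # approximate size of the private v2 eval set

per_task_usd = ASSUMED_COMPUTE_BUDGET_USD / ASSUMED_NUM_EVAL_TASKS
print(f"~${per_task_usd:.2f} of compute per task")  # prints: ~$0.42 of compute per task
```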
Resolution Criteria
The market resolves YES if, before January 1, 2027, the ARC Prize Foundation publicly announces and awards any portion of the $1 million Grand Prize to one or more teams.
Primary rule: The winning submission must achieve ≥85% accuracy on ARC‑AGI‑2 (or an officially designated successor) during an official competition period.
Future changes: If ARC publishes a new test or alters the accuracy threshold, the operative condition remains “the first public, binding commitment to pay out (or the actual payout of) the prize labelled the ARC Grand Prize.”
Betting NO
The gap between current open-source performance and the 85% threshold is enormous.
Best compute-constrained open-source score: 24% (NVARC, ARC Prize 2025 winner). Even unconstrained frontier models top out at ~77% (Gemini 3.1 Pro). The target requires 85% under the ~$0.42/task compute budget, from a fully open-source system.
The ARC Prize Foundation announcing ARC-AGI-3 strongly implies they expect v2's Grand Prize to go unclaimed. Roughly eight months remain, with a 61pp gap between the best constrained score and the threshold (see the sketch below). History supports this: v1 plateaued for years before saturating, and v2 was specifically designed to be harder.
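The arithmetic behind that gap, using only the scores cited above:

```python
# Quick sanity check on the NO case, using the figures quoted in this post.
THRESHOLD = 85           # Grand Prize accuracy requirement (%)
BEST_CONSTRAINED = 24    # NVARC, ARC Prize 2025 winner (%)
BEST_UNCONSTRAINED = 77  # best cited frontier-model score, no compute cap (%)

gap_pp = THRESHOLD - BEST_CONSTRAINED
ratio = THRESHOLD / BEST_CONSTRAINED
print(f"Constrained gap: {gap_pp}pp ({ratio:.1f}x the best constrained score)")
# prints: Constrained gap: 61pp (3.5x the best constrained score)
print(f"Even uncapped frontier models fall {THRESHOLD - BEST_UNCONSTRAINED}pp short")
# prints: Even uncapped frontier models fall 8pp short
```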
Estimate: ~12% YES.