Will Anthropic’s next Sonnet model exceed 65% on Terminal-Bench?
Dec 31 · 11% chance

Resolution will use the evaluations published at https://www.tbench.ai/, run with the Terminus 2 scaffolding.

Only counts if the number in the model’s name increments, so a new Claude Sonnet 4.5 checkpoint does not count.

If no new Sonnet model is released by 2027, this market will resolve N/A.


@JaundicedBaboon I don't think anybody is going to test Sonnet 4.6 on Terminal-Bench. Anthropic reported a Terminal-Bench 2.0 score of 59.1% for Sonnet 4.6, but nobody has submitted its results to the leaderboard yet. I don't know whether you consider Anthropic's reported results good enough or want to keep waiting for someone to submit results to the leaderboard.

© Predita Markets, Inc. · Terms of Use · Privacy