Will Deepseek V4 outperform OpenAI and Anthropic models at coding?


Dec 31

1.6% chance


Claim: https://x.com/petergostev/status/2009616928763981963

Will Deepseek V4 outperform OpenAI's and Anthropic's strongest contemporary models at the time of its release?

Relevant coding benchmarks:

SWE-bench Verified
HumanEval
TerminalBench
RE-Bench
LiveCodeBench

Deepseek V4 must score higher than both OpenAI's and Anthropic's strongest released models on at least 3 of these 5 benchmarks (official or independent benchmark results) to resolve YES. If V4 matches or underperforms either competitor on more than half of the benchmarks, the market resolves NO. If a benchmark is not reported within 1 month of release, that benchmark counts as a loss for Deepseek V4.
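The rule above can be sketched in code. This is only an illustration of the stated criteria, not an official resolution tool: the function names, the tuple layout, and the example scores are all made up, and treating a missing competitor score as a loss is an assumption (the description only says an unreported benchmark counts as a loss for V4).

```python
from typing import Mapping, Optional, Tuple

# Benchmarks named in the market description.
BENCHMARKS = (
    "SWE-bench Verified",
    "HumanEval",
    "TerminalBench",
    "RE-Bench",
    "LiveCodeBench",
)
WINS_NEEDED = 3  # "3/5 of these benchmarks"

# Hypothetical layout: (V4 score, OpenAI score, Anthropic score).
# None means the score was not reported within one month of release.
Scores = Tuple[Optional[float], Optional[float], Optional[float]]


def v4_wins(v4: Optional[float], openai: Optional[float], anthropic: Optional[float]) -> bool:
    """True only if V4 is reported and strictly beats both competitors."""
    if v4 is None or openai is None or anthropic is None:
        # Assumption: any missing score makes the benchmark a loss for V4.
        return False
    return v4 > openai and v4 > anthropic  # a tie counts as a loss


def resolve(scores: Mapping[str, Scores]) -> str:
    """Return "YES" if V4 wins at least WINS_NEEDED benchmarks, else "NO"."""
    wins = sum(
        v4_wins(*scores.get(name, (None, None, None))) for name in BENCHMARKS
    )
    return "YES" if wins >= WINS_NEEDED else "NO"


# Placeholder numbers for illustration only, not real benchmark results.
example = {
    "SWE-bench Verified": (80.0, 78.0, 79.0),  # win
    "HumanEval": (None, 95.0, 96.0),           # unreported: loss
    "TerminalBench": (60.0, 60.0, 55.0),       # tie with one competitor: loss
    "LiveCodeBench": (90.0, 85.0, 88.0),       # win
    # RE-Bench is absent entirely: loss
}
print(resolve(example))
```

With two of the five benchmarks unreported, V4 would have to win all three remaining ones to resolve YES, which is why missing results weigh so heavily under this rule.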

Update 2026-04-25 (PST) (AI summary of creator comment): The creator intends to resolve this market NO, noting that RE-Bench and HumanEval are not consistently being reported for new frontier models, and that DeepSeek likely does not beat Opus 4.7 at coding.

Technology

OpenAI

Technical AI Timelines

DeepSeek


3 Comments

18 Positions

48 Activities


Unfortunately, it looks like RE-Bench and HumanEval are not consistently being reported for new frontier models. Even giving DeepSeek the benefit of the doubt, it likely doesn't beat Opus 4.7 at coding.

I intend to resolve this market NO unless there are objections.

For future markets like this, I will resolve based on the benchmarks that are popular at the resolution date.