Will Deepseek V4 outperform OpenAI and Anthropic models at coding?
Dec 31
1.6% chance

Claim: https://x.com/petergostev/status/2009616928763981963

Will Deepseek V4 outperform OpenAI's and Anthropic's strongest contemporary models at the time of its release?

Relevant coding benchmarks:

  • SWE-bench Verified

  • HumanEval

  • TerminalBench

  • RE-Bench

  • LiveCodeBench

Deepseek V4 must score higher than both OpenAI's and Anthropic's strongest released models at the time of its release on at least 3 of these 5 benchmarks (official or independent results) to resolve YES. If V4 ties or underperforms either competitor on 3 or more of them, it resolves NO. If a benchmark is not reported within 1 month of V4's release, it counts as a loss for Deepseek V4.
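The resolution rule above can be sketched roughly as follows. This is only an illustration of the stated criteria, not an official resolution tool: the function name `resolve`, the score dictionaries, and the choice to treat a benchmark missing for any of the three models as a loss are assumptions (the criteria only explicitly say that an unreported benchmark counts as a loss for Deepseek V4).

```python
from typing import Dict, Optional

# The five benchmarks named in the market description.
BENCHMARKS = [
    "SWE-bench Verified",
    "HumanEval",
    "TerminalBench",
    "RE-Bench",
    "LiveCodeBench",
]

# Maps benchmark name to score; None or a missing key means "not reported".
Scores = Dict[str, Optional[float]]


def resolve(deepseek: Scores, openai: Scores, anthropic: Scores,
            wins_needed: int = 3) -> str:
    """Return "YES" or "NO" under the stated rule (hypothetical helper)."""
    wins = 0
    for name in BENCHMARKS:
        d = deepseek.get(name)
        o = openai.get(name)
        a = anthropic.get(name)
        # Assumption: if any of the three scores is unreported, the
        # comparison cannot be made and the benchmark counts as a loss.
        if d is None or o is None or a is None:
            continue
        # A win requires strictly beating BOTH competitors; a tie is a loss.
        if d > o and d > a:
            wins += 1
    return "YES" if wins >= wins_needed else "NO"
```

Under this sketch, if two of the five benchmarks go unreported, Deepseek V4 would have to win all three of the remaining ones to resolve YES.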

  • Update 2026-04-25 (PST) (AI summary of creator comment): The creator intends to resolve this market NO, noting that RE-Bench and HumanEval are not consistently being reported for new frontier models, and that DeepSeek likely does not beat Opus 4.7 at coding.


Unfortunately, it looks like RE-Bench and HumanEval are not consistently being reported for new frontier models. Even giving DeepSeek the benefit of the doubt, it likely doesn't beat Opus 4.7 at coding.

I intend to resolve this market NO unless there are objections.

For future markets like this, I will elect to resolve based on the popular benchmarks at resolution date.

I think it definitely wins on LiveCodeBench; not sure about the rest.

@clementdupOz DeepSeek wins on LiveCodeBench and was close on SWE-Bench Verified!
