Benchmark Gap #4: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, how many months will it be before an AI is listed as a (co) first author on a published math paper? | Axel

37 expected

This question is meant to measure the gap between solving the main math benchmarks that existed at the time of market creation and contributing to real-world mathematics.

The co-first-author requirement is loose: I will also accept an AI being credited with significant contributions to both deciding what to prove and the actual proof. Merely contributing to the proof is not enough: I am trying to get at "the AI does the work of a mathematician", not "the AI does the work of a proof assistant". I would also accept, for instance, the human author of the paper stating that they would have named the AI as a coauthor if it were human, or saying that the result could not have been obtained without the AI's assistance.

Technical AI Timelines

In a lot of pure math, author order is arbitrary/alphabetical. Removing that, I second that it'll be 0. Maybe negative.

I think it is plausible that it will be <0

People already list ChatGPT as a coauthor on scientific papers, but not yet in math.

People are also trading

By 2030, AI can autonomously prove mathematical theorems that are publishable in mathematics journals today?

Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

Benchmark Gap #8: Once a single AI gets >= 80% on FrontierMath Tier 4, how long until an AI publishes a math paper?

Will an AI co-author a mathematics research paper published in a reputable journal before the end of 2026?

Will AI models solve at least 2 FrontierMath Open Problems before 2027?

Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?

Which MATH-AI 23 works will have >50 Google Scholar citations by end of 2026?

Will AI contribute as much as a co-author would today to a real research mathematics paper before Jan 1 2028?

Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code?

Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills?
