
Which of these models will beat me at chess once released? Resolves YES if they win, NO if I win, and 50% for a draw.
I'm rated about 1900 FIDE. When each of these models is released, I'll play a game of chess with it at a rapid time control. On each move, I'll provide the model with the game state in PGN and FEN notation. If a model makes three illegal moves, it loses. Ambiguous responses such as Nd2 when Nbd2 is meant will not count towards this. I will play white.
Each option will stay open until the model is released, or it will resolve N/A if it's clear that the model will never be released. I'll periodically add models to this market which I find interesting. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES.
If I judge that my opponent’s position is hopelessly lost, at the level of being down a rook without compensation, I will submit the current position to a friend. If they agree that the position is lost, the game will be adjudicated as a win for me.
The current system prompt is below. This may change over time.
“Let’s play a game of chess! I will be white, you will be black. On each turn, I will give you the pgn and the fen of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated pgn, etc. Also, do not use any external tools or search queries when making your decision.
If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”
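The legal/ambiguous distinction above can be sketched programmatically. This is a hypothetical illustration only (the creator's actual checking process is not specified), assuming the third-party python-chess library:

```python
# Sketch only: assumes the third-party python-chess package (pip install chess).
import chess

def classify_reply(board: chess.Board, san: str) -> str:
    """Classify a model's SAN reply against the current position."""
    try:
        board.parse_san(san)
        return "legal"
    except chess.AmbiguousMoveError:
        # e.g. "Nxd2" when both knights can recapture: treated as an
        # ambiguity to clarify, not one of the three illegal moves
        return "ambiguous"
    except ValueError:
        # chess.IllegalMoveError and chess.InvalidMoveError subclass ValueError
        return "illegal"

# Position after 1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. g3 dxc4
# 5. Bg2 Bb4+ 6. Bd2 Bxd2+, where both white knights can recapture on d2:
board = chess.Board()
for san in ["d4", "Nf6", "c4", "e6", "Nf3", "d5", "g3", "dxc4",
            "Bg2", "Bb4+", "Bd2", "Bxd2+"]:
    board.push_san(san)

print(classify_reply(board, "Nxd2"))   # ambiguous: Nbxd2 or Nfxd2?
print(classify_reply(board, "Nbxd2"))  # legal
print(classify_reply(board, "Ke2"))    # illegal: e2 is occupied
```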
Note that all dates/times in this market are in Pacific Time.
Update 2025-01-14 (PST) (AI summary of creator comment):
Model Type: Only general language models are being considered; chess-specific models are excluded.
Capabilities: The model must be able to output human language and code.
Update 2025-05-11 (PST) (AI summary of creator comment): Regarding "Any model before X year" options:
These options will not resolve to 50% based on a draw in an individual game.
Such an option resolves to YES if any model released before the specified year wins its game against the creator.
It resolves to NO if no model released before the specified year wins its game against the creator (i.e., all relevant games are losses for the models or draws).
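Spelled out as code, this resolution rule might look like the following. This is a hypothetical sketch; the function name and the game records are invented for illustration, not the creator's actual bookkeeping:

```python
# Hypothetical sketch of the '"Any model before X year"' resolution rule.
# Game records here are invented for illustration.

def resolve_before_year(games, year):
    """games: iterable of (release_year, result) pairs, where result is
    'model_win', 'creator_win', or 'draw'. Returns 'YES' if any model
    released before `year` won its game, otherwise 'NO'. Note that an
    individual draw does not resolve these options to 50%."""
    return "YES" if any(
        y < year and result == "model_win" for y, result in games
    ) else "NO"

games = [(2025, "creator_win"), (2025, "draw"), (2026, "model_win")]
print(resolve_before_year(games, 2026))  # NO: only losses/draws before 2026
print(resolve_before_year(games, 2027))  # YES: the 2026 model won
```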
Update 2025-06-02 (PST) (AI summary of creator comment): For model series options (e.g., "Any Claude 4 model"):
The creator may resolve the option for the entire series after playing against one or more models from that series.
If the creator decides not to play additional models from that specific series, the option for the entire series will be resolved based on the outcome(s) of the game(s) played against models from that series up to that point (e.g., to NO if the tested model(s) lost and no further models from that series will be played).
Update 2025-10-19 (PST) (AI summary of creator comment): GPT-3 will not be tested as the creator does not have access to it (the model has been deprecated).
Update 2025-12-24 (PST) (AI summary of creator comment): o4 will resolve N/A as the full model will not be released. According to OpenAI, o4-mini is the latest small o-series model and has been succeeded by GPT-5 mini, indicating o4 will not be released as a standalone full model.
Update 2026-04-08 (PST) (AI summary of creator comment): The 'Any Claude Mythos model' option has been added to the market. It will resolve YES if any one of the Claude Mythos versions wins against the creator.
Update 2026-04-08 (PST) (AI summary of creator comment): For the 'Any Claude Mythos model' option:
It resolves based on the first version of Claude Mythos released.
If the creator wins against all Claude Mythos models from the first release generation, the option resolves NO, even if Anthropic later releases a subsequent generation (e.g., Claude Mythos 2, Claude 5 Mythos, etc.).
Later generations of Claude Mythos are not included in this option.
@JoeandSeth planning to add it once it's released and accessible to the public. Rumors are that it won't be for some time due to cybersecurity concerns.
@mr_mino other models on this market don't exist yet, unsure why specifically this one stays off when GPT-6 is on. Markets say release date maybe a couple months apart
https://manifold.markets/Bayesian/when-will-openai-release-gpt6
https://manifold.markets/Bayesian/claude-mythos-anthropic-release-dat
@JoeandSeth okay, I’ve added ‘Any Claude Mythos model’. If there are multiple versions, it will resolve YES if any one of them wins.
@mr_mino To clarify, the option is for the first version of the Claude Mythos which is released. So e.g. if I win against all of the Claude Mythos models upon release, and Anthropic later releases Claude Mythos 2, this resolves NO. Likewise if I win against 'Claude 4 Mythos' and they later release 'Claude 5 Mythos', etc. I've updated the label description accordingly.
Meta's Muse Spark played poorly in the opening, and lost due to the three illegal move rule. This is the first time a model has lost in this way. The model insisted on recapturing on c6 despite being explicitly told this was not a legal move, and made a few other illegal moves along the way.
1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. g3 dxc4 5. Bg2 Bb4+ 6. Bd2 Bxd2+ 7. Nbxd2 O-O 8. Nxc4 c5 9. O-O Nc6 10. dxc5 Bd7 11. Nfe5 Nxe5 12. Nxe5 Rc8 13. Qd4 b6 14. c6 Bxc6 15. Qxd8 Rcxd8 16. Nxc6 1-0
GPT 5.4 made an early mistake in the opening (8. c5?) and never recovered. Still, I think it played a bit better than the other GPT-5 models.
1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. g3 Be7 5. Bg2 O-O 6. O-O dxc4 7. Na3 Bxa3 8. bxa3 c5 9. dxc5 Qxd1 10. Rxd1 Na6 11. Be3 Nd5 12. Bd4 Bd7 13. Ne5 Ba4 14. Rdc1 f6 15. Nxc4 Rfc8 16. Nd6 Rc7 17. Rab1 Rd8 18. Nxb7 Rb8 19. Nd6 Rxb1 20. Rxb1 Nxc5 21. Rb8+ Be8 22. Rxe8#
@Quillist for the purposes of this market, I’m only interested in playing standard model releases by major labs.
I'm resolving the o4 option to N/A, since it's been 8 months and it seems like the full model won't be released. According to OpenAI:
"o4-mini is our latest small o-series model. It's optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. It's succeeded by GPT-5 mini."
I’m going to be so real, I think by 2032 we get to the point where it plays a solid opening, then once you go off a strict line it falls apart. I think this will be the case unless AI undergoes a full overhaul.
As is, it’s just predicting the next word, which means eventually it can copy a favourable line.
Pretty quickly that line stops existing for them, given the sheer number of possible move sequences in chess.
@Bayesian I don’t know if it is for the purposes of this market. There are a limited number of chess lines and instructional sources it can pull data from; I think once it gets openings down, all OP has to do is play bizarrely in the midgame and it’ll never figure it out.
@Magnify if you believe this, you should bet NO on the “Any model announced before 2032” and related markets!
@hecko you can read the other comments from the creator. Their experience mirrors mine: it’s trash even in the opening and confuses proper lines with each other. My prediction is that it might figure out openings eventually, but not midgames or endgames for the foreseeable future.
@Magnify I think this very much depends on the model. For example, GPT-4.5 played the opening reasonably and was not worse until 50… Bg7, though it did fail to exploit some of my mistakes until that point. Llama 4 on the other hand basically doesn’t know how to play chess.
GPT 5.2 played poorly in the opening and blundered a piece. Its chess skill hasn't improved over its predecessor's; the strongest model from OpenAI continues to be GPT-4.5.
1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. g3 Be7 5. Bg2 O-O 6. O-O dxc4 7. Qc2 a6 8. a4 Bd7 9. Qxc4 c5 10. dxc5 Bxc5 11. Qxc5 Nc6 12. Nc3 b6 13. Qd6 Ne8 14. Qf4 e5 15. Nxe5 Nxe5 16. Qxe5 Qc7
Claude Opus 4.5 played poorly throughout the game and lost.
1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Bb7 5. Bg2 Be7 6. b3 O-O 7. O-O d5 8. cxd5 exd5 9. Bb2 Nbd7 10. Nbd2 c5 11. Rc1 Rc8 12. dxc5 bxc5 13. Ne5 Nxe5 14. Bxe5 Qb6 15. Qc2 Rfd8 16. Rfd1 Bd6 17. Bxf6 gxf6 18. Nf3 Be5 19. Nh4 d4 20. Be4 Bxe4 21. Qxe4 f5 22. Qxe5 Rd5 23. Qxd5 Qc6 24. Qxf5 Qf3 25. Qxc8+ Kg7 26. exf3 d3 27. Rxc5 d2 28. Rc6 h6 29. Nf5+ Kh7 30. Rxh6#
Gemini 3 played well in the opening and had a large advantage, but played poorly in the endgame and lost. It seems to me to be one of the strongest models so far, around the same level as GPT-4.5.
1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. Nbd2 Bb7 6. Bg2 c5 7. e4 cxd4 8. e5 Ng4 9. h3 Nxe5 10. O-O Nxf3+ 11. Nxf3 Be7 12. Nxd4 Bxg2 13. Kxg2 O-O 14. Qf3 d5 15. Rd1 Bf6 16. cxd5 Qxd5 17. Qxd5 exd5 18. Nb5 Nc6 19. Nc7 Rad8 20. Nxd5 Ne7 21. Nxf6+ gxf6 22. Bh6 Rfe8 23. a4 Nf5 24. Bf4 Rxd1 25. Rxd1 Re4 26. b3 h5 27. Rd5 Nd4 28. f3 Re2+ 29. Kf1 Nxb3 30. Kxe2