OpenAI claiming gold medal standard at IMO 2025

8 points · ocfnash · 7 comments · 7/19/2025, 9:53:34 AM · github.com ↗

Comments (7)

Davidzheng · 4h ago
The proofs superficially look super interesting, especially because they're not in the style of the usual LLM filler. It's almost exactly the opposite: very efficient use of words, dropping any grammar that isn't important. It reminds me of how people write down proofs in drafts, or how we communicate proofs to peers before writing the final version.
Davidzheng · 4h ago
P1's setup section has basically a very precise summary of the proof, which it fills in later: "So main is: (a) for n>=4, any n-line cover must contain a side-line; inductively reduce to n=3. (b) Analyze n=3 exactly."

I suspect there's some (tree-based?) search + a separate process verifier + a large number of parallel generation sessions, judging just from hints of how structured/monotone the generated text is.

A lot of colons, like "So:", "Now:", "Need:", etc.
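
A minimal sketch of the kind of pipeline I'm imagining, purely to illustrate "many parallel sessions + a separate verifier" (every name and the scoring step here is made up, not anything OpenAI has described):

    import hashlib

    def generate_candidate(problem: str, seed: int) -> str:
        # Stand-in for one independent generation session.
        return f"candidate proof #{seed} for: {problem}"

    def verify(proof: str) -> float:
        # Stand-in for a separate process verifier; here just a pseudo-score
        # derived from a hash so the example runs deterministically.
        return (int(hashlib.sha256(proof.encode()).hexdigest(), 16) % 1000) / 1000

    def best_of_n(problem: str, n: int = 32) -> str:
        # Run n "sessions" (sequentially here) and keep the best-scoring candidate.
        candidates = [generate_candidate(problem, s) for s in range(n)]
        return max(candidates, key=verify)

    print(best_of_n("IMO 2025 Problem 1"))

A real system would presumably replace both stand-ins with models and add the search, but the selection structure would look roughly like this.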

Davidzheng · 4h ago
P3: interesting that in the basics section it makes an easy observation but gives no proof sketch, unlike P1/P2 (P1 has a full proof-idea sketch; P2 says we'll bash). This suggests the whole proof is actually generated one-shot (unlike my previous comment). I guess it's not doing search in text space (i.e., output a line, search for the next line, etc.). Of course there's probably some final process that assembles the proof from pieces, so the search could be obscured.

Come to think of it, informal proof generation probably can't easily use search? It's probably doing parallel generation with some information sharing + a global verification process. No real evidence, except that the overall proof is very unstructured even though each individual line is written with some stylistic consistency.

Davidzheng · 4h ago
P2 is geometry. It looks coordinate-bashed? Very interesting to see it writing "Good." and "Perfect." after some lines. Very human-like in its thinking process; it reads like a person talking through the proof orally.
Davidzheng · 4h ago
I posted about one of the Twitter threads at https://news.ycombinator.com/item?id=44613840
ocfnash · 5h ago
According to tweet 6/N of this series, they are claiming full marks for problems 1-5:

https://x.com/alexwei_/status/1946477742855532918

energy123 · 4h ago
This is incredible. We know these questions are not in the training data. How can you still say that LLMs aren't reasoning?