R-Zero: Self-Evolving Reasoning LLM from Zero Data

35 points by lawrenceyan · 7 comments · 9/10/2025, 2:02:17 AM · arxiv.org

Comments (7)

jasonjmcghee · 5h ago
Conceptually, it's effectively a GAN
magicalhippo · 12m ago
For those not in the know, that's Generative Adversarial Networks[1], where two neural networks are trained in a competitive way.

One network typically generates tasks for the other, and is rewarded if it manages to make the other network fail the task. The other network is rewarded if it successfully completes the task.

Thus the adversarial network tries to find weaknesses to exploit, and the combined training makes the solving network much stronger. Or at least that's the idea.

[1]: https://en.wikipedia.org/wiki/Generative_adversarial_network
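For readers who want to see the generator/discriminator reward structure from the comment above in code, here is a minimal toy sketch of a GAN in pure Python. Everything in it is an illustrative assumption, not anything from the R-Zero paper (which uses LLM Challenger/Solver roles rather than a GAN): the 1-D "real" data distribution N(4, 1.25), the linear generator `G(z) = a*z + b`, the logistic discriminator `D(x) = sigmoid(w*x + c)`, the learning rate, and the hypothetical function name `train_toy_gan`.

```python
import math
import random


def sigmoid(s: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Hypothetical toy: train a 1-D GAN, return ((a, b), (w, c))."""
    rng = random.Random(seed)
    real_mean, real_std = 4.0, 1.25  # assumed "real" data distribution
    a, b = 1.0, 0.0  # generator G(z) = a*z + b, with z ~ N(0, 1)
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        x_real = rng.gauss(real_mean, real_std)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: gradient ascent on
        # log D(x_real) + log(1 - D(x_fake)), i.e. it is rewarded
        # for telling real samples from generated ones.
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(G(z)) (the usual
        # "non-saturating" loss), i.e. it is rewarded when the
        # discriminator mistakes its output for real data.
        d_fake = sigmoid(w * x_fake + c)
        a += lr * (1.0 - d_fake) * w * z
        b += lr * (1.0 - d_fake) * w

    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print(f"generator: G(z) = {a:.2f}*z + {b:.2f}")
```

With a discriminator this simple, the generator is mostly pushed toward the real mean, and the two players can circle around each other rather than settle, which is one concrete reason for the "or at least that's the idea" caveat.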

thom · 2h ago
For values of zero quite far above zero.
falcor84 · 1h ago
What am I missing? From my skimming, there's zero external data beyond what is needed for the Challenger to generate questions.
thom · 7m ago
An existing trained LLM is an enormous amount of 'data', however it might be encoded. AlphaZero didn't start with Stockfish or a database of games.
cyberge99 · 5h ago
What could go wrong?
magicalhippo · 15m ago
Just don't hook it into the nuclear missile controls. We've seen[1] how that goes[2].

[1]: https://en.wikipedia.org/wiki/Colossus:_The_Forbin_Project

[2]: https://en.wikipedia.org/wiki/The_Terminator