R-Zero: Self-Evolving Reasoning LLM from Zero Data
3 Anon84 1 8/10/2025, 3:25:01 AM arxiv.org
Comments (1)
vineethy · 2d ago
Interesting twist on automated curriculum learning. This paper uses an LLM as both the environment and the policy, whereas other papers use LLMs for the policy or value function. Would be cool to see other reward strategies that tie all these threads together.