> To overcome the limitations [of reinforcement learning and instruction distillation], we introduce REverse-Engineered Reasoning (REER), a new paradigm: instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known-good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them.
> Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models like GPT-4o and Claude 3.5.
Could be a game changer. All the more so if that 8B in "DeepWriter-8B" indicates the network size (edit: it certainly should, since DeepWriter-8B is a fine-tune of Qwen3-8B), and the results are comparable to those of much bigger models...
Any chance we will be able to try that DeepWriter?
--
Edit: the catch seems to be their «goal is to instill deep reasoning in LLMs for open-ended tasks». If you implement reasoning, the goal is to achieve results that are actually better: getting to "right answers" in problems that, while complex, are still territory for more-optimal and less-optimal solutions.
The question is: does "REverse-Engineered Reasoning" also enhance solution-oriented reasoning and thought-perfecting reasoning? That is what matters.
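Edit 2: for the curious, here is how I read that «backwards» computational search, as a minimal sketch: a gradient-free hill-climb over candidate trajectories, scored by how well each trajectory makes the known-good solution predictable. Everything here (the word-overlap scorer, the mutation moves, the function and parameter names) is my assumption for illustration, not code from the paper.

```python
import random

# Toy stand-in for a frozen LLM scorer. In my reading of the abstract, the real
# thing would be something like the model's perplexity on the known-good
# solution conditioned on prompt + trajectory; this word-overlap proxy is
# purely illustrative (my assumption, not the paper's scorer).
def solution_surprisal(prompt: str, trajectory: str, solution: str) -> float:
    context = set((prompt + " " + trajectory).lower().split())
    target = solution.lower().split()
    misses = sum(1 for w in target if w not in context)
    return misses / max(len(target), 1)  # lower = solution better "explained"

# Gradient-free local search "backwards" from a known-good solution: mutate a
# candidate step-by-step trajectory and keep any edit that makes the solution
# more predictable. `step_pool` (hypothetical) is wherever candidate steps
# come from, e.g. an LLM asked to propose plan fragments.
def reverse_engineer_trajectory(prompt: str, solution: str, step_pool: list[str],
                                iters: int = 200, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    trajectory = [rng.choice(step_pool)]
    best = solution_surprisal(prompt, " ".join(trajectory), solution)
    for _ in range(iters):
        candidate = list(trajectory)
        move = rng.random()
        if move < 0.5 or len(candidate) < 2:   # insert a step
            candidate.insert(rng.randrange(len(candidate) + 1), rng.choice(step_pool))
        elif move < 0.8:                       # replace a step
            candidate[rng.randrange(len(candidate))] = rng.choice(step_pool)
        else:                                  # drop a step
            candidate.pop(rng.randrange(len(candidate)))
        score = solution_surprisal(prompt, " ".join(candidate), solution)
        if score < best:                       # hill-climb: keep improvements only
            trajectory, best = candidate, score
    return trajectory

# Example: recover a trajectory that "explains" a known-good answer.
steps = ["outline the argument", "draft a thesis about reasoning",
         "list counterexamples", "revise the draft for clarity"]
print(reverse_engineer_trajectory(
    "Write a short essay on reasoning.",
    "Reasoning improves when we outline the argument and revise the draft.",
    steps))
```

With a real frozen LLM as the scorer, the same loop would, per my reading, "discover" a plausible deep-reasoning trace for each known-good text, which is presumably how the trajectories in DeepWriting-20K were produced.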