Show HN: Entropy-Guided Loop – How to make small models reason
- Captures logprobs/top-k during generation, computes perplexity and token-level entropy.
- Triggers at most one refine when simple thresholds fire; passes a compact “uncertainty report” (uncertain tokens + top-k alts + local context) back to the model.
- In our tests on technical Q&A / math / code, a small model recovered much of “reasoning” quality at ~⅓ the cost while refining ~⅓ of outputs.
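A minimal sketch of the two core signals, perplexity and token-level entropy, computed from the logprobs an API returns (my own illustration, not the repo's exact code; entropy over the truncated top-k is a lower-bound estimate of the true distribution's entropy):

```python
import math

def token_entropy(top_logprobs):
    """Shannon entropy (nats) over the returned top-k alternatives.
    top-k is a truncated distribution, so we renormalize the tail and
    treat the result as a lower-bound uncertainty estimate."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def perplexity(chosen_logprobs):
    """exp of the mean negative log-likelihood of the sampled tokens."""
    return math.exp(-sum(chosen_logprobs) / len(chosen_logprobs))
```

A flat top-4 (each alternative at p=0.25) gives entropy log 4 ≈ 1.386 nats; a sequence sampled at p=0.5 per token gives perplexity 2.0.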
I kept seeing “reasoning” models behave like expensive black boxes. Meanwhile, standard inference already computes useful signals both before softmax normalization and after it (logprobs), and we usually throw them away. This loop tries the simplest thing you could think of: use those signals to decide when (and where) to think again.
GitHub (notebook + minimal code): https://github.com/monostate/weave-logprobs-reasoning-loop
Paper (short, engineer-written): https://arxiv.org/abs/2509.00079
Blog (more context): https://monostate.ai/blog/entropy-refinement-blog
Requirements: Python and an API that exposes logprobs (tested with OpenAI's non-reasoning GPT-4.1); OPENAI_API_KEY, plus Weave for observability. Run the notebook; it prints metrics and shows which tokens triggered refinement.
- Python, simple loop (no retraining).
- Uses Responses API logprobs/top-k; metrics: perplexity, max token entropy, low-confidence counts.
- Weave for lightweight logging/observability (optional).
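Flattening an API response into per-token rows might look like the sketch below. The field names follow the Chat-Completions-style logprobs payload (`logprobs=True`, `top_logprobs=k`), which is an assumption on my part; the Responses API the repo uses nests these slightly differently.

```python
def extract_token_signals(choice):
    """Flatten one choice's logprob payload into (token, logprob, alts) rows.
    `choice` is the parsed-JSON dict for choices[0]; field names assume the
    Chat Completions shape with logprobs=True, top_logprobs=k."""
    rows = []
    for entry in choice["logprobs"]["content"]:
        rows.append({
            "token": entry["token"],
            "logprob": entry["logprob"],
            # top-k alternatives for this position, used later in the
            # uncertainty report passed back to the model
            "alts": [(a["token"], a["logprob"]) for a in entry["top_logprobs"]],
        })
    return rows
```

Keeping the extraction in one place makes it easy to swap vendors: only this adapter changes, not the metric or trigger code.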
- Passing alternatives (not just “this looks uncertain”) prevents over-correction.
- A simple OR rule (ppl / max-entropy / low-confidence count) catches complementary failure modes.
- Numbers drift across vendors; keeping the method vendor-agnostic is better than chasing fragile pairings.
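The OR rule and the uncertainty report described above could be sketched like this (threshold values and the `window` parameter are illustrative placeholders, not the repo's tuned numbers):

```python
import math

def should_refine(token_logprobs, token_entropies,
                  ppl_thresh=1.5, ent_thresh=1.2,
                  low_conf_p=0.5, low_conf_n=3):
    """Simple OR rule: refine if overall perplexity is high, any single
    token's entropy spikes, or too many tokens fall below a confidence
    floor. Each branch catches a different failure mode."""
    ppl = math.exp(-sum(token_logprobs) / len(token_logprobs))
    max_ent = max(token_entropies)
    low_conf = sum(1 for lp in token_logprobs if math.exp(lp) < low_conf_p)
    fired = ppl > ppl_thresh or max_ent > ent_thresh or low_conf >= low_conf_n
    return fired, {"perplexity": ppl, "max_entropy": max_ent,
                   "low_conf_count": low_conf}

def uncertainty_report(rows, token_entropies, ent_thresh=1.2, window=3):
    """Compact report for the single refine pass: each uncertain token
    with its top-k alternatives and a few surrounding tokens of context,
    so the model sees concrete options rather than 'this looks uncertain'."""
    report = []
    for i, (row, ent) in enumerate(zip(rows, token_entropies)):
        if ent > ent_thresh:
            ctx = "".join(r["token"] for r in rows[max(0, i - window):i + window + 1])
            report.append({"token": row["token"], "alts": row["alts"],
                           "context": ctx})
    return report
```

`rows` here is a list of dicts with `token`, `logprob`, and `alts` keys (one per generated token); confident generations fall through with no refine call, so the loop adds cost only where the signals fire.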
- Needs APIs that expose logprobs/top-k.
- Results are indicative, not a leaderboard; the focus is on within-model gains (single pass vs. +loop).
- Thresholds might need light tuning per domain.
- One pass only; not a chain-of-thought replacement.
- Run it on your own models and ideas (e.g., 4o-mini, v3, Llama variants that expose logprobs) and, if you'd like, share logs in a PR against the README on GitHub. PRs welcome; I'll credit and link.
Overall, let me know if you find this way of making small models reason useful!