Show HN: Entropy-Guided Loop – How to make small models reason

33 points by andrewmonostate | 2 comments | 9/3/2025, 5:19:10 PM | github.com
TLDR: A small, vendor-agnostic inference loop that turns token logprobs/perplexity/entropy into at most one extra refinement pass, giving small LLMs a lightweight form of reasoning.

- Captures logprobs/top-k during generation and computes perplexity and token-level entropy (signal computation sketched after this list).

- Triggers at most one refinement pass when simple thresholds fire; passes a compact “uncertainty report” (uncertain tokens + top-k alternatives + local context) back to the model.

- In our tests on technical Q&A / math / code, a small model recovered much of a “reasoning” model's quality at ~⅓ the cost, while refining only ~⅓ of outputs.
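
For concreteness, here is a minimal sketch of the signal computation (the record layout and helper names are illustrative, not the repo's exact API), assuming each generated token was captured with its logprob and its top-k alternatives:

    import math

    # Per-token records as you might capture them from an API exposing logprobs/top-k, e.g.
    # {"token": " 408", "logprob": -0.02,
    #  "top_logprobs": [{"token": " 408", "logprob": -0.02}, {"token": " 407", "logprob": -4.1}, ...]}

    def sequence_perplexity(token_logprobs):
        # perplexity = exp(-mean log-prob) over the generated tokens
        return math.exp(-sum(token_logprobs) / len(token_logprobs))

    def token_entropy(top_logprobs):
        # Shannon entropy (nats) of the renormalized top-k alternatives at one position;
        # a truncated estimate, since probability mass outside the top-k is ignored
        probs = [math.exp(a["logprob"]) for a in top_logprobs]
        total = sum(probs)
        return -sum((p / total) * math.log(p / total) for p in probs)

    def signals(tokens):
        ppl = sequence_perplexity([t["logprob"] for t in tokens])
        entropies = [token_entropy(t["top_logprobs"]) for t in tokens]
        return ppl, max(entropies), entropies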

I kept seeing “reasoning” models behave like expensive black boxes. Meanwhile, standard inference already computes useful signals both before softmax normalization (the logits) and after it (the logprobs), which we usually throw away. This loop tries the simplest thing you could think of: use those signals to decide when (and where) to think again.

GitHub (notebook + minimal code): https://github.com/monostate/weave-logprobs-reasoning-loop

Paper (short, written from an engineering perspective): https://arxiv.org/abs/2509.00079

Blog (more context): https://monostate.ai/blog/entropy-refinement-blog

Requirements: Python and an API that exposes logprobs (tested with OpenAI's non-reasoning GPT-4.1). You'll need OPENAI_API_KEY, and Weave for observability. Run the notebook; it prints metrics and shows which tokens triggered refinement.
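
The notebook itself goes through the Responses API; as a rough illustration of the main ingredient it needs, here is how per-token logprobs and top-k alternatives can be requested from OpenAI's Chat Completions endpoint (model name and prompt are just examples):

    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4.1",       # any non-reasoning model that returns logprobs
        messages=[{"role": "user", "content": "What is 17 * 24?"}],
        logprobs=True,
        top_logprobs=5,        # top-k alternatives per generated token
    )

    # one entry per generated token: the token, its logprob, and its top-k alternatives
    for tok in resp.choices[0].logprobs.content:
        alts = [(alt.token, alt.logprob) for alt in tok.top_logprobs]
        print(tok.token, tok.logprob, alts)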

- Python, simple loop (no retraining).

- Uses Responses API logprobs/top-k; metrics: perplexity, max token entropy, low-confidence counts.

- Weave for lightweight logging/observability (optional).

- Passing alternatives (not just “this looks uncertain”) prevents over-correction.

- A simple OR rule (ppl / max-entropy / low-confidence count) catches complementary failure modes; see the sketch after this list.

- Numbers drift across vendors; keeping the method vendor-agnostic is better than chasing fragile pairings.

- Needs APIs that expose logprobs/top-k.

- Results are indicative—not a leaderboard; focus is on within-model gains (single-pass vs +loop).

- Thresholds might need light tuning per domain.

- One pass only; not a chain-of-thought replacement.

- Run it on your models and ideas (e.g., 4o-mini, v3, Llama variants with logprobs) and, if you'd like, share logs in a PR against the README on GitHub. PRs welcome; I'll credit and link.
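
To make the OR rule and the uncertainty report concrete, here is a sketch building on the helpers above; thresholds and prompt wording are illustrative and would need the per-domain tuning mentioned in the limitations:

    import math

    # Illustrative thresholds; tune per domain
    PPL_MAX = 1.3
    ENTROPY_MAX = 2.0
    LOW_CONF_COUNT = 3
    LOW_CONF_LOGPROB = math.log(0.5)   # a token counts as low-confidence below p = 0.5

    def should_refine(ppl, max_entropy, token_logprobs):
        low_conf = sum(lp < LOW_CONF_LOGPROB for lp in token_logprobs)
        # simple OR rule: any single signal firing triggers the (single) refinement pass
        return ppl > PPL_MAX or max_entropy > ENTROPY_MAX or low_conf >= LOW_CONF_COUNT

    def uncertainty_report(tokens, entropies, k=5):
        # surface the k most uncertain positions with their concrete alternatives,
        # so the model sees options rather than just "this looks uncertain"
        worst = sorted(range(len(tokens)), key=lambda i: -entropies[i])[:k]
        lines = []
        for i in sorted(worst):
            t = tokens[i]
            alts = ", ".join(f"{a['token']!r} ({a['logprob']:.2f})" for a in t["top_logprobs"])
            lines.append(f"position {i}: {t['token']!r} -> alternatives: {alts}")
        return "\n".join(lines)

    def refine_once(client, model, question, draft, report):
        # exactly one extra pass; prompt wording here is hypothetical, not the repo's template
        prompt = (f"Question: {question}\n\nDraft answer: {draft}\n\n"
                  f"These tokens were uncertain:\n{report}\n\n"
                  "Revise the answer, changing only what the uncertainty suggests is wrong.")
        out = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return out.choices[0].message.content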

Overall, let me know whether you find this way of making small models reason useful!

Comments (2)

mountainriver · 1d ago
Deep Entropix vibes
andrewmonostate · 14h ago
Thanks for bringing this up! Good catch on the similarities! Yes, both use entropy/uncertainty to allocate compute intelligently.

From what I understand, Entropix is an entropy-aware decoder - it monitors token entropy during generation and dynamically adjusts sampling or spawns parallel CoT branches at high-uncertainty points. It's a decoding-time intervention.

My approach doesn't touch decoding at all. I:

1. Generate normally (standard sampling)

2. Capture logprobs + top-k alternatives

3. Check if perplexity/entropy/confidence triggers exceed thresholds

4. If yes, do ONE refinement pass with an "uncertainty report" showing the model exactly which tokens were uncertain + their alternatives + context
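
In rough glue code (hypothetical helper names, reusing the sketches from the post above; generate_with_logprobs stands in for whatever capture you use):

    def answer(client, model, question):
        draft, tokens = generate_with_logprobs(client, model, question)  # steps 1 + 2
        ppl, max_ent, entropies = signals(tokens)                        # step 3
        if not should_refine(ppl, max_ent, [t["logprob"] for t in tokens]):
            return draft                                                 # 0 extra passes
        report = uncertainty_report(tokens, entropies)
        return refine_once(client, model, question, draft, report)       # exactly 1 extra pass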

The key difference: Entropix steers the ship while sailing; my loop reviews the voyage log and decides whether to make one correction pass. No branching, no custom samplers, deterministic cost (0 or 1 extra pass).

They're actually complementary - you could use Entropix's entropy-aware sampling for the initial generation and still apply a refinement loop afterward. Same underlying signal (entropy), different control points! Combining the two should work really well; I'll test it soon.