Show HN: Entropy-Guided Loop – How to make small models reason
- Captures logprobs/top-k during generation, computes perplexity and token-level entropy.
- Triggers at most one refine when simple thresholds fire; passes a compact “uncertainty report” (uncertain tokens + top-k alts + local context) back to the model.
- In our tests on technical Q&A / math / code, a small model recovered much of “reasoning” quality at ~⅓ the cost while refining ~⅓ of outputs.
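A minimal sketch of the two core signals, perplexity and token-level entropy, computed from the logprobs an API returns (my own illustration, not the repo's exact code; entropy over the truncated top-k is a lower-bound estimate of the true distribution's entropy):

```python
import math

def token_entropy(top_logprobs):
    """Shannon entropy (nats) over the returned top-k alternatives.
    top-k is a truncated distribution, so we renormalize the tail and
    treat the result as a lower-bound uncertainty estimate."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def perplexity(chosen_logprobs):
    """exp of the mean negative log-likelihood of the sampled tokens."""
    return math.exp(-sum(chosen_logprobs) / len(chosen_logprobs))
```

A flat top-4 (each alternative at p=0.25) gives entropy log 4 ≈ 1.386 nats; a sequence sampled at p=0.5 per token gives perplexity 2.0.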
I kept seeing “reasoning” models behave like expensive black boxes. Meanwhile, standard inference already computes useful signals both before softmax normalization and after it (logprobs), and we usually throw them away. This loop tries the simplest thing you could think of: use those signals to decide when (and where) to think again.
GitHub (notebook + minimal code): https://github.com/monostate/weave-logprobs-reasoning-loop
Paper (short, engineer-written): https://arxiv.org/abs/2509.00079
Blog (more context): https://monostate.ai/blog/entropy-refinement-blog
Requirements: Python and an API that exposes logprobs (tested with OpenAI's non-reasoning GPT-4.1); OPENAI_API_KEY, plus Weave for observability. Run the notebook; it prints metrics and shows which tokens triggered refinement.
- Python, simple loop (no retraining).
- Uses Responses API logprobs/top-k; metrics: perplexity, max token entropy, low-confidence counts.
- Weave for lightweight logging/observability (optional).
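Flattening an API response into per-token rows might look like the sketch below. The field names follow the Chat-Completions-style logprobs payload (`logprobs=True`, `top_logprobs=k`), which is an assumption on my part; the Responses API the repo uses nests these slightly differently.

```python
def extract_token_signals(choice):
    """Flatten one choice's logprob payload into (token, logprob, alts) rows.
    `choice` is the parsed-JSON dict for choices[0]; field names assume the
    Chat Completions shape with logprobs=True, top_logprobs=k."""
    rows = []
    for entry in choice["logprobs"]["content"]:
        rows.append({
            "token": entry["token"],
            "logprob": entry["logprob"],
            # top-k alternatives for this position, used later in the
            # uncertainty report passed back to the model
            "alts": [(a["token"], a["logprob"]) for a in entry["top_logprobs"]],
        })
    return rows
```

Keeping the extraction in one place makes it easy to swap vendors: only this adapter changes, not the metric or trigger code.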
- Passing alternatives (not just “this looks uncertain”) prevents over-correction.
- A simple OR rule (ppl / max-entropy / low-confidence count) catches complementary failure modes.
- Numbers drift across vendors; keeping the method vendor-agnostic is better than chasing fragile pairings.
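The OR rule and the uncertainty report described above could be sketched like this (threshold values and the `window` parameter are illustrative placeholders, not the repo's tuned numbers):

```python
import math

def should_refine(token_logprobs, token_entropies,
                  ppl_thresh=1.5, ent_thresh=1.2,
                  low_conf_p=0.5, low_conf_n=3):
    """Simple OR rule: refine if overall perplexity is high, any single
    token's entropy spikes, or too many tokens fall below a confidence
    floor. Each branch catches a different failure mode."""
    ppl = math.exp(-sum(token_logprobs) / len(token_logprobs))
    max_ent = max(token_entropies)
    low_conf = sum(1 for lp in token_logprobs if math.exp(lp) < low_conf_p)
    fired = ppl > ppl_thresh or max_ent > ent_thresh or low_conf >= low_conf_n
    return fired, {"perplexity": ppl, "max_entropy": max_ent,
                   "low_conf_count": low_conf}

def uncertainty_report(rows, token_entropies, ent_thresh=1.2, window=3):
    """Compact report for the single refine pass: each uncertain token
    with its top-k alternatives and a few surrounding tokens of context,
    so the model sees concrete options rather than 'this looks uncertain'."""
    report = []
    for i, (row, ent) in enumerate(zip(rows, token_entropies)):
        if ent > ent_thresh:
            ctx = "".join(r["token"] for r in rows[max(0, i - window):i + window + 1])
            report.append({"token": row["token"], "alts": row["alts"],
                           "context": ctx})
    return report
```

`rows` here is a list of dicts with `token`, `logprob`, and `alts` keys (one per generated token); confident generations fall through with no refine call, so the loop adds cost only where the signals fire.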
- Needs APIs that expose logprobs/top-k.
- Results are indicative, not a leaderboard; the focus is on within-model gains (single pass vs. +loop).
- Thresholds might need light tuning per domain.
- One pass only; not a chain-of-thought replacement.
- Run it on your own models and ideas (e.g., 4o-mini, v3, Llama variants that expose logprobs) and, if you'd like, share logs in a PR against the README on GitHub. PRs welcome; I'll credit and link.
Overall, let me know if you find this way of making small models reason useful!