Show HN: Entropy-Guided Loop – How to make small models reason

33 points by andrewmonostate | 2 comments | 9/3/2025, 5:19:10 PM | github.com
TLDR: A small, vendor-agnostic inference loop that turns token logprobs/perplexity/entropy into at most one extra refinement pass, giving small LLMs a lightweight form of reasoning.

- Captures logprobs/top-k during generation and computes perplexity and token-level entropy (signal computation sketched after this list).

- Triggers at most one refinement pass when simple thresholds fire; passes a compact “uncertainty report” (uncertain tokens + top-k alternatives + local context) back to the model.

- In our tests on technical Q&A / math / code, a small model recovered much of a “reasoning” model's quality at ~⅓ the cost, while refining only ~⅓ of outputs.
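
For concreteness, here is a minimal sketch of the signal computation (the record layout and helper names are illustrative, not the repo's exact API), assuming each generated token was captured with its logprob and its top-k alternatives:

    import math

    # Per-token records as you might capture them from an API exposing logprobs/top-k, e.g.
    # {"token": " 408", "logprob": -0.02,
    #  "top_logprobs": [{"token": " 408", "logprob": -0.02}, {"token": " 407", "logprob": -4.1}, ...]}

    def sequence_perplexity(token_logprobs):
        # perplexity = exp(-mean log-prob) over the generated tokens
        return math.exp(-sum(token_logprobs) / len(token_logprobs))

    def token_entropy(top_logprobs):
        # Shannon entropy (nats) of the renormalized top-k alternatives at one position;
        # a truncated estimate, since probability mass outside the top-k is ignored
        probs = [math.exp(a["logprob"]) for a in top_logprobs]
        total = sum(probs)
        return -sum((p / total) * math.log(p / total) for p in probs)

    def signals(tokens):
        ppl = sequence_perplexity([t["logprob"] for t in tokens])
        entropies = [token_entropy(t["top_logprobs"]) for t in tokens]
        return ppl, max(entropies), entropies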

I kept seeing “reasoning” models behave like expensive black boxes. Meanwhile, standard inference already computes useful signals both before softmax normalization (the logits) and after it (the logprobs), which we usually throw away. This loop tries the simplest thing you could think of: use those signals to decide when (and where) to think again.

GitHub (notebook + minimal code): https://github.com/monostate/weave-logprobs-reasoning-loop

Paper (short, written from an engineering perspective): https://arxiv.org/abs/2509.00079

Blog (more context): https://monostate.ai/blog/entropy-refinement-blog

Requirements: Python and an API that exposes logprobs (tested with OpenAI's non-reasoning GPT-4.1). You'll need OPENAI_API_KEY, and Weave for observability. Run the notebook; it prints metrics and shows which tokens triggered refinement.
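
The notebook itself goes through the Responses API; as a rough illustration of the main ingredient it needs, here is how per-token logprobs and top-k alternatives can be requested from OpenAI's Chat Completions endpoint (model name and prompt are just examples):

    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4.1",       # any non-reasoning model that returns logprobs
        messages=[{"role": "user", "content": "What is 17 * 24?"}],
        logprobs=True,
        top_logprobs=5,        # top-k alternatives per generated token
    )

    # one entry per generated token: the token, its logprob, and its top-k alternatives
    for tok in resp.choices[0].logprobs.content:
        alts = [(alt.token, alt.logprob) for alt in tok.top_logprobs]
        print(tok.token, tok.logprob, alts)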

- Python, simple loop (no retraining).

- Uses Responses API logprobs/top-k; metrics: perplexity, max token entropy, low-confidence counts.

- Weave for lightweight logging/observability (optional).

- Passing alternatives (not just “this looks uncertain”) prevents over-correction.

- A simple OR rule (ppl / max-entropy / low-confidence count) catches complementary failure modes; see the sketch after this list.

- Numbers drift across vendors; keeping the method vendor-agnostic is better than chasing fragile pairings.

- Needs APIs that expose logprobs/top-k.

- Results are indicative—not a leaderboard; focus is on within-model gains (single-pass vs +loop).

- Thresholds might need light tuning per domain.

- One pass only; not a chain-of-thought replacement.

- Run it on your models and ideas (e.g., 4o-mini, v3, Llama variants with logprobs) and, if you'd like, share logs in a PR against the README on GitHub. PRs welcome; I'll credit and link.
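
To make the OR rule and the uncertainty report concrete, here is a sketch building on the helpers above; thresholds and prompt wording are illustrative and would need the per-domain tuning mentioned in the limitations:

    import math

    # Illustrative thresholds; tune per domain
    PPL_MAX = 1.3
    ENTROPY_MAX = 2.0
    LOW_CONF_COUNT = 3
    LOW_CONF_LOGPROB = math.log(0.5)   # a token counts as low-confidence below p = 0.5

    def should_refine(ppl, max_entropy, token_logprobs):
        low_conf = sum(lp < LOW_CONF_LOGPROB for lp in token_logprobs)
        # simple OR rule: any single signal firing triggers the (single) refinement pass
        return ppl > PPL_MAX or max_entropy > ENTROPY_MAX or low_conf >= LOW_CONF_COUNT

    def uncertainty_report(tokens, entropies, k=5):
        # surface the k most uncertain positions with their concrete alternatives,
        # so the model sees options rather than just "this looks uncertain"
        worst = sorted(range(len(tokens)), key=lambda i: -entropies[i])[:k]
        lines = []
        for i in sorted(worst):
            t = tokens[i]
            alts = ", ".join(f"{a['token']!r} ({a['logprob']:.2f})" for a in t["top_logprobs"])
            lines.append(f"position {i}: {t['token']!r} -> alternatives: {alts}")
        return "\n".join(lines)

    def refine_once(client, model, question, draft, report):
        # exactly one extra pass; prompt wording here is hypothetical, not the repo's template
        prompt = (f"Question: {question}\n\nDraft answer: {draft}\n\n"
                  f"These tokens were uncertain:\n{report}\n\n"
                  "Revise the answer, changing only what the uncertainty suggests is wrong.")
        out = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return out.choices[0].message.content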

Overall, let me know whether you find this way of making small models reason useful!

Comments (2)

mountainriver · 1d ago
Deep Entropix vibes
andrewmonostate · 14h ago
Thanks for bringing this up! Good catch on the similarities! Yes, both use entropy/uncertainty to allocate compute intelligently.

From what I understand, Entropix is an entropy-aware decoder - it monitors token entropy during generation and dynamically adjusts sampling or spawns parallel CoT branches at high-uncertainty points. It's a decoding-time intervention.

My approach doesn't touch decoding at all. I:

1. Generate normally (standard sampling)

2. Capture logprobs + top-k alternatives

3. Check if perplexity/entropy/confidence triggers exceed thresholds

4. If yes, do ONE refinement pass with an "uncertainty report" showing the model exactly which tokens were uncertain + their alternatives + context
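
In rough glue code (hypothetical helper names, reusing the sketches from the post above; generate_with_logprobs stands in for whatever capture you use):

    def answer(client, model, question):
        draft, tokens = generate_with_logprobs(client, model, question)  # steps 1 + 2
        ppl, max_ent, entropies = signals(tokens)                        # step 3
        if not should_refine(ppl, max_ent, [t["logprob"] for t in tokens]):
            return draft                                                 # 0 extra passes
        report = uncertainty_report(tokens, entropies)
        return refine_once(client, model, question, draft, report)       # exactly 1 extra pass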

The key difference: Entropix steers the ship while sailing; my loop reviews the voyage log and decides whether to make one correction pass. No branching, no custom samplers, deterministic cost (0 or 1 extra pass).

They're actually complementary - you could use Entropix's entropy-aware sampling for the initial generation and still apply a refinement loop afterward. Same underlying signal (entropy), different control points! Combining the two should work really well; I'll test it soon.