We open-sourced a 'Semantic Clinic' for LLM bugs

TXTOS · 3h ago

TL;DR

After nine months of chasing weird hallucinations and silent failures in production LLM/RAG systems, we catalogued every failure pattern we could reproduce. The result is an MIT-licensed “Semantic Clinic” with 16 root-cause families and step-by-step fixes for each.

---

## Why we built it

- Most bug reports just say “the model lied,” but the cause is almost always deeper: retrieval drift, OCR mangling, prompt contamination, etc.

- Existing docs scatter symptoms and remedies across unrelated blog posts; we wanted one map that shows where the pipeline breaks and why.

- After fixing the same issues across 11 real stacks, we decided to standardise the notes and open-source them.

---

## What’s inside

- 16 root-cause pages (Hallucination & Chunk Drift, Interpretation Collapse, Entropy Melts, etc.).

- Quick triage index: find the symptom → jump to the fix page.

- Each page gives: real-world symptoms, metrics to watch (ΔS semantic tension, λ_observe logic flow), a reproducible notebook, and a “band-aid-to-surgery” list of fixes.

- Tiny CLI tools: semantic diff viewer, prompt isolator, vector compression checker. All plain bash + markdown so anyone can fork.
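
To make the “symptom → fix page” idea concrete, here is a minimal sketch of how such a triage lookup could work. It is not the clinic's actual index: the family names come from the list above, but the keywords, the page paths, and the `triage` helper are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical triage index: root-cause family -> fix page + symptom keywords.
# Family names are from the post; keywords and paths are illustrative guesses.
TRIAGE_INDEX = {
    "Hallucination & Chunk Drift": {
        "page": "pages/hallucination-chunk-drift.md",  # assumed path
        "keywords": {"wrong chunk", "irrelevant passage", "made-up citation"},
    },
    "Interpretation Collapse": {
        "page": "pages/interpretation-collapse.md",  # assumed path
        "keywords": {"misread", "wrong conclusion"},
    },
    "Entropy Melts": {
        "page": "pages/entropy-melts.md",  # assumed path
        "keywords": {"repetition loop", "gibberish"},
    },
}


def triage(symptom: str) -> list[tuple[str, str]]:
    """Return (family, page) pairs whose keywords occur in the symptom text.

    Results are ordered by number of keyword hits, then by family name.
    """
    text = symptom.lower()
    hits = []
    for family, entry in TRIAGE_INDEX.items():
        score = sum(1 for kw in entry["keywords"] if kw in text)
        if score:
            hits.append((score, family, entry["page"]))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(family, page) for _, family, page in hits]


if __name__ == "__main__":
    report = "The retriever returned the wrong chunk and a made-up citation."
    for family, page in triage(report):
        print(f"{family} -> {page}")
```

Plain substring matching keeps the sketch dependency-free; a real index would likely need fuzzier matching, since bug reports rarely use the same wording twice.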

---

## Does it help?

- On our own stacks the average debug session dropped from hours to ~15 min once we tagged the family.

- The first 4 root causes explain ~80% of the bugs we see in the wild.

- Used so far on finance chatbots, doc-QA, multi-agent sims; happy to share war stories.

---

## Call for help

- If you’ve hit a failure that isn’t on the list, open an issue or PR. We especially want examples of symbolic prompt contamination or large-scale entropy collapse.

- Long-term goal: turn the clinic into a self-serve triage bot that annotates stack traces automatically.

---

## Why open-source?

Debug knowledge shouldn’t be pay-walled. The faster we share failure modes, the faster the whole field moves (and the fewer 3 a.m. rollbacks we all do).

Cheers – PSBigBig / WFGY team
