After nine months of chasing weird hallucinations and silent failures in production LLM / RAG systems, we catalogued every failure pattern we could reproduce. The result is an MIT-licensed “Semantic Clinic” with 16 root-cause families and step-by-step fixes.
---
## Why we built it
- Most bug reports just say “the model lied,” but the cause is almost always deeper: retrieval drift, OCR mangling, prompt contamination, etc.
- Existing write-ups scatter symptoms and remedies across unrelated blog posts; we wanted one map that shows where the pipeline breaks and why.
- After fixing the same issues across 11 real stacks, we decided to standardise the notes and open-source them.
---
## What’s inside
- 16 root-cause pages (Hallucination & Chunk Drift, Interpretation Collapse, Entropy Melts, etc.).
- Quick triage index: find the symptom → jump to the fix page.
- Each page gives real-world symptoms, metrics to watch (ΔS semantic tension, λ_observe logic flow), a reproducible notebook, and a “band-aid-to-surgery” list of fixes.
- Tiny CLI tools: semantic diff viewer, prompt isolator, vector compression checker. All plain bash + markdown so anyone can fork.
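
To make the "find the symptom → jump to the fix page" idea concrete, here is a minimal Python sketch of how such a triage lookup could work. It is not the clinic's actual tooling (that is bash + markdown): the `FixPage` type, the `triage` function, and every symptom keyword are hypothetical and only for illustration. Only the three family names come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FixPage:
    family: str                # root-cause family name (taken from the post)
    keywords: tuple[str, ...]  # hypothetical symptom phrases, purely illustrative


# Assumption: the keyword lists are guesses made for this sketch,
# not the real contents of the triage index.
TRIAGE_INDEX = (
    FixPage("Hallucination & Chunk Drift",
            ("wrong chunk", "irrelevant passage", "made-up citation")),
    FixPage("Interpretation Collapse",
            ("misreads question", "right chunk wrong answer")),
    FixPage("Entropy Melts",
            ("rambling", "repeats itself", "incoherent")),
)


def triage(symptom: str) -> list[str]:
    """Rank families by how many of their keywords appear in the symptom text.

    Families with no keyword hit are left out of the result.
    """
    text = symptom.lower()
    scored = []
    for page in TRIAGE_INDEX:
        hits = sum(1 for kw in page.keywords if kw in text)
        if hits:
            scored.append((hits, page.family))
    # Most hits first; list.sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: -pair[0])
    return [family for _, family in scored]


if __name__ == "__main__":
    print(triage("Output is rambling and repeats itself; also pulls the wrong chunk"))
```

Plain substring matching keeps the sketch dependency-free; a real index would more likely match on richer signals than keywords.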
---
## Does it help?
- On our own stacks, the average debug session dropped from hours to ~15 min once we had tagged the root-cause family.
- The first 4 root causes explain ~80% of the bugs we see in the wild.
- Used so far on finance chatbots, doc-QA, and multi-agent sims; happy to share war stories.
---
## Call for help
- If you’ve hit a failure that isn’t on the list, open an issue or PR. We especially want examples of symbolic prompt contamination or large-scale entropy collapse.
- Long-term goal: turn the clinic into a self-serve triage bot that annotates stack traces automatically.
---
## Why open-source?
Debug knowledge shouldn’t be pay-walled. The faster we share failure modes, the faster the whole field moves (and the fewer 3 a.m. rollbacks we all do).
Cheers – PSBigBig / WFGY team