
> On August 25, we deployed a misconfiguration to the Claude API TPU servers that caused an error during token generation. An issue caused by a runtime performance optimization occasionally assigned a high probability to tokens that should rarely be produced given the context, for example producing Thai or Chinese characters in response to English prompts, or producing obvious syntax errors in code. A small subset of users that asked a question in English might have seen "สวัสดี" in the middle of the response, for example.

Can anyone explain to a layperson how this sort of thing is even possible for an LLM?

For normal code, of course stupid bugs happen all the time. You accidentally introduce an off-by-one error in a conditional, for example, or add an extra `goto fail`.

But LLMs aren't written by humans! Models are trained by automated programs over a period of many months across unfathomably massive data centers.

How would a human introduce a bug like the one described in TFA?

ashdksnndck · 2m ago

There are many layers of human-written code in between you and the weights.
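To make that concrete: between the weights and the text you see sits ordinary code that turns the model's scores (logits) into a token. Below is a toy illustration, not Anthropic's serving stack. The vocabulary, the logits, and the "bug" (a boosted score written into the wrong slot) are all hypothetical. It shows how a single wrong write in that layer can take a token like "สวัสดี" from essentially impossible to the most likely choice without the weights changing at all.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab: list[str], logits: list[float], rng: random.Random) -> str:
    """Pick one token at random, weighted by softmax(logits)."""
    return rng.choices(vocab, weights=softmax(logits), k=1)[0]


# Hypothetical toy values: the Thai greeting should be essentially unreachable.
vocab = ["Hello", "Hi", "Hey", "สวัสดี"]
logits = [4.0, 3.5, 3.0, -6.0]

# Hypothetical "optimization" bug: a boosted score meant for slot 0
# is written into slot 3 instead.
buggy_logits = list(logits)
buggy_logits[3] = logits[0] + 2.0

print(f"P(สวัสดี) before: {softmax(logits)[3]:.6f}")
print(f"P(สวัสดี) after:  {softmax(buggy_logits)[3]:.3f}")
print(sample_token(vocab, buggy_logits, random.Random(0)))
```

The weights are identical in both cases; only one line of hand-written serving code differs.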

extr · 12m ago

> Incorrect routing affected less than 0.0004% of requests on Google Cloud's Vertex AI between August 27 and September 16.

Matches my experience. I use Claude Code (CC) through our enterprise Vertex AI account and never noticed any degradation.

In general it seems like these bugs, while serious, were substantially less prevalent than anecdotal online reports would have you believe. We are really talking about a ~1-2 week window in which most issues were concentrated, with a relatively small percentage of total requests and users impacted.
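Whether "a small percentage of requests" and "a large share of users" conflict depends on how many requests each user sends. Under the simplifying, hypothetical assumption that each request independently hits a bug with probability p, the chance that a user sending n requests sees at least one bad response is 1 - (1 - p)^n. The inputs below are made up for illustration and are not from the postmortem.

```python
def share_of_users_affected(p_request: float, n_requests: int) -> float:
    """Chance that a user who sends n_requests sees at least one bad response,
    assuming every request independently hits the bug with probability p_request."""
    return 1.0 - (1.0 - p_request) ** n_requests


# Hypothetical inputs, not taken from the postmortem:
# a 0.1% per-request failure rate and a user who sends 500 requests.
print(f"{share_of_users_affected(0.001, 500):.1%}")  # prints 39.4%
```

So a rare per-request failure can still reach a large fraction of heavy users. Sticky routing breaks the independence assumption, so real per-user numbers would behave differently.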

ispeaknumbers · 6m ago

I'm not sure if you can claim these were "less prevalent than anecdotal online reports". From their article:

> Approximately 30% of Claude Code users had at least one message routed to the wrong server type, resulting in degraded responses.

> However, some users were affected more severely, as our routing is "sticky". This meant that once a request was served by the incorrect server, subsequent follow-ups were likely to be served by the same incorrect server.

30% of Claude Code users getting a degraded response is a huge bug.
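The "sticky" routing in the quote is why one misroute can hurt more than once. Here is a minimal sketch, assuming a hypothetical router that hashes a conversation ID to pick a server; the server names and pool are made up, and real routing is surely more involved. Because the hash is stable, every follow-up in a conversation lands on the same server, so a conversation that starts on a misconfigured server keeps getting degraded answers. The postmortem only says follow-ups were *likely* to hit the same server; this sketch makes stickiness absolute for simplicity.

```python
import zlib


def route(conversation_id: str, servers: list[str]) -> str:
    """Hypothetical sticky router: a stable hash of the conversation ID picks the
    server, so every follow-up in a conversation goes to the same place."""
    return servers[zlib.crc32(conversation_id.encode("utf-8")) % len(servers)]


# Hypothetical pool: one server has the wrong (long-context) configuration.
servers = ["short-ctx-1", "short-ctx-2", "long-ctx-1"]
misconfigured = {"long-ctx-1"}

for conv in ["conv-a", "conv-b", "conv-c", "conv-d"]:
    turns = [route(conv, servers) for _ in range(5)]  # five follow-up messages
    degraded = sum(server in misconfigured for server in turns)
    # With strict stickiness this is always 0/5 or 5/5, never in between.
    print(f"{conv}: {turns[0]}, {degraded}/5 turns degraded")
```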

deepdarkforest · 14m ago

Wow. Sneaky. AFAIK they don't even state the rate of impact for the XLA bug, which affected everyone, not just Claude Code users. Very vague. Interesting.

Claude Code has made almost half a billion so far[1] (>$500M in ARR, and it's about 9 months old), and 30% of its users were impacted at least once by the first routing bug alone. Scary stuff.

Their postmortem is basically "evaluations are hard, we relied on vibe checking, now we are going to have even more frequent vibe checking". I believe it was indeed unintentional, but in a future where investors' money won't fall from the sky, serving distilled models will be very tempting. And there is currently no SLA on model quality you can hold a provider to; it's just vibes. I wonder how enterprise vendors are going to deal with this going forward, since quality can degrade without the client, or even the vendor, being able to really prove it.

[1] https://www.anthropic.com/news/anthropic-raises-series-f-at-...

extr · 5m ago

Is your contention that paying for a service entitles you to zero bugs, ever?

stellalo · 25m ago

Title should be fixed: it’s about Claude models in general, not Claude Code

bravetraveler · 11m ago

> We don't typically share this level of technical detail about our infrastructure, but the scope and complexity of these issues justified a more comprehensive explanation.

Layered in self-aggrandizement. You host a service, people give you money.

OGEnthusiast · 10m ago

Seems like Anthropic is using TPUs a lot more than I thought. For some reason I thought 90%+ of their capacity was from AWS.

flutas · 3m ago

And yet there's no offer of credits to make things right for users, for what was essentially degraded performance of what they paid for.

I know I'll probably get pushback on this, but it left a sour taste in my mouth to pay for a $200 sub that at times felt less useful than ChatGPT Plus ($20).

Or to summarize: [south park "we're sorry" gif]

moatmoat · 1h ago

TL;DR — Anthropic Postmortem of Three Recent Issues

In Aug–Sep 2025, Claude users saw degraded output quality due to infrastructure bugs, not intentional changes.

The Three Issues

1. *Context window routing error* - Short-context requests were sometimes routed to long-context servers.

   - Started small, worsened after load-balancing changes.

2. *Output corruption* - TPU misconfigurations led to weird outputs (wrong language, syntax errors).

   - Runtime optimizations wrongly boosted improbable tokens.

3. *Approximate top-k miscompilation* - A compiler bug in the TPU/XLA stack corrupted token probability selection.

   - Occasionally dropped the true top token.
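The third issue is easier to picture with a toy example. The code below is not XLA's or JAX's approximate top-k; it is a simplified, hypothetical bucketing scheme chosen only to show the trade-off. An approximate top-k like this can miss lower-ranked candidates (index 3 here), but it always keeps the single best token, because the global maximum necessarily wins its own bucket. In a scheme like this, occasionally dropping the true top token cannot happen by design, which is why that symptom points to a bug rather than ordinary approximation error.

```python
import heapq


def exact_top_k(logits: list[float], k: int) -> list[int]:
    """Indices of the k largest logits, largest first."""
    return heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])


def bucketed_approx_top_k(logits: list[float], k: int, num_buckets: int) -> list[int]:
    """Toy approximate top-k: keep only the best index in each bucket
    (index i goes to bucket i % num_buckets), then take the top k of those
    winners. Two strong candidates sharing a bucket collide and one is lost."""
    best_per_bucket: dict[int, int] = {}
    for i, x in enumerate(logits):
        b = i % num_buckets
        if b not in best_per_bucket or x > logits[best_per_bucket[b]]:
            best_per_bucket[b] = i
    return heapq.nlargest(k, best_per_bucket.values(), key=lambda i: logits[i])


logits = [9.0, 1.0, 0.5, 8.5, 7.0, 0.2]
print(exact_top_k(logits, 3))               # [0, 3, 4]
print(bucketed_approx_top_k(logits, 3, 3))  # [0, 4, 2]: index 3 collided with 0
```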

Why It Was Hard to Detect

- Bugs were subtle, intermittent, and platform-dependent.

- Benchmarks missed these degradations.

- Privacy/safety rules limited access to real user data for debugging.

Fixes and Next Steps

- More sensitive, continuous evals on production.

- Better tools to debug user feedback safely.

- Stronger validation of routing, output correctness, and token-selection.
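As one illustration of a cheap output-correctness check of the kind listed above, a production canary could flag characters from scripts that should not appear in a reply to an English prompt, like the "สวัสดี" example from the postmortem. This is a hypothetical heuristic, not something the postmortem says Anthropic runs, and the function name and flagged prefixes are assumptions; a real check would need to allow for prompts that legitimately involve other languages.

```python
import unicodedata


def unexpected_script_chars(text: str, flagged=("THAI", "CJK")) -> list[str]:
    """Return characters whose Unicode name begins with a flagged script prefix.

    Hypothetical heuristic: for a reply to an English prompt, Thai or CJK
    characters are treated as a signal worth alerting on.
    """
    return [ch for ch in text if unicodedata.name(ch, "").startswith(flagged)]


reply = "Sure, here is the fix. สวัสดี The loop should start at 0."
hits = unexpected_script_chars(reply)
if hits:
    print(f"flagged {len(hits)} unexpected characters: {''.join(hits)}")
```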

sebastiennight · 30m ago

> Privacy/safety rules limited access to real user data for debugging.

Do their ToS really limit access to user data (prompt/response)? I don't remember seeing anything to that effect in their terms.

mcintyre1994 · 26m ago

I’d imagine they have a lot of internal controls, even if ultimately someone at the company can read the data within their terms. It makes sense that the teams debugging stuff wouldn’t have this access immediately.

favorited · 16m ago

I know that when you submit a thumbs up/down rating for a response, you need to opt-in to the whole chat conversation being shared with Anthropic.


A postmortem of three recent issues
