Context Rot: How increasing input tokens impacts LLM performance

38 points · kellyhongsn · 8 comments · 7/14/2025, 7:25:15 PM · research.trychroma.com
I work on research at Chroma, and I just published our latest technical report on context rot.

TLDR: Model performance is non-uniform across context lengths, including state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models.

This highlights the need for context engineering. Whether relevant information is present in a model’s context is not all that matters; what matters more is how that information is presented.

Here is the complete open-source codebase to replicate our results: https://github.com/chroma-core/context-rot

Comments (8)

posnet · 51m ago
I've definitely noticed this anecdotally.

Especially with Gemini Pro when providing long-form textual references: putting many documents in a single context window gives worse answers than having it summarize the documents first, asking a question about the summaries only, and then providing the full text of the sub-documents on request (RAG-style, or just a simple agent loop).

Similarly, I've personally noticed that Claude Code with Opus or Sonnet gets worse the more compactions happen. It's unclear to me whether the summary itself degrades, or whether the context window ends up with a higher percentage of less relevant data, but even clearing the context and asking it to re-read the relevant files (even if they were mentioned and summarized in the compaction) gives better results.
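The summarize-first workflow described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: `call_llm` is a hypothetical stand-in for whatever chat-completion call you use, and the naive name matching is just for clarity.

```python
# Sketch of "summarize first, fetch full text on demand" to keep the
# context window small. `call_llm` is a placeholder for any LLM API call.

def summarize_docs(docs, call_llm, max_chars=500):
    """Build one short summary per document instead of concatenating them all."""
    return {
        name: call_llm(f"Summarize in a few sentences:\n{text[:4000]}")[:max_chars]
        for name, text in docs.items()
    }

def answer_with_drilldown(question, docs, call_llm):
    """Ask over summaries only; pull full text just for the docs the model names."""
    summaries = summarize_docs(docs, call_llm)
    catalog = "\n".join(f"[{name}] {s}" for name, s in summaries.items())
    picks = call_llm(
        "Which documents (by name, comma-separated) do you need to answer: "
        f"{question}\n\nSummaries:\n{catalog}"
    )
    chosen = [name for name in docs if name in picks]  # naive name matching
    context = "\n\n".join(docs[name] for name in chosen)
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```

The point is that the final answer prompt only contains the documents the model asked for, rather than the whole corpus.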

zwaps · 32m ago
Gemini loses coherence and reasoning ability well before the chat hits the context limit, and yet according to this report it is the best model on several dimensions.

Long story short: Context engineering is still king, RAG is not dead

risyachka · 5m ago
Yep. The easiest way to tell someone has no experience with LLMs is if they say “RAG is dead”
tough · 33m ago
Have you tried NotebookLM? It basically does this as an app in the background (chunking and summarising many docs), and you can chat with the full corpus using RAG.
zwaps · 33m ago
Very cool results, very comprehensive article, many insights!

Media literacy disclaimer: Chroma is a vectorDB company.

philip1209 · 17m ago
Chroma does vector, full-text, and regex search, and it's designed for the multitenant workloads typical of AI applications. So, not just a "vectorDB company."
tough · 33m ago
This felt intuitively true; great to see some research putting hard numbers on it.
tjkrusinski · 1h ago
Interesting report. Are there recommended sizes for different models? How do I know what works or doesn't for my use case?