Cartridges: Storing long contexts in tiny caches with self-study

dvrp · 7/2/2025, 5:19:48 AM · github.com

Comments (1)

dvrp · 20h ago
From their repo:

tl;dr When we put lots of text (e.g. a whole code repo) into a language model's context, generation cost soars because of the KV cache's size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe called self-study, we show that this simple idea can improve throughput by 26× while maintaining quality.

Link to their blog post: https://hazyresearch.stanford.edu/blog/2025-06-08-cartridges
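To see why the KV cache dominates cost for long contexts, here is a back-of-envelope memory calculation. The model dimensions are illustrative assumptions (roughly a Llama-class model with grouped-query attention), not numbers from the post, and treating the 26× throughput gain as a cache-size ratio is a simplification:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    """Memory for a decoder's KV cache at a given sequence length.

    Keys and values (factor of 2) each store one head_dim vector per token,
    per layer, per KV head, at bytes_per bytes per element (fp16 = 2).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per

full = kv_cache_bytes(128_000)            # e.g. a whole repo in context
cartridge = kv_cache_bytes(128_000 // 26) # a trained cache ~26x smaller

print(f"full context: {full / 2**30:.1f} GiB")   # full context: 15.6 GiB
print(f"cartridge:    {cartridge / 2**30:.2f} GiB")  # cartridge:    0.60 GiB
```

At these (assumed) dimensions the full-context cache is ~15.6 GiB per sequence, which is what limits batch size and hence throughput; a cache 26× smaller fits many more concurrent sequences on the same GPU.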