Reduce cache memory on avg 39x

2 jonathanehrlich 1 6/9/2025, 5:40:47 PM twitter.com ↗

Comments (1)

jonathanehrlich · 3h ago
What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory by 39x on average (enabling 26x higher tokens/s and a lower time to first token, TTFT) while maintaining quality. These smaller KV caches, which we call cartridges, can be trained once and reused across different user requests!
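
For a rough sense of what a 39x reduction means in bytes, here is a back-of-the-envelope sketch. It is not the authors' code. The model dimensions, the 100,000-token document, and the assumption that the smaller cache simply holds about 39x fewer token positions are all hypothetical, chosen only to illustrate the arithmetic; the function name `kv_cache_bytes` is made up for this example.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Approximate size of a standard transformer KV cache in bytes.

    The leading factor of 2 covers keys and values; bytes_per_value=2
    corresponds to 16-bit storage (fp16/bf16).
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


# Hypothetical model shape and document length (assumptions, not from the post).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 32, 8, 128
DOC_TOKENS = 100_000

# Full cache: one key/value slot per document token.
full = kv_cache_bytes(DOC_TOKENS, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)

# Assumption: the trained cache keeps roughly 39x fewer token positions.
small_tokens = round(DOC_TOKENS / 39)
small = kv_cache_bytes(small_tokens, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)

print(f"full cache:    {full / 2**30:.2f} GiB")
print(f"smaller cache: {small / 2**30:.2f} GiB")
print(f"ratio:         {full / small:.1f}x")
```

Since the cache is held per request, a smaller cache means more requests fit in the same GPU memory, which is consistent with the throughput gain described above.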