More Layers Unlock 2^N Transformer Context Depth with Divide and Conquer
5 points | michael_lutz | 3 comments | 7/12/2025, 3:30:18 PM | ml-mike.com
Comments (3)
michael_lutz · 5h ago
Context windows are now 1M+ tokens, but context depth is limited. Often the answer is hidden behind layers of linked information, and an attention block can only resolve one link at a time. We trained a tiny 5-layer model that beats GPT-4.5 on a variable evaluation task requiring deep, recursive reasoning. How? It learned a divide-and-conquer mechanism.
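(Not the authors' code, just a sketch of the intuition.) If each layer could only follow one link, depth would scale linearly with layer count. But a divide-and-conquer scheme like pointer doubling lets each step compose the link map with itself, so step k jumps 2^k links at once and N steps resolve chains of length 2^N:

```python
def resolve_naive(links, start, depth):
    # One link per step: `depth` sequential lookups.
    x = start
    for _ in range(depth):
        x = links[x]
    return x

def resolve_doubling(links, start, depth):
    # Divide and conquer: repeatedly compose the link map with itself,
    # so only about log2(depth) composition steps are needed.
    jump = dict(links)  # jump[i] = node one link ahead of i (initially)
    x, remaining = start, depth
    while remaining:
        if remaining & 1:
            x = jump[x]  # take a jump of the current power-of-two size
        # Square the map: jump[i] now points twice as far ahead.
        jump = {i: jump[j] for i, j in jump.items() if j in jump}
        remaining >>= 1
    return x

# Example: chain 0 -> 1 -> 2 -> ... -> 16
links = {i: i + 1 for i in range(16)}
print(resolve_naive(links, 0, 16))     # 16
print(resolve_doubling(links, 0, 16))  # 16
```

This mirrors the headline claim: with a mechanism that composes links pairwise rather than following them one at a time, N layers can in principle reach depth 2^N instead of N.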
ghostgoober · 5h ago
Nice. Does this give general improvements on models (other benchmarks, etc.), or is it very specific to narrow domains?
michael_lutz · 4h ago
That's a really interesting question, and it's one I'd love to answer in a future work. This blog mostly focuses on characterizing context depth limits.