Atlas: Learning to Optimally Memorize the Context at Test Time
36 points by og_kalu | 5/31/2025, 2:13:00 PM | arxiv.org
Comments (2)
cgearhart · 13h ago
It seems like there’s been a lot of progress here, but there’s also an elephant in the room: RNNs will _always_ have worse memory than self-attention, since the latter has complete access to the full context. We pay for that access in other ways, but the unstated hypothesis behind RNN work seems to be that in the long run RNNs will be “good enough” and their other performance benefits will eventually prevail. I’m not convinced humanity will ever sink resources into optimizing this family of models comparable to what has gone into making Transformers practical at the scale they run today.
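To make the tradeoff concrete, here is a minimal NumPy sketch (all dimensions and weight matrices are made up for illustration, not taken from any paper): attention’s KV cache grows with the context and supports exact lookup over every position, while an RNN folds the same tokens into a fixed-size, lossy state.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 1024, 64                      # context length, model dim
X = rng.standard_normal((T, d))      # stand-in token representations

# Self-attention: the KV cache keeps a key and value for all T tokens, so
# a query can retrieve any position -- at O(T * d) memory per layer.
Wk, Wv = rng.standard_normal((d, d)), rng.standard_normal((d, d))
K, V = X @ Wk, X @ Wv
q = rng.standard_normal(d)
w = np.exp(K @ q / np.sqrt(d))
attn_out = (w / w.sum()) @ V         # weighted lookup over the full context

# RNN: the same T tokens get folded into one d-sized state, so memory is
# O(d) no matter how long the context -- but the summary is lossy.
Wh = rng.standard_normal((d, d)) * 0.05
Wx = rng.standard_normal((d, d)) * 0.05
h = np.zeros(d)
for x in X:                          # stream the context one token at a time
    h = np.tanh(Wh @ h + Wx @ x)

print(K.nbytes + V.nbytes, h.nbytes) # KV cache grows with T; h stays fixed
```

The “elephant” is exactly this last line: the attention state scales with T, the recurrent state does not, and the question is whether the lossy summary can be made good enough.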
arjvik · 15h ago
The Titans paper and the Test-Time Training paper (https://arxiv.org/abs/2407.04620) share the same premise: models should "learn" from their context rather than memorize it. Very promising direction!
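Roughly, the shared idea is that the memory is itself the weights of a small model, and processing the context means taking gradient steps on those weights at test time. Here is a toy sketch in that spirit (a linear associative memory trained with the delta rule; it is not either paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 32                        # memory dim, number of context "tokens"

# Hypothetical per-token key/value pairs extracted from the context.
keys = rng.standard_normal((n, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # unit-norm keys
vals = rng.standard_normal((n, d))

M = np.zeros((d, d))                 # memory = learnable weights
lr = 1.0                             # with unit-norm keys this is the delta rule
for _ in range(50):                  # a few passes over the context
    for k, v in zip(keys, vals):
        err = M @ k - v              # gradient of 0.5 * ||M k - v||^2 w.r.t. M k
        M -= lr * np.outer(err, k)   # one SGD step per context token

# Querying the memory is just a forward pass through the adapted weights.
print(np.linalg.norm(M @ keys[0] - vals[0]))  # small: the pair was "learned"
```

The contrast with a KV cache is that nothing here is stored verbatim; recall works only to the extent that the gradient updates encoded the association into the weights.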