Log-Linear Attention

14 points · sva_ · 6/7/2025, 4:01:20 PM · arxiv.org ↗

Comments (3)

btilly · 27m ago
I think it would be very good if they can make this work. I suspect we do something not entirely unlike this ourselves, and that is why spaced repetition is so good for stuffing things into our long-term memories.
iknownothow · 1h ago
> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

Does this mean the models can be smaller too (on top of the primary benefit of being faster)?

Lerc · 29m ago
Reduced memory consumption for context perhaps, but hidden state is different from weights. I don't think this would improve the model's capability per model parameter (but as with everything in ML, I wouldn't bet against it until it's been tested).
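
Not from the paper itself, but one way a logarithmically growing set of hidden states can arise is a Fenwick-tree-style bucketing of the sequence: each state summarizes a contiguous span of 2^k tokens, and equal-sized spans are merged, so after t tokens only popcount(t) ≤ ⌈log₂ t⌉ + 1 states are live. A minimal Python sketch of that bookkeeping (the outer-product state update is a generic linear-attention-style summary, purely illustrative, not the paper's exact formulation):

```python
import numpy as np

def log_linear_states(keys, values):
    """
    Maintain a logarithmically growing set of summary states over a stream.

    Each bucket summarizes a span of 2^k tokens with a simple outer-product
    state S = sum_i k_i v_i^T. When two buckets cover spans of equal length
    they are merged, so after t tokens the number of live buckets equals
    popcount(t) <= log2(t) + 1.
    """
    buckets = []  # list of (span_length, state_matrix), newest last
    for k, v in zip(keys, values):
        buckets.append((1, np.outer(k, v)))   # new size-1 bucket for this token
        # merge equal-sized neighbours, as in a binary counter / Fenwick tree
        while len(buckets) >= 2 and buckets[-1][0] == buckets[-2][0]:
            (n1, s1), (n2, s2) = buckets.pop(), buckets.pop()
            buckets.append((n1 + n2, s1 + s2))
        yield [n for n, _ in buckets]         # spans covered by the live buckets

# Example: after 11 = 0b1011 tokens there are 3 live buckets (spans 8, 2, 1).
d = 4
rng = np.random.default_rng(0)
ks, vs = rng.normal(size=(11, d)), rng.normal(size=(11, d))
for t, sizes in enumerate(log_linear_states(ks, vs), start=1):
    print(t, sizes)
```

Under this reading the savings are in per-context state (O(log T) summaries instead of either O(T) attention cache or a single fixed-size state), not in the weight count, which matches the point above about hidden state being separate from parameters.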