These titles need to stop; we've seen that, in fact, it is not all you need.
seeknotfind · 15m ago
"All you need" titles stopping is all you need.
EGreg · 1m ago
We need more than that, and you all need to stop saying that!!
tankenmate · 40m ago
The title of this paper is a reference to an earlier paper, "Attention Is All You Need"[0][1]. That seminal work introduced the transformer architecture that underpins almost all LLMs, and it is almost certainly the most cited paper in AI despite having been published only in 2017.
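For context, the core mechanism that paper introduced is scaled dot-product attention, stacked into multi-head attention. A minimal sketch in PyTorch (shapes and names are my own, not anything from either paper):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (batch, heads, seq_len, head_dim)
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5  # similarity of each query to each key
        weights = F.softmax(scores, dim=-1)          # normalize into an attention distribution
        return weights @ v                           # weighted sum of the values

    q = k = v = torch.randn(1, 8, 16, 32)            # 8 heads, 16 tokens, 32-dim heads
    out = scaled_dot_product_attention(q, k, v)      # (1, 8, 16, 32)

During decoding, the k and v tensors for past tokens are cached, and that KV cache is exactly the memory that GQA and MLA (the subject of this paper) try to shrink or use more effectively.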
Still wrapping my head around this architecture, but the idea of reducing headcount while maintaining performance is compelling. Would love to see a benchmark against something like FlashAttention.
olq_plo · 3h ago
Very cool idea. Can't wait for converted models on HF.
kavalg · 2h ago
My (possibly wrong) TLDR: TransMLA is a method to "compress" an already trained GQA model, with the additional option of further fine-tuning it. It should make inference faster.
yorwba · 2h ago
It is not a method to compress a Grouped-Query Attention model, but to expand it into an equivalent Multi-head Latent Attention model with the same key-value cache size but larger effective key/value vectors and a correspondingly larger number of trainable parameters. With additional training, you can then obtain a better model that only uses a little bit more memory.
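A toy sketch of that expansion as I understand it, in PyTorch (dimensions, names, and initialization are mine, and it ignores RoPE and the value path, which the real conversion has to handle): replicating a shared GQA key head across its query group is equivalent to applying a sparse, block-identity up-projection to the cached vector, and the conversion amounts to making that up-projection a dense, trainable matrix.

    import torch

    # Toy dimensions (hypothetical, for illustration only)
    d_model, n_q_heads, n_kv_heads, head_dim = 256, 8, 2, 32
    group = n_q_heads // n_kv_heads
    latent_dim = n_kv_heads * head_dim               # what GQA already caches per token

    x = torch.randn(1, 10, d_model)                  # (batch, seq, d_model)
    W_k = torch.randn(d_model, latent_dim)           # GQA key projection (all KV heads concatenated)
    cache = x @ W_k                                  # (1, 10, latent_dim): identical cache in both views

    # GQA view: split the cache into KV heads and replicate each across its query group.
    k_gqa = cache.view(1, 10, n_kv_heads, head_dim).repeat_interleave(group, dim=2)

    # MLA-style view: per-head keys are an up-projection of the cached vector.
    # Initializing W_up as block identities reproduces GQA exactly.
    W_up = torch.zeros(latent_dim, n_q_heads * head_dim)
    for h in range(n_q_heads):
        src = h // group                             # the KV head this query head shared under GQA
        W_up[src*head_dim:(src+1)*head_dim, h*head_dim:(h+1)*head_dim] = torch.eye(head_dim)
    k_mla = (cache @ W_up).view(1, 10, n_q_heads, head_dim)

    print(torch.allclose(k_gqa, k_mla))              # True: same outputs, same cache size
    # Making W_up dense and trainable lets every query head read the full cached vector,
    # adding parameters and expressiveness without growing the KV cache.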
[0] https://arxiv.org/abs/1706.03762
[1] https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
People seem to love going to the references graveyard, digging up tired and dead ones, and dragging them around town hoping everyone thinks they're clever.
Also, this was from 3 months ago.
Answering my own question: https://www.reddit.com/r/MachineLearning/comments/1hpg91o/d_...