Native Sparse Attention
92 points by CalmStorm | 10 comments | 8/1/2025, 7:48:06 PM | aclanthology.org
This was submitted as "DeepSeek won the best paper award at ACL 2025".
Here is the awards page: https://cspaper.org/topic/116/record-breaking-acl-2025-crown...
Given how quiet all the major players went in the two weeks after DeepSeek R1 was released, I suspect they were reading and implementing everything in the papers that came with it as fast as humanly possible.
I applaud their open efforts. But being "altruistic" and being the best are two different things.
Isn't it very notable that the latency improvement came with no performance loss? I'm not super familiar with all the technical aspects, but that seems like it should be one of the main focuses of the paper.
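To make the latency point concrete, here is a minimal sketch of the block-sparse selection idea behind approaches like NSA, assuming PyTorch. This is not the paper's actual method: NSA uses learned block compression, a sliding-window branch, and hardware-aligned kernels, whereas the block size, top-k, mean-pooled block summaries, and the `block_sparse_attention` name here are all illustrative choices.

```python
# Illustrative block-sparse attention (NOT the paper's NSA implementation):
# each query attends only to its top-k most relevant key blocks, so per-query
# cost scales with top_k * block_size instead of the full sequence length T.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, top_k=2):
    """q, k, v: (T, d) tensors; returns a (T, d) attention output."""
    T, d = k.shape
    n_blocks = T // block_size
    # Summarize each key block with mean pooling; the paper instead *learns*
    # this compression, which is part of why accuracy holds up.
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    block_summary = k_blocks.mean(dim=1)                   # (n_blocks, d)
    # Score blocks per query and keep only the top-k.
    block_scores = q @ block_summary.T                     # (T, n_blocks)
    top_blocks = block_scores.topk(top_k, dim=-1).indices  # (T, top_k)

    out = torch.empty_like(q)
    for i in range(T):
        # Expand the selected block ids into token indices for this query.
        idx = (top_blocks[i, :, None] * block_size
               + torch.arange(block_size)).flatten()       # (top_k*block_size,)
        ki, vi = k[idx], v[idx]
        attn = F.softmax(q[i] @ ki.T / d ** 0.5, dim=-1)
        out[i] = attn @ vi
    return out

q = torch.randn(256, 32)
k = torch.randn(256, 32)
v = torch.randn(256, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([256, 32])
```

The intuition for getting the speedup without an accuracy hit is that the sparsity pattern is selected per query rather than fixed, and in NSA the selection is trained end to end from pretraining onward, so the model learns to route attention through the kept blocks instead of having sparsity bolted on after training.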
The awards page for ACL seems to disagree with this editorialized title: https://2025.aclweb.org/program/awards/
> Industry Track Awards
> Best Paper
> Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications
> Daniel Zagyva, Emmanouil Stergiadis, Laurens van der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, Moran Beladev
Per TFA, the paper we’re looking for is this one:
> Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
> Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng
I’m not finding it by author on the page you linked, but I think it’s this reference by title:
> DeepSeek × PKU × UW — Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
I did find it on this page:
https://2025.aclweb.org/program/main_papers/
https://aclanthology.org/2025.acl-long.1126