DeepSeek won the best paper award at ACL 2025

30 points · submitted by CalmStorm · 8/1/2025, 7:48:06 PM · arxiv.org ↗

Comments (3)

sabakhoj · 14m ago
> Despite being sparse, NSA surpasses Full Attention baseline on average across general benchmarks, long-context tasks, and reasoning evaluation.

Isn't it very notable that the latency improvement came without a performance loss? I'm not super familiar with all the technical aspects, but that seems like it should be one of the main focuses of the paper.

gnabgib · 9m ago
Title: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

The awards page for ACL seems to disagree with this editorialized title: https://2025.aclweb.org/program/awards/

CalmStorm · 4h ago
For the first time, NSA introduces natively trainable sparse attention into the full training process, achieving up to an 11× inference speedup while maintaining model performance.
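
For anyone curious what block-sparse selection looks like in principle, here is a toy NumPy sketch. This is not the paper's actual kernel or training setup; the block size, mean-pooled block scoring, and top-k choice are all illustrative assumptions. Each query attends only to the few key blocks whose pooled summary scores highest, instead of to every key:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard scaled dot-product attention over all keys.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def block_sparse_attention(Q, K, V, block=4, topk=2):
    # Toy block-sparse attention: each query attends only to its
    # top-k key blocks, ranked by a mean-pooled block summary score.
    n, d = K.shape
    nblocks = n // block
    # One summary vector per block of keys (mean pooling).
    K_blocks = K[: nblocks * block].reshape(nblocks, block, d).mean(axis=1)
    out = np.zeros_like(Q)
    for i, q in enumerate(Q):
        block_scores = K_blocks @ q              # score each block summary
        keep = np.argsort(block_scores)[-topk:]  # indices of top-k blocks
        idx = np.concatenate(
            [np.arange(b * block, (b + 1) * block) for b in keep]
        )
        scores = K[idx] @ q / np.sqrt(d)         # attend inside kept blocks only
        out[i] = softmax(scores) @ V[idx]
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16))
K = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
dense = full_attention(Q, K, V)
sparse = block_sparse_attention(Q, K, V, block=4, topk=1)
print(dense.shape, sparse.shape)
```

The speedup in the real system comes from skipping the unselected blocks entirely in a hardware-aligned kernel; this sketch only shows the selection logic, not the efficiency.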