Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
48 points by dsr12 | 3 comments | 8/23/2025, 12:29:02 PM | gau-nernst.github.io
Comments (3)
steinvakt2 · 7m ago
I had a 5090 some months ago but couldn't get Flash Attention to work. Does it work natively now? What about the 5080?
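(A rough way to check this yourself, assuming a recent PyTorch build with Blackwell/sm_120 support: restrict scaled_dot_product_attention to its flash backend and see whether it runs. This is a sketch, not something from the article or thread.)

    import torch
    from torch.nn.attention import SDPBackend, sdpa_kernel

    # Small bf16 tensors in (batch, heads, seq_len, head_dim) layout.
    q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
               for _ in range(3))

    try:
        # Allow only the flash-attention backend; other backends are disabled.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        print("flash backend works on", torch.cuda.get_device_name())
    except RuntimeError as err:
        print("flash backend unavailable:", err)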
ProofHouse · 6m ago
Damn awesome. This is going to take me 3 reads and a week to digest.
doctorpangloss · 17m ago
Hmm, but supposing the accelerated NVIDIA-specific inference data types were available in Triton, then you would just use that? Why not contribute to Triton? They accept PRs. So what if you end up doing free product-ecosystem development for NVIDIA and giant corporations by contributing to Triton?