Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
48 points by dsr12 | 3 comments | 8/23/2025, 12:29:02 PM | gau-nernst.github.io
Comments (3)
steinvakt2 · 7m ago
I had a 5090 some months ago but couldn't get Flash Attention to work. Does it work natively now? What about the 5080?
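(A rough way to check this yourself, assuming a recent PyTorch build with Blackwell/sm_120 support: restrict scaled_dot_product_attention to its flash backend and see whether it runs. This is a sketch, not something from the article or thread.)

    import torch
    from torch.nn.attention import SDPBackend, sdpa_kernel

    # Small bf16 tensors in (batch, heads, seq_len, head_dim) layout.
    q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
               for _ in range(3))

    try:
        # Allow only the flash-attention backend; other backends are disabled.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        print("flash backend works on", torch.cuda.get_device_name())
    except RuntimeError as err:
        print("flash backend unavailable:", err)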
ProofHouse · 6m ago
Damn awesome. This is going to take me 3 reads and a week to digest.
doctorpangloss · 17m ago
Hmm, but supposing the accelerated NVIDIA-specific inference data types were available in Triton, then you would just use that? Why not contribute to Triton? They accept PRs. So what if you end up doing free product-ecosystem development for NVIDIA and giant corporations by contributing to Triton?