Implementing a Fast Tensor Core Matmul on the Ada Architecture

2 skidrow 1 7/18/2025, 8:26:22 AM spatters.ca

Comments (1)

jhlee525 · 8h ago

This is incredibly useful. Thanks for making the kernels public.

I'm curious whether anyone has tried generalizing this to batched matmuls or to sparse inputs on Ada.