Multiplatform Matrix Multiplication Kernels

38 homarp 7 7/18/2025, 7:59:49 PM burn.dev ↗

Comments (7)

airstrike · 4m ago
[delayed]
nathanielsimard · 54m ago
One of the authors here. Don't hesitate if you have any questions or comments!
raphaelty · 1h ago
Very interesting, I'm willing to try Burn.
almostgotcaught · 34m ago
I'm sorry, this is a lowbrow comment, but this is the dumbest thing you can do in this space:

> Unit (thread in CUDA, invocation in Vulkan/Wgpu): the smallest execution entity performing computations.

> Plane (warp in CUDA, subgroup in Vulkan/Wgpu): a group of (typically 32) units executing in lockstep and able to share data efficiently through registers.

> Cube (thread block in CUDA, workgroup in Vulkan/Wgpu): a group of units that execute on the same SM, sharing memory and able to synchronize
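
For readers keeping track, the mapping in the quoted definitions above can be written down as a small lookup table. This is only an illustrative sketch: `TERMS`, `translate`, and `reverse_lookup` are hypothetical names invented here, not part of CubeCL or any vendor API, and the table holds only the terms that appear in the quote.

```python
from typing import Dict, Optional

# Terminology mapping taken directly from the quoted definitions.
# Keys are the CubeCL-style terms; values are the names the quote
# gives for CUDA and for Vulkan/Wgpu.
TERMS: Dict[str, Dict[str, str]] = {
    "unit": {"cuda": "thread", "vulkan_wgpu": "invocation"},
    "plane": {"cuda": "warp", "vulkan_wgpu": "subgroup"},
    "cube": {"cuda": "thread block", "vulkan_wgpu": "workgroup"},
}


def translate(term: str, api: str) -> str:
    """Return the name a given API uses for a CubeCL-style term."""
    return TERMS[term.lower()][api]


def reverse_lookup(name: str) -> Optional[str]:
    """Return the CubeCL-style term for an API-specific name, if known."""
    wanted = name.lower()
    for term, names in TERMS.items():
        if wanted in names.values():
            return term
    return None  # name is not in the quoted mapping
```

For example, `translate("plane", "cuda")` gives `"warp"`, and `reverse_lookup("workgroup")` gives `"cube"`.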

It's already bad enough that the vendors themselves insisted on different names, but why in the bejesus would you rename these concepts and diverge from literally all existing naming conventions when you're providing middleware? I.e., when using your tool I'm still going to reference NVIDIA's or AMD's documentation to understand how the hardware actually works. Like, do you really think otherwise, that your thing is gonna be the end of the line???

FYI, the word warp isn't random technobabble but is actually a very clever pun that fits very well conceptually:

https://en.m.wikipedia.org/wiki/Warp_and_weft

nathanielsimard · 24m ago
Using the naming from one of the existing APIs would put too much bias towards that API. It started as a WebGPU project early on, but some features are not present there, so mixing terms wasn't ideal. We're also working on extending CubeCL to CPU, so we want terms that aren't tied only to the GPU world.
sroussey · 3m ago
Why unit instead of point?

Unit, plane (as vs train), and cube?

Or point, plane, cube (1d, 2d, 3d)?

almostgotcaught · 14m ago
Thread, subgroup, workgroup.

There you go: you've hit basically two of three completely (AMD and Vulkan) and are close enough to CUDA that people would get it.

I have no idea what a plane connotes and a cube literally gives a distinct enough picture from block that I will be continuously reminding myself of the mapping.

What you did was pointless - you assigned new words to objects that you don't own and now your conceptual framework is askew from the actual underlying (true) conceptual framework.

> CubeCL to CPU

There is zero affinity between GPU programming models and multicore CPU programming models. If you don't believe me, go ask the OpenMP people how they're doing supporting GPUs.

nathanielsimard · 2m ago
Well, we can agree to disagree. CubeCL also has the concept of instruction parallelism, which would be used to target SIMD instructions on CPU. Our algorithms are normally flexible on both the plane size and the line size, adapting to the hardware with comptime logic. You are free to dislike the naming, but IMO a mix of multiple APIs is worse than something new.
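
To illustrate what "flexible on the line size" could mean, here is a minimal sketch in plain Python. It is not CubeCL code and does not use its API: `make_dot` and `pick_line_size` are hypothetical helpers, and fixing `line_size` when the function is built is only an analogy for a value resolved at kernel-compile time.

```python
from typing import Callable, List, Sequence


def pick_line_size(n: int, supported: Sequence[int] = (4, 2, 1)) -> int:
    """Pick the widest supported line size that evenly divides n.

    `supported` is an assumed list of vector widths, widest first.
    """
    for width in supported:
        if n % width == 0:
            return width
    return 1


def make_dot(line_size: int) -> Callable[[List[float], List[float]], float]:
    """Build a dot product specialized for one fixed line (vector) width.

    `line_size` is captured once here, standing in for a value that is
    known before the kernel runs rather than read at run time.
    """
    if line_size < 1:
        raise ValueError("line_size must be positive")

    def dot(a: List[float], b: List[float]) -> float:
        if len(a) != len(b):
            raise ValueError("length mismatch")
        total = 0.0
        full = len(a) - len(a) % line_size
        # Process whole lines of `line_size` elements at a time.
        for i in range(0, full, line_size):
            total += sum(
                x * y for x, y in zip(a[i:i + line_size], b[i:i + line_size])
            )
        # Scalar tail for elements that do not fill a whole line.
        for i in range(full, len(a)):
            total += a[i] * b[i]
        return total

    return dot
```

A caller would choose the width once, e.g. `dot = make_dot(pick_line_size(len(a)))`, and then reuse the specialized `dot` for inputs of that shape.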