Rust running on every GPU

488 littlestymaar 165 7/26/2025, 10:08:09 AM rust-gpu.github.io ↗

Comments (165)

vouwfietsman · 16h ago
Certainly impressive that this is possible!

However, for my use cases (running on arbitrary client hardware) I generally distrust any abstractions over the GPU API, as the entire point is to leverage the low-level details of the GPU. Treating those details as a nuisance leads to bugs and performance loss, because each target is meaningfully different.

To overcome this, a similar system should be brought forward by the vendors. However, since they failed to settle their arguments, I imagine the platform differences are significant. There are exceptions to this (e.g. ANGLE), but they only arrive at stability by limiting the feature set (and so performance).

It's good that this approach at least allows conditional compilation, that helps for sure.

LegNeato · 14h ago
Rust is a system language, so you should have the control you need. We intend to bring GPU details and APIs into the language and core / std lib, and expose GPU and driver stuff to the `cfg()` system.
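Roughly the direction (a sketch: the `target_arch` values below exist today via the rust-gpu and Rust-CUDA backends, while a vendor or driver key like `gpu_vendor` is purely hypothetical):

  // Today: pick code per compile target. These target_arch values are what the
  // rust-gpu (SPIR-V) and Rust-CUDA (PTX) backends use.
  #[cfg(target_arch = "spirv")]
  pub fn backend_name() -> &'static str { "SPIR-V / Vulkan" }

  #[cfg(target_arch = "nvptx64")]
  pub fn backend_name() -> &'static str { "PTX / CUDA" }

  #[cfg(not(any(target_arch = "spirv", target_arch = "nvptx64")))]
  pub fn backend_name() -> &'static str { "CPU fallback" }

  // Hypothetical future: vendor/driver-aware keys, e.g. #[cfg(gpu_vendor = "nvidia")].
  // Not real today; that's the kind of thing a working group would have to define.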

(Author here)

Voultapher · 14h ago
Who is we here? I'm curious to hear more about your ambitions here, since surely pulling in wgpu or something similar seems out of scope for the traditionally lean Rust stdlib.
LegNeato · 14h ago
Many of us working on Rust + GPUs in various projects have discussed starting a GPU working group to explore some of these questions:

https://gist.github.com/LegNeato/a1fb3e3a9795af05f22920709d9...

Agreed, I don't think we'd ever pull in things like wgpu, but we might create APIs or traits wgpu could use to improve perf/safety/ergonomics/interoperability.

jpc0 · 9h ago
Here's an idea:

Get Nvidia, AMD, Intel and whoever else you can get into a room. Get LLVM's folks into the same room.

Compile LLVM IR directly into hardware instructions fed to the GPU; get them to open up.

Having to target an API is part of the problem. Get them to allow you to write Rust that compiles directly into the code that will run on the GPU, not something that becomes something else, that becomes SPIR-V, that controls a driver, that will eventually run on the GPU.

bobajeff · 8h ago
jpc0 · 7h ago
Very likely something along those lines.

Effectively, standardise passing operations off to a coprocessor. C++ is moving in that direction with stdexec, the linear algebra library, and SIMD.

I don’t see why Rust wouldn’t also do that.

Effectively, why must I write a GPU kernel to have an algorithm execute on the GPU? We're talking about memory wrangling and linear algebra almost all of the time when dealing with a GPU in any way whatsoever. I don't see why we need a different interface and API layer for that.

OpenGL et al abstract some of the linear algebra away from you which is nice until you need to give a damn about the assumptions they made that are no longer valid. I would rather that code be in a library in the language of your choice that you can inspect and understand than hidden somewhere in a driver behind 3 layers of abstraction.

bobajeff · 6h ago
>I would rather that code be in a library in the language of your choice that you can inspect and understand than hidden somewhere in a driver behind 3 layers of abstraction.

I agree that that would be ideal. Hopefully it can happen one day with C++, Rust and other languages. So far Mojo seems to be the only language close to that vision.

trogdc · 3h ago
These are just wrappers around intrinsics that exist in LLVM already.
Ygg2 · 8h ago
Hell will freeze over, then go into negative Kelvin temperatures, before you see Nvidia agreeing in earnest to do so. They make too much money on NOT GETTING COMMODITIZED. Nvidia even changed CUDA to make the API not compatible with interpreters.

It's the same reason Safari is in such a sorry state. Why make the web browser better when it could cannibalize your app store?

jpc0 · 8h ago
Somehow I want to believe that if you get everyone else in the room, and it becomes enough of a market force that Nvidia stops selling GPUs because of it, they will change. Cough, Linux GPU drivers.
Voultapher · 10h ago
Cool, looking forward to that. It's certainly a good fit for the Rust story overall, given the increasingly heterogeneous nature of systems.
junon · 11h ago
I'm surprised there isn't already a Rust GPU WG. That'd be incredible.
ants_everywhere · 12h ago
Genuine question since you seem to care about the performance:

As an outsider, where we are with GPUs looks a lot like where we were with CPUs many years ago. And (AFAIK), the solution there was three-part compilers where optimizations happen on a middle layer and the third layer transforms the optimized code to run directly on the hardware. A major upside is that the compilers get smarter over time because the abstractions are more evergreen than the hardware targets.

Is that sort of thing possible for GPUs? Or is there too much diversity in GPUs to make it feasible/economical? Or is that obviously where we're going and we just don't have it working yet?

nicoburns · 11h ago
The status quo in GPU-land seems to be that the compiler lives in the GPU driver and is largely opaque to everyone other than the OS/GPU vendors. Sometimes there is an additional layer of compiler in user land that compiles into the language that the driver-compiler understands.

I think a lot of people would love to move to the CPU model where the actual hardware instructions are documented and relatively stable between different GPUs. But that's impossible to do unless the GPU vendors commit to it.

pornel · 10h ago
I would like CPUs to move to the GPU model, because in the CPU land adoption of wider SIMD instructions (without manual dispatch/multiversioning faff) takes over a decade, while in the GPU land it's a driver update.

To be clear, I'm talking about the PTX -> SASS compilation (which is something like LLVM bitcode to x86-64 microcode compilation). The fragmented and messy high-level shader language compilers are a different thing, in the higher abstraction layers.

sim7c00 · 11h ago
I think Intel and AMD provide ISA docs for their hardware. Not sure about Nvidia, haven't checked in forever.
diabllicseagull · 15h ago
same here. I'm always hesitant to build anything commercial over abstractions, adapter or translation layers that may or may not have sufficient support in the future.

sadly in 2025, we are still in desperate need of an open standard that's supported by all vendors and that allows programming for the full feature set of current GPU hardware. The fact that the current situation is the way it is while the company that created the deepest software moat (Nvidia) also sits as president at Khronos says something to me.

pjmlp · 13h ago
Khronos APIs are the C++ of graphics programming; there is a reason why professional game studios never fight political wars over APIs.

Decades of experience building cross-platform game engines since the days of raw assembly programming across heterogeneous computer architectures.

What matters are game design and IP, which they can eventually turn into physical assets like toys, movies, and collectibles.

Hardware abstraction layers are done once per platform; you can even let an intern do it, at least the initial hello triangle.

As for who sits as president at Khronos, such are the elections in committee-driven standards bodies.

ducktective · 12h ago
I think you are very experienced in this subject. Can you explain what's wrong with WebGPU? Doesn't it utilize like 80% of the cool features of the modern GPUs? Games and ambitious graphics-hungry applications aside, why aren't we seeing more tech built on top of WebGPU like GUI stacks? Why aren't we seeing browsers and web apps using it?

Do you recommend learning it (considering all the things worth learning nowadays and the rise of LLMs)?

johncolanduoni · 12m ago
For beginner-to-intermediate raster use cases where you want to direct the GPU yourself, WebGPU will cover most of what you're looking for with reasonably low cognitive overhead. It's at a roughly DX11/Metal level of abstraction, but with some crucial modernizations that reduce friction between WebGPU and the underlying APIs (pipeline layouts being the biggest one).

The main things it's missing for more advanced raster use cases aren't really GPU features as such, but low-level control over the driver. Unlike Vulkan/DX12, it does not permit (or require) you to manage allocations of individual resources in GPU memory, or perform your own CPU/GPU and GPU/GPU synchronization. The rationale (as I recall from some of the people working on the standard talking on GitHub issues or somewhere similar) was that the computational overhead of verifying that you've done these things correctly erases the gains from giving the user this level of control. For the web platform, verification that you're not going to corrupt memory or expose undefined driver behavior is non-negotiable. This explanation makes sense to me, and I haven't really heard anyone put forward a generalizable scheme that allows you Vulkan levels of control with efficient verification.

MindSpunk · 11h ago
WebGPU is about a decade behind in feature support compared to what is available in modern GPUs. Things missing include:

- Bindless resources

- RT acceleration

- 64-bit image atomic operations (these are what make Nanite's software rasterizer possible)

- mesh shaders

It has compute shaders at least. There are a lot of less flashy (to non-experts) extensions being added to Vulkan and D3D12 lately that remove abstractions WebGPU can't have without being a security nightmare. Outside of the rendering algorithms themselves, the vast majority of API surface area in Vulkan/D3D12 is just ceremony around allocating memory for different purposes. New stuff like descriptor buffers in Vulkan is removing that ceremony in a very core area, but it's unlikely to ever come to WebGPU.

fwiw some of these features are available outside the browser via 'wgpu' and/or 'dawn', but that doesn't help people in the browser.

3836293648 · 11h ago
First of all, WebGPU has only been supported in Chrome for a few months, and in Firefox only in the next release. And that's just Windows.

We haven't had enough time to develop anything really.

Secondly, the WebGPU standard is like Vulkan 1.0 and is cumbersome to work with. But that part is hearsay, I don't have much experience with it.

johncolanduoni · 33m ago
WebGPU has been supported in stable Chrome for over 2 years. I'm also really not sure what about it is much like Vulkan 1.0 (or any other version). No manual allocations, no manual state transitions, almost no manual synchronization. If I had to pick another graphics API to compare it to in terms of abstraction level and exposed primitives, Metal 1 is probably the closest.
sim7c00 · 11h ago
GPU is often cumbersome though. I mean, OpenGL, Vulkan, they are not really trivial?
3836293648 · 4h ago
OpenGL is trivial compared to Vulkan. And apparently Vulkan has gotten much easier today compared to its initial release in 2016.
kookamamie · 15h ago
Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

I get the idea of added abstraction, but do think it becomes a bit jack-of-all-tradesey.

rbanffy · 15h ago
I think the idea is to allow developers to write a single implementation and have a portable binary that can run on any kind of hardware.

We do that all the time - there is lots of code that chooses optimal code paths depending on the runtime environment or which ISA extensions are available.
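On the CPU side that runtime selection is already routine; a minimal std-only sketch:

  // Minimal sketch: choose a code path at runtime based on detected CPU features.
  fn simd_path() -> &'static str {
      #[cfg(target_arch = "x86_64")]
      {
          if is_x86_feature_detected!("avx2") {
              return "AVX2 path";
          }
      }
      "portable fallback"
  }

  fn main() {
      println!("{}", simd_path());
  }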

pjmlp · 13h ago
Without the tooling though.

Commendable effort, however just like people forget languages are ecosystems, they tend to forget APIs are ecosystems as well.

kookamamie · 15h ago
Sure. The performance-purist in me would be very doubtful about the result's optimality, though.
littlestymaar · 14h ago
Performance purists don't use CUDA either, though (that's why DeepSeek used PTX directly).

Everything is an abstraction, and choosing the right level of abstraction for your use case is a tradeoff between your engineering capacity and your performance needs.

brandonpelfrey · 13h ago
The issue in my mind is that this doesn't seem to include any of the critical library functionality specific, e.g., to NVIDIA cards; think reduction operations across threads in a warp and similar. Some of those don't exist in all hardware architectures. We may get to a point where everything could be written in one language, but actually leveraging the hardware correctly still requires a bunch of different implementations, one for each target architecture.

The fact that different hardware has different features is a good thing.

rbanffy · 3h ago
The features missing hardware support can fall back to software implementations.

In any case, ideally, the level of abstraction would be higher, with little application logic requiring GPU architecture awareness.

LowLevelMahn · 14h ago
this Rust demo also uses PTX directly

  During the build, build.rs uses rustc_codegen_nvvm to compile the GPU kernel to PTX.
  The resulting PTX is embedded into the CPU binary as static data.
  The host code is compiled normally.
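A rough sketch of that embedding step (assuming a build.rs that has written kernel.ptx into OUT_DIR; the cust mention is illustrative):

  // Sketch: embed the PTX that build.rs wrote into OUT_DIR, then hand it to the
  // CUDA driver at runtime (e.g. via cust's Module::from_ptx) before launching.
  const KERNEL_PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernel.ptx"));

  fn main() {
      println!("embedded {} bytes of PTX", KERNEL_PTX.len());
  }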
LegNeato · 14h ago
To be more technically correct, we compile to NVVM IR and then use NVIDIA's NVVM to convert it to PTX.
the__alchemist · 14h ago
I think the sweet spot is:

If your program is written in rust, use an abstraction like Cudarc to send and receive data from the GPU. Write normal CUDA kernels.

JayEquilibria · 9h ago
Good stuff. I have been thinking of learning Rust because of people here even though CUDA is what I care about.

My abstractions though are probably best served by Pytorch and Julia so Rust is just a waste of time, FOR ME.

Ar-Curunir · 14h ago
Because folks like to program in Rust, not CUDA
tucnak · 13h ago
"Folks" as-in Rust stans, whom know very little about CUDA and what makes it nice in the first place, sure, but is there demand for Rust ports amongst actual CUDA programmers?

I think not.

LegNeato · 13h ago
FYI, rust-cuda outputs NVVM IR so it can integrate with the existing CUDA ecosystem. We aren't suggesting rewriting everything in Rust. Check the repo for crates that allow using existing stuff like cuDNN and cuBLAS.
apitman · 2h ago
Do you have a link? I went to the rust GPU repo and didn't see anything. I have an academic pipeline that is currently heavily tied to CUDA because we need nvCOMP. Eventually we hope we or someone else makes an open source gzip library for GPUs. Until then it would at least be nice if we could implement our other pipeline stages in something more open.
tucnak · 5h ago
I take it you're the maintainer. Firstly, congrats on the work done, for the open source people are a small crowd, and determination of Rust teams here is commendable. On the other hand, I'm struggling to see the unique value proposition. What is your motivation with Rust-GPU? Graphics or general-purpose computing? If it's the latter, at least from my POV, I would struggle to justify going up against a daunting umbrella project like this; in view of it likely culminating in layers upon layers of abstraction. Is the long-term goal here to have fun writing a bit of Rust, or upsetting the entrenched status quo of highly-concurrent GPU programming? There's a saying that goes along like "pleasing all is a lot like pleasing none," and intuitively I would guess it should apply here.
LegNeato · 4h ago
Thanks! I'm personally focused on compute, while other contributors are focused on graphics.

I believe GPUs are the future of computing. I think the tooling, languages, and ecosystems of GPUs are very bad compared to CPUs. Partially because they are newer, partially because they are different, and partially because for some reason the expectations are so low. So I intend to upset the status quo.

Ar-Curunir · 10h ago
Rust expanded systems programming to a much larger audience. If it can do the same for GPU programming, even if the resulting programs are not (initially) as fast as CUDA programs, that's a big win.
tayo42 · 13h ago
What makes cuda nice in the first place?
tucnak · 13h ago
All the things marked with red cross in the Rust-CUDA compatibility matrix.

https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/fe...

dvtkrlbs · 3h ago
I mean, that will improve only with time though, no? Maintainers recently revived the rust-gpu and rust-cuda backends. I don't think even the maintainer would say this is ready for prime time. Another benefit is being able to run the same code (library aka crate) on the CPU and GPU. This would require really good optimization to be done on the GPU backend to have the full benefits, but I definitely see the value proposition and the potential.
apitman · 2h ago
Those red Xs represent libraries that only work on Nvidia GPUs and would represent a massive amount of work to re-implement in a cross-platform way, and you may never achieve the same performance either because of abstraction or because you can't match the engineering resources Nvidia has thrown at this the past decade. This is their moat.

I think open source alternatives will come in time, but it's a lot.

MuffinFlavored · 14h ago
> Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

You get to pull in no_std Rust crates and they run on the GPU, instead of having to convert them to C++.

littlestymaar · 15h ago
Everything is an abstraction though, even CUDA abstracts away very different pieces of hardware with totally different capabilities.
Archit3ch · 14h ago
I write native audio apps, where every cycle matters. I also need the full compute API instead of graphics shaders.

Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust when it come to performance? To me, it seems brittle and hard to reason about all these translation stages. Ditto for "... -> Vulkan -> MoltenVk -> ...".

Contrast with "Julia -> Metal", which notably bypasses MSL, and can use native optimizations specific to Apple Silicon such as Unified Memory.

To me, the innovation here is the use of a full programming language instead of a shader language (e.g. Slang). Rust supports newtype, traits, macros, and so on.

dvtkrlbs · 3h ago
The thing is, you don't have to have the WebGPU layer in this with rust-gpu, since it is a codegen backend for the compiler. You just compile the Rust MIR to SPIR-V.
bigyabai · 6h ago
> Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust when it come to performance?

It's basically the same concept as Apple's Clang optimizations, but for the GPU. SPIR-V is an IR just like the one in LLVM, which can be used for system-specific optimization. In theory, you can keep the one codebase to target any number of supported raster GPUs.

The Julia -> Metal stack is comparatively not very portable, which probably doesn't matter if you write Audio Unit plugins. But I could definitely see how the bigger cross-platform devs like u-he or Spectrasonics would value a more complex SPIR-V based pipeline.

tucnak · 13h ago
I must agree that for numerical computation (and downstream optimisation thereof) Julia is much better suited than ostensibly "systems" language such as Rust. Moreover, the compatibility matrix[1] for Rust-CUDA tells a story: there's seemingly very little demand for CUDA programming in Rust, and most parts that people love about CUDA are notably missing. If there was demand, surely it would get more traction, alas, it would appear that actual CUDA programmers have very little appetite for it...

[1]: https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/fe...

Ygg2 · 8h ago
It's not just that. See CUDA EULA at https://docs.nvidia.com/cuda/eula/index.html

Section 1.2 Limitations:

     You may not reverse engineer, decompile or disassemble any portion of the output generated using SDK elements for the purpose of translating such output artifacts to **target a non-NVIDIA platform**.
Emphasis mine.
slashdev · 12h ago
This is a little crude still, but the fact that this is even possible is mind blowing. This has the potential, if progress continues, to break the vendor-locked nightmare that is GPU software and open up the space to real competition between hardware vendors.

Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD.

To get max performance you likely have to break the abstraction and write some vendor-specific code for each, but that's an optimization problem. You still have a portable kernel that runs cross platform.

willglynn · 5h ago
You might be interested in https://burn.dev, a Rust machine learning framework. It has CUDA and ROCm backends among others.
slashdev · 5h ago
I am interested, thanks for sharing!
bwfan123 · 11h ago
> Imagine a world where machine learning models are written in Rust and can run on both Nvidia and AMD

Not likely in the next decade, if ever. Unfortunately, the entire ecosystems of JAX and Torch are Python-based. Imagine retraining all those devs to use Rust tooling.

chrisldgk · 15h ago
Maybe this is a stupid question, as I’m just a web developer and have no experience programming for a GPU.

Doesn’t WebGPU solve this entire problem by having a single API that’s compatible with every GPU backend? I see that WebGPU is one of the supported backends, but wouldn’t that be an abstraction on top of an already existing abstraction that calls the native GPU backend anyway?

exDM69 · 14h ago
No, it does not. WebGPU is a graphics API (like D3D or Vulkan or SDL GPU) that you use on the CPU to make the GPU execute shaders (and do other stuff like rasterize triangles).

Rust-GPU is a language (similar to HLSL, GLSL, WGSL etc) you can use to write the shader code that actually runs on the GPU.

nicoburns · 14h ago
This is a bit pedantic. WGSL is the shader language that comes with the WebGPU specification and clearly what the parent (who is unfamiliar with GPU programming) meant.

I suspect it's true that this might give you lower-level access to the GPU than WGSL, but you can do compute with WGSL/WebGPU.

omnicognate · 14h ago
Right, but that doesn't mean WGSL/WebGPU solves the "problem", which is allowing you to use the same language in the GPU code (i.e. the shaders) as the CPU code. You still have to use separate languages.

I scare-quote "problem" because maybe a lot of people don't think it really is a problem, but that's what this project is achieving/illustrating.

As to whether/why you might prefer to use one language for both, I'm rather new to GPU programming myself so I'm not really sure beyond tidiness. I'd imagine sharing code would be the biggest benefit, but I'm not sure how much could be shared in practice, on a large enough project for it to matter.

adithyassekhar · 15h ago
When Microsoft had teeth, they had DirectX. But I'm not sure how many specific APIs these GPU manufacturers are implementing for their proprietary tech. DLSS, MFG, RTX. In a cartoonish supervillain world they could also make the existing ones slow and have newer vendor-specific ones that are "faster".

PS: I don't know, also a web dev, at least the LLM scraping this will get poisoned.

pjmlp · 15h ago
The teeth are pretty much around, hence Valve's failure to push native Linux games, having to adopt Proton instead.
pornel · 10h ago
This didn't need Microsoft's teeth to fail. There isn't a single "Linux" that game devs can build for. The kernel ABI isn't sufficient to run games, and Linux doesn't have any other stable ABI. The APIs are fragmented across distros, and the ABIs get broken regularly.

The reality is that for applications with visuals better than vt100, the Win32+DirectX ABI is more stable and portable across Linux distros than anything else that Linux distros offer.

yupyupyups · 14h ago
Which isn't a failure, but a pragmatic solution that facilitated most games being runnable today on Linux regardless of developer support. That's with good performance, mind you.

For concrete examples, check out https://www.protondb.com/

That's a success.

pjmlp · 13h ago
Your comment looks like when political parties lose an election, and then do a speech on how they achieved XYZ, thus they actually won, somehow, something.
tonyhart7 · 14h ago
that is not native
yupyupyups · 14h ago
Maybe, given that we have all these games running on Linux now, and as a result more gamers running Linux, developers will be more incentivized to consider native support for Linux too.

Regardless, "native" is not the end-goal here. Consider Wine/Proton as an implementation of Windows libraries on Linux. Even if all binaries are not ELF-binaries, it's still not emulation or anything like that. :)

pjmlp · 13h ago
Why should they be incentivized to do anything? Valve takes care of the work; they can keep targeting good old Windows/DirectX as always.

OS/2 lesson has not yet been learnt.

yupyupyups · 11h ago
Regardless of whether the game is using Wine or not, when the ever-growing Linux customer base starts complaining about bugs while running the game on their Steam Decks, the developers will notice. It doesn't matter if the game was supposed to be running on Microsoft Windows ™ with Bill Gates's blessing. If this is how a significant number of customers want to run the game, the developers should listen.

If the devs then choose to improve "Wine compatibility" or rebuild for Linux doesn't matter, as long as it's a working product on Linux.

pjmlp · 8h ago
Valve will notice, devs couldn't care less.
yupyupyups · 8h ago
I'll hold on to my optimism.
Voultapher · 14h ago
It's often enough faster than on Windows, I'd call that good enough with room for improvement.
Mond_ · 14h ago
And?
dontlaugh · 14h ago
Direct3D is still overwhelmingly the default on Windows, particularly for Unreal/Unity games. And of course on the Xbox.

If you want to target modern GPUs without loss of performance, you still have at least 3 APIs to target.

ducktective · 14h ago
I think WebGPU is like a minimum common API. The Zed editor for Mac has targeted Metal directly.

Also, people have different opinions on what "common" should mean. OpenGL vs Vulkan. Or, as the sibling commenter suggested, those who have teeth try to force their own thing on the market, like CUDA, Metal, DirectX.

dvtkrlbs · 3h ago
Exactly, you don't get most of the vendors' niche features, and even some common ones. First to come to mind is ray tracing (aka RTX), for example.
pjmlp · 13h ago
Most game studios would rather go with middleware using plugins, adopting the best API on each platform.

Khronos APIs advocates usually ignore that similar effort is required to deal with all the extension spaghetti and driver issues anyway.

nromiun · 14h ago
If it was that easy CUDA would not be the huge moat for Nvidia it is now.
swiftcoder · 14h ago
A very large part of this project is built on the efforts of the wgpu-rs WebGPU implementation.

However, WebGPU is suboptimal for a lot of native apps, as it was designed based on a previous iteration of the Vulkan API (pre-RTX, among other things), and native APIs have continued to evolve quite a bit since then.

pjmlp · 15h ago
If you only care about hardware designed up to 2015, as that is its baseline for 1.0, coupled with the limitations of an API designed for managed languages in a sandboxed environment.
inciampati · 15h ago
Isn't webgpu 32-bit?
3836293648 · 11h ago
WebAssembly is 32-bit. WebGPU uses 32-bit floats like all graphics does. 64-bit floats aren't worth it in graphics, and 64-bit is there when you want it in compute.
Voultapher · 14h ago
Let's count abstraction layers:

1. Domain specific Rust code

2. Backend abstracting over the cust, ash and wgpu crates

3. wgpu and co. abstracting over platforms, drivers and APIs

4. Vulkan, OpenGL, DX12 and Metal abstracting over platforms and drivers

5. Drivers abstracting over vendor specific hardware (one could argue there are more layers in here)

6. Hardware

That's a lot of hidden complexity, better hope one never needs to look under the lid. It's also questionable how well performance relevant platform specifics survive all these layers.

tombh · 12h ago
I think it's worth bearing in mind that all `rust-gpu` does is compile to SPIR-V, which is Vulkan's IR. So in a sense layers 2 and 3 are optional, or at least parallel layers rather than cumulative ones.

And it's also worth remembering that all of Rust's tooling can be used for building its shaders; `cargo`, `cargo test`, `cargo clippy`, `rust-analyzer` (Rust's LSP server).

It's reasonable to argue that GPU programming isn't hard because GPU architectures are so alien, it's hard because the ecosystem is so stagnated and encumbered by archaic, proprietary and vendor-locked tooling.

reactordev · 9h ago
Layers 2 and 3 are implementation specific and you can do it however you wish. The point is that a rust program is running on your GPU, whatever GPU. That’s amazing!
LegNeato · 14h ago
The demo is admittedly a Rube Goldberg machine, but that's because this is the first time it has been possible. It will get more integrated over time. And just like normal Rust code, you can make it as abstract or concrete as you want. But at least you have the tools to do so.

That's one of the nice things about the rust ecosystem, you can drill down and do what you want. There is std::arch, which is platform specific, there is asm support, you can do things like replace the allocator and panic handler, etc. And with features coming like externally implemented items, it will be even more flexible to target what layer of abstraction you want

flohofwoe · 11h ago
> but that's because this was the first time it is possible

Using SPIR-V as an abstraction layer for GPU code across all 3D APIs is hardly a new thing (via SPIRV-Cross, Naga or Tint), and the LLVM SPIR-V backend is also well established by now.

LegNeato · 10h ago
Those don't include CUDA and don't include the CPU host side AFAIK.

SPIR-V isn't the main abstraction layer here, Rust is. This is the first time it is possible for Rust host + device across all these platforms and OSes and device apis.

You could make an argument that CubeCL enabled something similar first, but it is more a DSL that looks like Rust rather than the Rust language proper(but still cool).

socalgal2 · 8h ago
> This is the first time it is possible for Rust host + device across all these platforms and OSes and device apis.

I thought wgpu already did that. The new thing here is you code shaders in rust, not WGSL like you do with wgpu

LegNeato · 7h ago
Correct. The new thing is those shaders/kernels also run via CUDA and on the CPU unchanged. You could not do that with only wgpu: there is no Rust shader input (the thing that enables it is rust-gpu, which is used here), and if you wrote your code in a shader lang it wouldn't run on the CPU (as those are made for the GPU only) or via CUDA.
winocm · 8h ago
LLVM SPIR-V's backend is a bit... questionable when it comes to code generation.
90s_dev · 13h ago
"It's only complex because it's new, it will get less complex over time."

They said the same thing about browser tech. Still not simpler under the hood.

a99c43f2d565504 · 12h ago
As far as I understand, there was a similar mess with CPUs some 50 years ago: All computers were different and there was no such thing as portable code. Then problem solvers came up with abstractions like the C programming language, allowing developers to write more or less the same code for different platforms. I suppose GPUs are slowly going through a similar process now that they're useful in many more domains than just graphics. I'm just spitballing.
jcranmer · 11h ago
The first portable programming language was, uh, Fortran. Indeed, by the time the Unix developers are thinking about porting to different platforms, there are already open source Fortran libraries for math routines (the antecedents of LAPACK). And not long afterwards, the developers of those libraries are going to get together and work out the necessary low-level kernel routines to get good performance on the most powerful hardware of the day--i.e., the BLAS interface that is still the foundation of modern HPC software almost 50 years later.

(One of the problems of C is that people have effectively erased pre-C programming languages from history.)

dotancohen · 11h ago

> I suppose GPUs are slowly going through a similar process now that they're useful in many more domains than just graphics.

I've been waiting for the G in GPU to be replaced with something else since the first CUDA releases. I honestly think that once we rename this tech, more people will learn to use it.
ecshafer · 11h ago
MPU - Matrix Processing Unit

LAPU - Linear Algebra Processing Unit

carlhjerpe · 8h ago
PPU - Parallel processing unit
dotancohen · 10h ago
LAPU is terrific. It also means paw in Russian.
pjmlp · 12h ago
Computers had been enjoying high-level systems languages for a decade before C.
Yoric · 10h ago
But it's true that you generally couldn't use the same Lisp dialect on two different families of computers, for instance.
pjmlp · 10h ago
Neither could you with C, POSIX exists for a reason.
Maken · 12h ago
And yet, we are still using handwritten assembly for hot code paths. All these abstraction layers would need to be porous enough to allow per-device specific code.
pizza234 · 12h ago
> And yet, we are still using handwritten assembly for hot code paths

This is actually a win. It implies that abstractions have a negligible (that is, existing but so small that can be ignored) cost for anything other than small parts of the codebase.

luxuryballs · 13h ago
now that is a relevant username
lukan · 12h ago
Who said that?
90s_dev · 12h ago
They did.
Ygg2 · 8h ago
Who is they? Aka [citation needed] aka weasel word.
Yoric · 10h ago
Who ever said that?
turnsout · 12h ago
Complexity is not inherently bad. Browsers are more or less exactly as complex as they need to be in order to allow users to browse the web with modern features while remaining competitive with other browsers.

This is Tesler's Law [0] at work. If you want to fully abstract away GPU compilation, it probably won't get dramatically simpler than this project.

  [0]: https://en.wikipedia.org/wiki/Law_of_conservation_of_complexity
jpc0 · 9h ago
> Complexity is not inherently bad. Browsers are more or less exactly as complex as they need to be in order to allow users to browse the web with modern features while remaining competitive with other browsers.

What a sad world we live in.

Your statement is technically true, the best kind of true…

If work went into standardising a better API than the DOM we might live in a world without hunger, where all our dreams could become reality. But this is what we have, a steaming pile of crap. But hey, at least it’s a standard steaming pile of crap that we can all rally around.

I hate it, but I hate it the least of all the options presented.

thrtythreeforty · 14h ago
Realistically though, a user can only hope to operate at (3) or maybe (4). So not as much of an add. (Abstraction layers do not stop at 6, by the way, they keep going with firmware and microarchitecture implementing what you think of as the instruction set.)
ivanjermakov · 13h ago
Don't know about you, but I consider 3 levels of abstraction a lot, especially when it comes to such black-boxy tech like GPUs.

I suspect debugging this Rust code is impossible.

yjftsjthsd-h · 11h ago
You posted this comment in a browser on an operating system running on at least one CPU using microcode. There are more layers inside those (the OS alone contains a laundry list of abstractions). Three levels of abstractions can be fine.
coolsunglasses · 8h ago
Debugging the Rust is the easy part. I write vanilla CUDA code that integrates with Rust and that one is the hard part. Abstracting over the GPU backend w/ more Rust isn't a big deal, most of it's SPIR-V anyway. I'm planning to stick with vanilla CUDA integrating with Rust via FFI for now but I'm eyeing this project as it could give me some options for a more maintainable and testable stack.
wiz21c · 9h ago
shader code is not exactly easy to debug for a start...
dahart · 12h ago
Fair point, though layers 4-6 are always there, including for shaders and CUDA code, and layers 1 and 3 are usually replaced with a different layer, especially for anything cross-platform. So this Rust project might be adding a layer of abstraction, but probably only one-ish.

I work on layers 4-6 and I can confirm there’s a lot of hidden complexity in there. I’d say there are more than 3 layers there too. :P

dontlaugh · 9h ago
It's not all that much worse than a compiler and runtime targeting multiple CPU architectures, with different calling conventions, endianness, etc., and at the hardware level different firmware and microcode.
ben-schaaf · 11h ago
That looks like the graphics stack of a modern game engine. Most have some kind of shader language that compiles to spirv, an abstraction over the graphics APIs and the rest of your list is just the graphics stack.
rhaps0dy · 12h ago
Though if the rust compiles to NVVM it’s exactly as bad as C++ CUDA, no?
flohofwoe · 11h ago
Tbf, Proton on Linux is about the same number of abstraction layers, and that sometimes has better performance than Windows games running on Windows.
kelnos · 7h ago
> It's also questionable how well performance relevant platform specifics survive all these layers.

Fair point, but one of Rust's strengths is the many zero-cost abstractions it provides. And the article talks about how the code compiles to the GPU-specific machine code or IR. Ultimately the efficiency and optimization abilities of that compiler are going to determine how well your code runs, just like any other compilation process.

This project doesn't even add that much. In "traditional" GPU code, you're still going to have:

1. Domain specific GPU code in whatever high-level language you've chosen to work in for the target you want to support. (Or more than one, if you need it, which isn't fun.)

...

3. Compiler that compiles your GPU code into whatever machine code or IR the GPU expects.

4. Vulkan, OpenGL, DX12 and Metal...

5. Drivers...

6. Hardware...

So yes, there's an extra layer here. But I think many developers will gladly take on that trade off for the ability to target so many software and hardware combinations in one codebase/binary. And hopefully as they polish the project, debugging issues will become more straightforward.

ajross · 12h ago
There is absolutely an xkcd 927 feel to this.

But that's not the fault of the new abstraction layers, it's the fault of the GPU industry and its outrageous refusal to coordinate on anything, at all, ever. Every generation of GPU from every vendor has its own toolchain, its own ideas about architecture, its own entirely hidden and undocumented set of quirks, its own secret sauce interfaces available only in its own incompatible development environment...

CPUs weren't like this. People figured out a basic model for programming them back in the 60's and everyone agreed that open docs and collabora-competing toolchains and environments were a good thing. But GPUs never got the memo, and things are a huge mess and remain so.

All the folks up here in the open source community can do is add abstraction layers, which is why we have thirty seven "shading languages" now.

jcranmer · 10h ago
CPUs, almost from the get-go, were intended to be programmed by people other than the company who built the CPU, and thus the need for a stable, persistent, well-defined ISA interface was recognized very early on. But for pretty much every other computer peripheral, the responsibility for the code running on those embedded processors has been with the hardware vendor, their responsibility ending at providing a system library interface. With literal decades of experience in an environment where they're freed from the burden of maintaining stable low-level details, all of these development groups have quite jealously guarded access to that low level and actively resist any attempts to push the interface layers lower.

As frustrating as it is, GPUs are actually the most open of the accelerator classes, since they've been forced to accept a layer like PTX or SPIR-V; trying to do that with other kinds of accelerators is really pulling teeth.

yjftsjthsd-h · 11h ago
In fairness, the ability to restructure at will probably does make it easier to improve things.
ajross · 10h ago
The fact that the upper parts of the stack are so commoditized (i.e. CUDA and WGSL do not in fact represent particularly different modes of computation, and of course the linked article shows that you can drive everything pretty well with scalar rust code) argues strongly against that. Things aren't incompatible because of innovation, they're incompatible because of expedience and paranoia.
Ygg2 · 11h ago
Improve things for who?
bee_rider · 10h ago
Pretty sure they mean improve performance; number crunching ability.
Ygg2 · 10h ago
In consumer GPU land, that's yet to be observed.
piker · 16h ago
> Existing no_std + no alloc crates written for other purposes can generally run on the GPU without modification.

Wow. That at first glance seems to unlock A LOT of interesting ideas.
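E.g. the kind of crate that qualifies is just plain no_std math; a sketch (nothing GPU-specific about it):

  #![no_std]

  // Plain arithmetic, no allocation, no I/O: the kind of crate that can be
  // compiled for SPIR-V or PTX just as well as for the host CPU.
  pub fn lerp(a: f32, b: f32, t: f32) -> f32 {
      a + (b - a) * t
  }

  pub fn clamp01(x: f32) -> f32 {
      if x < 0.0 { 0.0 } else if x > 1.0 { 1.0 } else { x }
  }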

boredatoms · 6h ago
I guess performance would be very different if things were initially assumed to run on a CPU.
dvtkrlbs · 3h ago
I think it could be improved a lot by niche optimization passes on the codegen backend. Kinda like the autovectorization and similar optimizations on the CPU backends.
hardwaresofton · 14h ago
This is amazing and there is already a pretty stacked list of Rust GPU projects.

This seems to be at an even lower level of abstraction than burn[0] which is lower than candle[1].

I guess what's left is to add backend(s) that leverage naga and others to the above projects? Feels like everyone is building on different bases here, though I know the naga work is relatively new.

[EDIT] Just to note, burn is the one that focuses most on platform support but it looks like the only backend that uses naga is wgpu... So just use wgpu and it's fine?

Yeah basically wgpu/ash (vulkan, metal) or cuda

[EDIT2] Another crate closer to this effort:

https://github.com/tracel-ai/cubecl

[0]: https://github.com/tracel-ai/burn

[1]: https://github.com/huggingface/candle/

LegNeato · 12h ago
You can check out https://rust-gpu.github.io/ecosystem/ as well, which mentions CubeCL.
hardwaresofton · 3h ago
Thanks for the pointer, and thanks for the huge amount of contributions around rust-gpu. Outstanding work.
tormeh · 4h ago
If you're like me, you want to write code that runs on the GPU but you don't, because everything about programming GPUs is pain. That's the use case I see for rust-gpu. Make CPU devs into GPU devs with an acceptable performance penalty. If you're already a GPU programmer and know CUDA/Vulkan/Metal/DX inside out, then something like this will not be interesting to you.
ivanjermakov · 13h ago
Is it really "Rust" on GPU? Skimming through the code, it looks like shader language within proc macro heavy Rust syntax.

I think GPU programming is different enough to require special care. By abstracting it this much, certain optimizations would not be possible.

dvtkrlbs · 13h ago
It is normal Rust code compiled to SPIR-V bytecode.
LegNeato · 13h ago
And it uses 3rd party deps from crates.io that are completely GPU unaware.
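For a sense of what that looks like, here is roughly the shape of a rust-gpu entry point (a sketch modeled on the project's published examples; `glam` is the ordinary crates.io crate, re-exported by `spirv_std`):

  #![no_std]
  use spirv_std::glam::{vec4, Vec4};
  use spirv_std::spirv;

  // Fragment shader entry point: ordinary Rust, compiled to SPIR-V by rust-gpu.
  #[spirv(fragment)]
  pub fn main_fs(output: &mut Vec4) {
      *output = vec4(1.0, 0.0, 0.0, 1.0); // solid red
  }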
gedw99 · 13h ago
I am overjoyed to see this.

They are doing a huge service for developers that just want to build stuff and not get into the platform wars.

https://github.com/cogentcore/webgpu is a great example . I code in golang and just need stuff to work on everything and this gets it done, so I can use the GPU on everything.

Thank you rust !!

omnicognate · 16h ago
Zig can also compile to SPIR-V. Not sure about the others.

(And I haven't tried the SPIR-V compilation yet, just came across it yesterday.)

arc619 · 15h ago
Nim too, as it can use Zig as a compiler.

There's also https://github.com/treeform/shady to compile Nim to GLSL.

Also, more generally, there's an LLVM-IR->SPIR-V compiler that you can use for any language that has an LLVM back end (Nim has nlvm, for example): https://github.com/KhronosGroup/SPIRV-LLVM-Translator

That's not to say this project isn't cool, though. As usual with Rust projects, it's a bit breathy with hype (e.g. "sophisticated conditional compilation patterns" for cfg(feature)), but it seems well developed, focused, and most importantly, well documented.

It also shows some positive signs of being dog-fooded, and the author(s) clearly intend to use it.

Unifying GPU back ends is a noble goal, and I wish the author(s) luck.

revskill · 15h ago
I do not get you.
omnicognate · 14h ago
What don't you get?

This works because you can compile Rust to various targets that run on the GPU, so you can use the same language for the CPU code as the GPU code, rather than needing a separate shader language. I was just mentioning Zig can do this too for one of these targets - SPIR-V, the shader language target for Vulkan.

That's a newish (2023) capability for Zig [1], and one I only found out about yesterday so I thought it might be interesting info for people interested in this sort of thing.

For some reason it's getting downvoted by some people, though. Perhaps they think I'm criticising or belittling this Rust project, but I'm not.

[1] https://github.com/ziglang/zig/issues/2683#issuecomment-1501...

rbanffy · 15h ago
> Though this demo doesn't do so, multiple backends could be compiled into a single binary and platform-specific code paths could then be selected at runtime.

That’s kind of the goal, I’d assume: writing generic code and having it run on anything.

maratc · 12h ago
> writing generic code and having it run on anything.

That had already been done successfully by Java applets in 1995.

Wait, Java applets were dead by 2005, which leads me to assume that the goal is different.

DarmokJalad1701 · 7h ago
> That has been already done successfully by Java applets in 1995.

The first video card with a programmable pixel shader was the Nvidia GeForce 3, released in 2001. How would Java applets be running on GPUs in 1995?

Besides, Java cannot even be compiled for GPUs as far as I know.

bobajeff · 12h ago
I applaud the attempt this project and the GPU Working Group are making here. I can't overstate how much work any effort to make the developer experience for heterogeneous compute (CUDA, ROCm, SYCL, OpenCL) or even just GPUs (Vulkan, Metal, DirectX, WebGPU) nicer, more cohesive, and less fragmented has ahead of it.
reactordev · 9h ago
This is amazing. With time, we should be able to write GPU programs semantically identical to user land programs.

The implications of this for inference is going to be huge.

5pl1n73r · 5h ago
CUDA programming consists of making a kernel parameterized by its thread id, which is used to slightly alter its behavior while it executes on many hundreds of GPU cores; it's very different from general-purpose programming. Memory and branching behave differently there. I'd say at best, it will be like traditional programs and libraries with multiple incompatible event loops.
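For comparison, that same "one function parameterized by its thread id" model is what rust-gpu expresses for compute shaders; a rough sketch based on its published examples (not CUDA itself):

  use spirv_std::glam::UVec3;
  use spirv_std::spirv;

  // Each invocation handles one element, selected by its global invocation id,
  // much like a CUDA kernel indexed by blockIdx/threadIdx.
  #[spirv(compute(threads(64)))]
  pub fn double_cs(
      #[spirv(global_invocation_id)] id: UVec3,
      #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] data: &mut [f32],
  ) {
      let i = id.x as usize;
      if i < data.len() {
          data[i] *= 2.0;
      }
  }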
reactordev · 22m ago
I write CUDA code... I'm aware of its execution model. In this context, having a Rust program act as a kernel to receive inputs and send outputs is still awesome.
melodyogonna · 8h ago
Very interesting. I wonder about the model of storing the GPU IR in the binary for a real-world project; it seems like that could bloat the binary size a lot.

I also wonder about the performance of just compiling for a target GPU AOT. These GPUs can be very different even if they come from the same vendor. This seems like it would compile to the lowest common denominator for each vendor, leaving performance on the table. For example, Nvidia H-100s and Nvidia Blackwell GPUs are different beasts, with specialised intrinsics that are not shared, and to generate a PTX that would work on both would require not using specialised features in one or both of these GPUs.

Mojo solves these problems by JIT compiling GPU kernels at the point where they're launched.

LegNeato · 7h ago
The underlying projects support loading at runtime, so you could have as many AOT-compiled kernel variants as you want and load the right one. Disk is cheap. You could even ship rustc if you really wanted for "JIT" (lol), but maybe that's not super crazy as Mojo is LLVM-based anyway. There is of course the warmup/stuttering problem, but that is a separate issue and (sometimes) not an issue for compute, as opposed to graphics where it is a bigger one. I have some thoughts on how to improve the status quo with things unique to Rust, but it's too early to know.

One of the issues with GPUs as a platform is runtime probing of capabilities is... rudimentary to say the least. Rust has to deal with similar stuff with CPUs+SIMD FWIW. AOT vs JIT is not a new problem domain and there are no silver bullets only tradeoffs. Mojo hasn't solved anything in particular, their position in the solution space (JIT) has the same tradeoffs as anyone else doing JIT.

melodyogonna · 6h ago
> The underlying projects support loading at runtime, so you could have as many AOT compiled kernel variants and load the one you want.

I'm not sure I understand. What underlying projects? The only reference to loading at runtime I see on the post is loading the AOT-compiled IR.

> There is of course the warmup / stuttering problem, but that is a separate issue and (sometimes) not an issue for compute vs graphics where it is a bigger issue.

It should be worth noting that Mojo itself is not JIT compiled; Mojo has a GPU infrastructure that can JIT compile Mojo code at runtime [1].

> One of the issues with GPUs as a platform is runtime probing of capabilities is... rudimentary to say the least.

Also not an issue in Mojo when you combine the selective JIT compilation I mentioned with powerful compile-time programming [2].

1. https://docs.modular.com/mojo/manual/gpu/intro-tutorial#4-co... - here the kernel "print_threads" will be jit compiled

2. https://docs.modular.com/mojo/manual/parameters/

jdbohrman · 13h ago
Why though
dade_ · 33m ago
Heathen! You clearly haven’t been baptized by the rusty water.

Dunno, it’s a cult.

max-privatevoid · 13h ago
It would be great if Rust people learned how to properly load GPU libraries first.
zbentley · 12h ago
Say more?
max-privatevoid · 10h ago
Rust GPU libraries such as wgpu and ash rely on external libraries such as vulkan-loader to load the actual ICDs, but for some reason Rust people really love dlopening them instead of linking to them normally. Then it's up to the consumer to configure their linker flags so RPATH gets set correctly when needed, but because most people don't know how to use their linker, they usually end up with dumb hacks like these instead:

https://github.com/Rust-GPU/rust-gpu/blob/87ea628070561f576a...

https://github.com/gfx-rs/wgpu/blob/bf86ac3489614ed2b212ea2f...
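For reference, one way to link normally instead is a couple of build.rs directives; a sketch (the library name and rpath are illustrative):

  // build.rs: link the loader normally and embed an rpath instead of dlopen hacks.
  fn main() {
      println!("cargo:rustc-link-lib=dylib=vulkan");
      // Illustrative path; point it wherever your Vulkan loader actually lives.
      println!("cargo:rustc-link-arg=-Wl,-rpath,/usr/local/lib");
  }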

viraptor · 7h ago
Isn't it typically done with Vulkan because apps don't know which implementation you want to use until later? On a multi-GPU system you may want to switch between (for example) the Intel and NVIDIA implementations at runtime rather than linking directly.
LegNeato · 10h ago
Can you file a bug on rust-gpu? I'd love to look into it (I am unfamiliar with this area).
max-privatevoid · 7h ago
guipsp · 8h ago
Where's the dumb hack?