Layers All the Way Down: The Untold Story of Shader Compilation

81 points by birdculture · 42 comments · 5/19/2025, 2:49:25 AM · moonside.games

Comments (42)

raphlinus · 9h ago
We tried something like this with piet-gpu-hal. One problem is that spirv-cross is lossy, though the gaps in target-language support are closing. For example, a device-scoped barrier is just dropped on the floor before Metal 3.2. Atomics are also a source of friction.

But the main problem is not the shader language itself, but the binding model. It's a pretty big mess, and things change as you go in the direction of bindless (descriptor indexing). There are a few approaches to this, certainly reinventing WebGPU is one. Another intriguing approach is blade[1] by Dzmitry Malyshau.

I wish the authors well, but this is a hard road, especially if the goal is to enable more advanced features including compute.

[1]: https://github.com/kvark/blade

pavlov · 2h ago
I’d very much like to read about Blade, but it seems like they have literally no documentation in text form, not even a basic introduction. Every link on the GitHub page goes to YouTube.

Project authors, please don’t do this. It’s impossible to get a two-minute overview from a video. Browsing through tutorials and documentation is much more efficient.

If you really have never written anything about the project except conference slides, then at least put up that deck in addition to the YouTube link. Clicking through slides is not great, but it’s still a better browsing experience than seeking at random in a video.

pjmlp · 9h ago
As expected, they don't touch the issue of how shaders work in PlayStation (LibGNM, LibGNMX) and Switch (NVN).
genocidicbunny · 5h ago
I really do wish that Sony made even more info about GNM and GNMX public. I was only starting to learn it when I got laid off and lost my access. I may or may not still have some older docs that found their way into my box as I was leaving on the last day, but if any did, they're definitely incomplete. I spent most of my time working on non-graphics parts of the project, so the time I got to spend digging into the graphics system of the PS5 was pretty limited.
gmueckl · 7h ago
These systems are highly proprietary and I am reasonably certain that stating anything about them publicly would break some NDAs.
pjmlp · 6h ago
As someone who still has a Nintendo Developer Portal account, who held SCEE content back when the London Soho office (aka Team Soho) used to have a developer site, and who owns a PS2Linux, there is plenty of material that can be discussed publicly without breaking NDAs.
flohofwoe · 5h ago
Console-specific information also isn't all that interesting anymore these days, since game consoles have switched to off-the-shelf GPU designs with only minor modifications.
genocidicbunny · 2h ago
Even the current generation of consoles still have some interesting stuff going on. The 'core' of the console is fairly off the shelf, but they do still have modifications specific to the console that you won't find elsewhere. As far as GPU stuff goes, they tend to provide somewhat lower-level access to the hardware that you would normally not get with consumer stuff.
pjmlp · 4h ago
Yeah, but they still use their own proprietary APIs.
pornel · 47m ago
I really like the WebGPU API. That's the API where the major players, including Apple and Microsoft, are forced to collaborate. It has real-world implementations on all major platforms.

With the wgpu and Google Dawn implementations, the API isn't actually tied to the Web and can be used in native applications.

pjmlp · 33m ago
The only reason I like WebGL and WebGPU is that they are the only 3D APIs where major players take managed language runtimes into consideration, because they can't do otherwise.

Had WebAssembly already been there, without Web APIs being forced to go through JavaScript, they would most likely be C APIs, with everyone and their dog writing bindings instead.

Now, it is still pretty much a Chrome-only API, and only available across macOS, Android and Windows.

Safari and Firefox have it as a preview, and who knows when it will ever be stable at a scale that doesn't require "Works best on Chrome" banners.

Support on GNU/Linux, even from Chrome, is pretty much not there, at least for something to use in production.

And then we have the whole drama that, after 15 years, there are still no usable developer tools in browsers for 3D debugging: one is forced to guess which rendering calls come from the browser and which from the application, fall back on GPU printf debugging, or maintain a native version that can be plugged into RenderDoc or similar.

riggsdk · 8h ago
BGFX (https://github.com/bkaradzic/bgfx) uses a different approach. You basically write your shader in a GLSL-like language, but it's all just (either very clever or very horrible) macro expansions that handle all the platform differences. With that you get a very performant backend for OpenGL, WebGL, Vulkan, Metal, Direct3D 11 and 12, and PlayStation. Its shader compiler program basically does minimal source-level transformations before handing it over to the platform's own shader compiler if available.
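To illustrate the idea (these are made-up macro names and substitutions, not bgfx's actual ones), the per-platform textual expansion can be sketched in a few lines:

```rust
// Illustrative sketch of macro-expansion-based shader portability: one
// "portable" source string, with platform differences handled by textual
// substitution before the platform's own compiler ever sees it.
// The VEC4/MUL names are invented for this example.
fn expand_for_platform(src: &str, is_d3d: bool) -> String {
    let (vec4, mul) = if is_d3d {
        ("float4", "mul(a, b)") // HLSL spelling
    } else {
        ("vec4", "(a * b)") // GLSL spelling
    };
    src.replace("VEC4", vec4).replace("MUL(a, b)", mul)
}

fn main() {
    let portable = "VEC4 pos = MUL(a, b);";
    println!("{}", expand_for_platform(portable, true));  // float4 pos = mul(a, b);
    println!("{}", expand_for_platform(portable, false)); // vec4 pos = (a * b);
}
```

The real thing handles varyings, bindings and many more constructs, but the principle is the same: the "compiler" is mostly a source-to-source rewriter.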
shmerl · 9h ago
> Rendering, by comparison, is a huge can of worms. Every platform has their own unique support matrix.

GPU APIs landscape is stuck in the dumb NIH mentality of the '90s because stuck up lock-in proponents refuse to support Vulkan on their walled gardens.

The only recent somewhat positive development about that was MS deciding to use SPIR-V and eventually ditch DXIL. A small step in the right direction, but not really enough yet.

alexk101 · 8h ago
I have been playing around with slang, which is supposed to be more cross-platform. They have a neural rendering slant, and I have yet to fully test on all platforms, but I think it's a welcome move to consolidate all these APIs. https://shader-slang.org/
socalgal2 · 6h ago
https://compute.toys/view/1948 recently added Slang support


raphlinus · 8h ago
Yup, I think slang is the future. Anyone on this thread willing to fund a Rust implementation?
pjmlp · 27m ago
It is according to Khronos anyway, for those that aren't already deeply invested in HLSL.

Khronos has been quite vocal that there is no further development on GLSL; they see that as a community effort, and they only provide SPIR-V.

This is how vendor-specific tooling eventually wins out. They kind of got lucky that AMD decided to offer Mantle as the basis for Vulkan, LunarG is doing the SDK, and now NVidia contributed slang; otherwise they would still be arguing about OpenGL vNext.

flohofwoe · 5h ago
AFAIK a very large part of Slang is massive 3rd-party libraries written in C++; the Slang-specific Rust code would just be a very thin layer on top of millions(?) of lines of C++ code that has grown over decades and is maintained elsewhere.

(fwiw I've been considering writing the custom parts of the Sokol shader compiler in Zig instead of C++, but that's just a couple of thousand lines of glue code on top of massive C++ libraries (SPIRVTools, SPIRVCross, glslang and Tint), and those C++ APIs are terrible to work with from non-C++ languages.)

As far as developer friction for integration into asset workflows goes, that's exactly where I would prefer Zig over Rust (but a simple build.zig already goes most of the way without porting any code to Zig).

gmueckl · 7h ago
Why? Unless you need to embed the slang transpiler (unlikely), the language it is written in literally doesn't matter.
raphlinus · 6h ago
This is a longer and deeper conversation, but I think on topic for the original article, so I'll go into it a bit. The tl;dr is developer friction.

By all means if you're doing a game (or another app with similar build requirements), figure out a shader precompilation pipeline so you're able to compile down to the lowest portable IR for each target, and ship that in your app bundle. Slang is meant for that, and this pipeline will almost certainly contain other tools written in C++ or even without source available (DXC, the Apple shader compiler tools, etc).

There are two main use cases where we want different pieces of shaders to come from different sources of truth, and link them together downstream. One is integrating samplers for (vello_hybrid) sparse strip textures so those can be combined with user paint sources in the user's 2D or 3D app. The other is that we're trying to make the renderer more modular so we have separate libraries for color space conversion and image filters (blur etc). To get maximal performance, you don't want to write out the blur result to a full-resolution texture, but rather have a function that can sample from an intermediate result. See [1] for more context and discussion of that point.

Stitching together these separate pieces of shader is a major potential source of developer friction. There is a happy path in the Rust ecosystem, albeit with some compromises, which is to fully embrace WGSL as the source of truth. The pieces can be combined with string-pasting, though we're looking at WESL as a more systematic approach. With WGSL, you can either do all your shader compilation at runtime (using wgpu for native), or do a build.rs script invoking naga to precompile. See [2] for the main PR that implements the latter in vello_hybrid. In the former case, you can even have hot reloading of shaders; implemented in Vello main but not (yet) vello_hybrid.

To get the same quality of developer experience with Slang, you'd need an implementation in Rust. I think this would be a good thing for Slang.

I've consistently underestimated the importance of developer friction in the past. As a contrast, we're also doing a CPU-only version of Vello now, and it's absolutely night and day, both for development velocity and attracting users. I think it's possible the GPU world gets better, but at the moment it's quite painful. I personally believe doing a Rust implementation of the Slang compiler would be an important step in the right direction, and is worth funding. Whether the rest of the world agrees with me, we'll see.

[1]: https://xi.zulipchat.com/#narrow/channel/197075-vello/topic/...

[2]: https://github.com/linebender/vello/pull/1011

coffeeaddict1 · 4h ago
> The pieces can be combined with string-pasting, though we're looking at WESL as a more systematic approach.

> To get the same quality of developer experience with Slang, you'd need an implementation in Rust. I think this would be a good thing for Slang.

WESL has the opposite problem: it doesn't have a C++ implementation. IMO, the graphics world will largely remain C++-friendly for the foreseeable future, so if an effort like WESL wants to succeed, they will need to provide a C++ implementation (even more so than the need for Slang to provide a Rust one).

raphlinus · 2h ago
You're probably right about this. In the short to medium term, I expect that the Rust and C++ sub-ecosystems will be making different sets of choices. I don't know of any major C++ game or game-adjacent project adopting, say, Dawn for their RHI (render hardware interface) to buy into WebGPU. In the longer term, I expect the ecosystems to start blending together more, especially as C++/Rust interop improves (it's pretty janky now).
gmueckl · 4h ago
Long story short: you want to compose shaders at runtime and need a compilation pipeline for that. So what you really need is a C interface to the slang transpiler that is callable from Rust.

Rewriting the whole slang pipeline in Rust is a fool's errand.

shmerl · 7h ago
I think Rust itself as a language for GPU programming is another interesting alternative:

https://github.com/Rust-GPU/rust-gpu/

flohofwoe · 5h ago
I would agree if Vulkan were a good 3D API, but it turned out to be the worst of the modern 3D APIs because it repeats the same main mistake as GL:

There is no design vision; instead it's a cobbled-together mess of ad-hoc vendor extensions that are eventually promoted to 'core'.

socalgal2 · 6h ago
Vulkan is by far the worst of all the modern graphics APIs. It's crap. It's also not even trying to be portable, expecting you to query a million things and then adapt to the platform.

I for one am glad Vulkan is not "it"

Also, it's not even portable on Windows. Sure, if you're making a game, you can expect your gamer audience to have it installed. But if you're making an app and expect businesses to use it, you'll find neither OpenGL nor Vulkan works on most business-class machines via Remote Desktop. The point being, it's not portable by design nor in actuality.

viraptor · 1h ago
Why would remote desktop make a difference for the API? I thought MS killed RemoteFX without any replacement? As in, everyone including DirectX is in the same bad state...
pjmlp · 18m ago
DirectX works just fine in RDP.

How do you think those Azure VMs for game developers work?

https://learn.microsoft.com/en-us/azure/virtual-desktop/grap...

voidUpdate · 6h ago
If you need to compile shaders at runtime, then why do I have to wait 10 mins for unreal to compile 30,000 shaders whenever it feels like it?
flohofwoe · 5h ago
Because the input bytecode blobs that are passed into 3D APIs are only a GPU-vendor- and render-pipeline-agnostic bytecode format (e.g. SPIR-V or DXBC); this is not what actually runs on the GPU. There's a second compile step inside the driver, which compiles into a proprietary "machine code" format specialized for a specific pipeline object. Normally that driver-internal compile step is quite fast, but that doesn't matter if there are tens of thousands of shader variants.
mvdtnz · 7h ago
This is a very interesting article; it was fun to learn what shaders are and why I have to wait for them to compile every time I play Call of Duty.

What I'd like to know is why game developers can't provide a service / daemon that will perform shader compilation when shaders or drivers change so I don't need to waste 25 minutes of my precious gaming time when I load CoD and see they need compiling. My computer has been sitting there unused all week. Steam was smart enough to download game updates in that time, why isn't your game smart enough to recompile my shaders on an idle machine?

shaggie76 · 2h ago
One alternative that many games choose is to do it on-demand, which is felt as micro-stutters while you play, but that is a poor choice for a competitive game like CoD.

We take a slightly different approach: we don't do any compilation up-front in the launcher, but do as much as possible on the level loading screen. It's not perfect, though: due to the way some legacy code works we can't always anticipate all permutations ahead of time, so you get the occasional micro-stutter as missing shaders are sent to the driver.

You can get away with being lazier with modern drivers because they will cache the compiled result for you (if you don't cache the pipeline state object yourself in DirectX 12) but on older DirectX drivers for Intel IGPs there wasn't a cache at all so the first 30 seconds after loading into a level would be very busy.
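The caching idea can be sketched like this (illustrative only: a real driver keys on far more pipeline state and stores actual backend machine code, and the names here are invented):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Sketch of a driver-style shader cache: key the compiled blob by a hash
// of (shader source, pipeline state) and only pay for the expensive
// backend compile on a cache miss.
struct PipelineCache {
    compiled: HashMap<u64, Vec<u8>>,
    compiles: u32, // how many real compiles we paid for
}

impl PipelineCache {
    fn new() -> Self {
        Self { compiled: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, source: &str, pipeline_state: &str) -> &[u8] {
        let mut h = DefaultHasher::new();
        (source, pipeline_state).hash(&mut h);
        let key = h.finish();
        if !self.compiled.contains_key(&key) {
            self.compiles += 1;
            // Stand-in for the driver's backend compile step.
            self.compiled.insert(key, source.as_bytes().to_vec());
        }
        &self.compiled[&key]
    }
}

fn main() {
    let mut cache = PipelineCache::new();
    cache.get_or_compile("fs_main", "opaque");
    cache.get_or_compile("fs_main", "opaque");  // hit: no second compile
    cache.get_or_compile("fs_main", "blended"); // new pipeline state: new blob
    println!("compiles: {}", cache.compiles); // compiles: 2
}
```

This is also why the same shader recompiles when the pipeline state changes: each (shader, state) pair is a distinct cache entry.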

flohofwoe · 5h ago
A better approach is to drastically reduce the number of individual shaders. Having tens of thousands of shader variants is an unfortunate side effect of treating shaders as asset data, and building them with visual noodle-graph tools by artists. No game actually needs tens of thousands of shaders.
tsukikage · 3h ago
GPUs suck at things like data-driven branches. What looks like one shader at a high level ends up creating many separate compiled blobs, because you really want some of the parameters baked in at compile time to avoid the performance tanking, and this means you need to compile a version of the shader for every combination of values those parameters can take.
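A back-of-the-envelope sketch of that combinatorial blow-up (the feature count is illustrative, not from any particular engine):

```rust
// With k boolean features baked in at compile time, a single
// "logical" shader becomes 2^k distinct compiled variants.
fn variant_count(num_bool_features: u32) -> u64 {
    1u64 << num_bool_features
}

fn main() {
    // 15 compile-time toggles (fog, skinning, shadows, ...) already
    // means 32768 blobs, in the ballpark of the "30,000 shaders"
    // figure mentioned upthread.
    println!("{}", variant_count(15)); // 32768
}
```
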
viraptor · 1h ago
Sometimes it pays off to have an ubershader with all the options though https://dolphin-emu.org/blog/2017/07/30/ubershaders/
tubs · 2h ago
Uniform branches are pretty much free.

The main issue is that GPR allocation is static and worst-case, so on the majority of hardware you hose your occupancy.

Negitivefrags · 7h ago
Steam actually shares compiled shaders between users with the same hardware / driver version, but only for Vulkan I believe.
ThatPlayer · 2h ago
The shared compiled shaders are only for the Steam Deck I believe. Other hardware will require a background compile or will compile at launch.
Firehawke · 1h ago
Nope, it's been available in the Steam client beta since 2016, and somewhere around 2017 it went into the mainline client.
gmueckl · 7h ago
Imagine the outcry if every game came with a background process that just sits there in the background.

Also - and this may sound a little bonkers - some renderers are so complex and flexible that the actual set of required shaders is only discovered when it tries to render a frame and the full set of possible shader configuration permutations is so large that compiling them all in advance is pointless or borderline infeasible.

mvdtnz · 7h ago
Lots of games come with background processes.
gmueckl · 4h ago
Can you name an example? I have a large library of games and can't name one that does.