GPU-Driven Clustered Forward Renderer
116 points by logdahl on 5/20/2025, 3:57:01 PM | 28 comments (logdahl.net)
I need a renderer for a personal project, and after some research decided I'd implement a forward clustered renderer as well.
Should really dive deeper and update the measurements for final results...
But also, really good work! You should be proud of this! Squeezing that much out of that hardware is no easy feat.
Without digging into the detailed breakdown, I would assume that the sheer number of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for rasterization starts to diminish. And with the scaled-down dragons, there's a lot of them in the frame.
You can try to come up with impostors representing these far-away dragons, or simple LoD levels. Some games even use particles to represent far-away, repeated "meshes" (Ghost of Tsushima does this for distant soldiers).
Lots of techniques in this area, ranging from simple to bananas. LoD levels alone can get you pretty far! Of course, this comes at the cost of more distinct draw calls, so it is a balancing game.
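To make that concrete, here's a minimal distance-based LoD pick. The band cutoffs and names are made up for illustration; real ones would be tuned per asset:

    #include <array>

    struct LodBand { int level; float maxDistance; };

    int pickLod(float distanceToCamera) {
        // Hypothetical cutoffs: full detail near the camera, impostor far away.
        static constexpr std::array<LodBand, 4> bands{{
            {0, 20.0f},   // full-detail mesh
            {1, 60.0f},   // simplified mesh
            {2, 150.0f},  // heavily simplified mesh
            {3, 1e30f},   // billboard / impostor for everything farther
        }};
        for (const LodBand& b : bands)
            if (distanceToCamera <= b.maxDistance) return b.level;
        return bands.back().level;
    }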
Think about the topology too; hope these old gems help you get a grasp on the cost of this:
https://www.humus.name/index.php?page=Comments&ID=228
https://www.g-truc.net/post-0662.html
The basic idea is to first render as normal some meshes that you either know are visible, or are likely to occlude objects in the scene (say the N closest objects, or some large terrain feature in a real game). Then you can take the resulting depth buffer and downsample it into something resembling a mipmap chain, but with each level holding the max depth of the contributing pixels, rather than the average. This is called a hierarchical Z (depth) buffer, or HZB for short. This can be used to very quickly, with just a few samples of the HZB, test if an object's bounding box is behind all the pixels in a given area and thus definitely not visible. The hierarchical nature of the HZB allows both small and large meshes to be tested at the same performance cost.
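Here's a CPU-side sketch of that idea, assuming a square power-of-two depth buffer, depths in [0,1] with larger meaning farther away, and object rects already clipped to the buffer; the names (buildHzb, isOccluded) and the mip-selection rule are illustrative, not from any particular engine:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // One square mip level of the hierarchical Z buffer.
    struct Mip {
        int size;
        std::vector<float> depth;  // row-major, larger value = farther away
    };

    // Build the HZB: like a mipmap chain, but each texel keeps the MAX
    // (farthest) depth of the 2x2 texels below it instead of the average.
    std::vector<Mip> buildHzb(Mip level0) {
        std::vector<Mip> hzb;
        hzb.push_back(std::move(level0));
        for (int s = hzb[0].size / 2; s >= 1; s /= 2) {
            const Mip& p = hzb.back();
            Mip next{s, std::vector<float>(size_t(s) * s)};
            for (int y = 0; y < s; ++y)
                for (int x = 0; x < s; ++x) {
                    auto at = [&](int px, int py) { return p.depth[size_t(py) * p.size + px]; };
                    next.depth[size_t(y) * s + x] = std::max(
                        std::max(at(2 * x, 2 * y), at(2 * x + 1, 2 * y)),
                        std::max(at(2 * x, 2 * y + 1), at(2 * x + 1, 2 * y + 1)));
                }
            hzb.push_back(std::move(next));
        }
        return hzb;
    }

    // Conservative test: the object (screen rect [x0,x1]x[y0,y1], nearest
    // depth minDepth) is occluded if its nearest point is behind the farthest
    // occluder depth everywhere under its footprint. Picking the mip whose
    // texels are at least as large as the rect keeps it to ~2x2 samples.
    bool isOccluded(const std::vector<Mip>& hzb,
                    int x0, int y0, int x1, int y1, float minDepth) {
        int extent = std::max(x1 - x0, y1 - y0) + 1;
        int level = std::min(int(hzb.size()) - 1,
                             int(std::ceil(std::log2(float(std::max(extent, 1))))));
        float farthest = 0.0f;
        for (int y = y0 >> level; y <= (y1 >> level); ++y)
            for (int x = x0 >> level; x <= (x1 >> level); ++x)
                farthest = std::max(farthest, hzb[level].depth[size_t(y) * hzb[level].size + x]);
        return minDepth >= farthest;  // behind everything already drawn there
    }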
Typically, a game would track which meshlets were known to be visible last frame, and start by rendering all of those (with updated positions and camera orientation, of course). This will make up most of what is drawn to the scene, because typically objects and the camera change very little from frame to frame. Then all the meshlets that weren't known to be visible get tested against the HZB, and just the few that were revealed by changes in the scene will need to be rendered. Lastly, at some point the known visible meshlet set should be re-tested, so that it does not grow indefinitely with meshlets that are no longer visible.
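In code, that two-phase scheme might look roughly like this, with the GPU work reduced to stubs and all names hypothetical:

    #include <unordered_set>
    #include <vector>

    struct Meshlet { int id; };

    // Stubs standing in for the real GPU work; illustrative only.
    void draw(const Meshlet&) {}
    void buildHzbFromDepth() {}  // max-downsample pass (see sketch above)
    bool occludedByHzb(const Meshlet&) { return false; }

    void renderFrame(const std::vector<Meshlet>& all,
                     std::unordered_set<int>& visibleLastFrame) {
        // Phase 1: redraw everything that was visible last frame. With small
        // frame-to-frame changes this is most of the final image.
        for (const Meshlet& m : all)
            if (visibleLastFrame.count(m.id)) draw(m);

        buildHzbFromDepth();

        // Phase 2: test only the meshlets that were NOT drawn; the few newly
        // revealed ones get drawn and join the visible set.
        for (const Meshlet& m : all)
            if (!visibleLastFrame.count(m.id) && !occludedByHzb(m)) {
                draw(m);
                visibleLastFrame.insert(m.id);
            }

        // Not shown: periodically re-test visibleLastFrame so it doesn't
        // accumulate meshlets that are no longer visible.
    }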
The result is that the first frame rendered after a major camera change (like the player respawning) will be slow, as all the meshlets in the frustum need to be rendered. But after that, the scene can be narrowed down to just the meshlets that actually contributed to the frame, and performance improves significantly. I think this would be more than enough for a demo, but for a real game you would probably want to explore methods to speed up that first frame's rendering, like sorting objects and picking the N closest/largest ones so you can at least get some occlusion culling working.
I'm not sure what this part is supposed to say, but it doesn't look right. "Instead" usually follows differences, not similarities.
Where’s that from?
Also note that C++14 introduced the apostrophe as a digit separator in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal
https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...
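For anyone who hasn't seen it, a quick demo (compiles with any C++14 compiler):

    #include <iostream>

    int main() {
        // The apostrophes are ignored by the compiler; they only aid readability.
        long long billion = 1'000'000'000;  // same value as 1000000000
        unsigned mask = 0xFF00'FF00;        // works in hex (and binary) literals too
        std::cout << billion << ' ' << mask << '\n';
    }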
And I would never have known this existed without Hacker News.