This is awesome! At the end you mention the 27k dragons and 10k lights just barely fits in 16ms. Do you see any paths to improve performance? I've seen some demos on with tens/hundreds of thousands of moving lights, but hard to tell if they're legit or highly constrained. I'm not a graphics programmer by trade.
I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.
logdahl · 2h ago
Well, the core issue is still drawing. I took another look at some profiles again and seems like its not the renderer limiting this to 27k! I still had some stupid scene-graph traversal... But clustering and culling is 53us and 33us respectively, but the draw is 7ms. So a frame (on the GPU-side) is like 7ms, and some 100-200 us on the CPU side.
Should really dive deeper and update the measurements for final results...
zokier · 1h ago
Worth noting that the GTX 1070 is nearly 10 year old "mainstream" GPU. I'd imagine a 5090 or something could push the numbers fair bit more higher.
gmueckl · 2h ago
This seems fairly well optimized. There's probably room to squeeze out some more perf, but not dramatic improvements. Maybe preventing overdraw of shaded pixels by doing a depth prepass would help.
Without digging into the detailed breakdown, I would assume that the sheer amount of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for raterization starts to diminish. And with the scaled down dragons, there's a lot of then in the frame.
rezmason · 2h ago
Ten thousand lights! Your utility bill must be enormous
I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.
Should really dive deeper and update the measurements for final results...
Without digging into the detailed breakdown, I would assume that the sheer amount of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for raterization starts to diminish. And with the scaled down dragons, there's a lot of then in the frame.
Where’s that from?
Also note C++14 introduced the apostrophe in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal
https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...
And I would never have known this existed without hackernews