This is awesome! At the end you mention that 27k dragons and 10k lights just barely fit in 16ms. Do you see any paths to improve performance? I've seen some demos with tens or hundreds of thousands of moving lights, but it's hard to tell if they're legit or highly constrained. I'm not a graphics programmer by trade.
I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.
logdahl · 7h ago
Well, the core issue is still drawing. I took another look at some profiles and it seems like it's not the renderer limiting this to 27k! I still had some stupid scene-graph traversal... Clustering and culling take 53 us and 33 us respectively, while the draw takes 7 ms. So a frame is about 7 ms on the GPU side and some 100-200 us on the CPU side.
Should really dive deeper and update the measurements for final results...
godelski · 1m ago
I haven't looked at the post in the detail it deserves, but given your graphs the workload looks pretty bursty. I'd suspect there are some good I/O optimizations or some predication to be had. That last void main block definitely looks ripe for that. But I'd listen to Knuth, premature optimization and all, so grab a profiler. I wouldn't be surprised if you're nearing peak performance. Also, NVIDIA GPUs have a lot of special tricks that can be exploited but are buried in documentation... If you haven't already seen it (I suspect you have), you'd be interested in "GPU Gems".
But also, really good work! You should be proud of this!
gmueckl · 6h ago
This seems fairly well optimized. There's probably room to squeeze out some more perf, but not dramatic improvements. Maybe preventing overdraw of shaded pixels by doing a depth prepass would help.
Without digging into the detailed breakdown, I would assume that the sheer number of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for rasterization starts to diminish. And with the scaled-down dragons, there are a lot of them in the frame.
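To make the tiny-triangle argument concrete, here is a back-of-the-envelope sketch. The function names and the 16-pixel default (4x4) are illustrative assumptions, and the estimate ignores overdraw, culling, and off-screen geometry, so treat it as a sanity check rather than a measurement.

```cpp
// Rough average screen area per triangle, in pixels.
// Assumes every visible triangle shares the viewport evenly, which is
// never exactly true; it only shows the order of magnitude.
double avgPixelsPerTriangle(double widthPx, double heightPx, double visibleTriangles) {
    if (visibleTriangles <= 0.0) return 0.0;  // nothing drawn
    return (widthPx * heightPx) / visibleTriangles;
}

// True when the average triangle is smaller than the threshold area.
// The default of 16 pixels corresponds to the "about 4x4 pixels" rule of
// thumb from the comment above (hypothetical helper, not from the post).
bool belowSmallTriangleThreshold(double avgPixels, double thresholdPixels = 16.0) {
    return avgPixels < thresholdPixels;
}
```

With hypothetical numbers (a 1920x1080 target and 27'000 instances of 500 triangles each, neither taken from the post), `avgPixelsPerTriangle(1920.0, 1080.0, 27000.0 * 500.0)` comes out well under one pixel per triangle, far below the 4x4 range.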
spookie · 3h ago
This is by far the biggest culprit OP, look into this.
You can try to come up with imposters representing these far away dragons, or simple LoD levels. Some games do use particles to represent far away and repeated "meshes" (Ghost of Tsushima does this for soldiers far away).
Lots of techniques in this area, ranging from simple to bananas. LoD levels alone can get you pretty far! Of course, this comes at the cost of more distinct draw calls, so it is a balancing game.
Think about the topology too. I hope these old gems help with getting a grasp on the cost of this:
https://www.humus.name/index.php?page=Comments&ID=228
https://www.g-truc.net/post-0662.html
Yeah, I use LODs already, but as you say, even my lowest LOD far away has too many vertices. Imposter rendering seems very interesting but also completely bonkers (viewing angle, lighting)!
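For readers following along, the LoD idea discussed above could be sketched roughly like this. It is not the post's implementation: the function names, the bounding-sphere approximation, and the pixel thresholds are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate on-screen diameter, in pixels, of a bounding sphere under a
// perspective projection with the given vertical field of view.
float projectedDiameterPx(float radius, float distance,
                          float fovYRadians, float viewportHeightPx) {
    if (distance <= radius) return viewportHeightPx;  // camera inside or touching the sphere
    // Pixels covered by one world unit at distance 1 from the camera.
    const float pxPerUnit = viewportHeightPx / (2.0f * std::tan(fovYRadians * 0.5f));
    return 2.0f * radius * pxPerUnit / distance;
}

// minDiameterPx[i] is the smallest projected diameter at which LoD i is
// still used, sorted in descending order (LoD 0 is the most detailed).
// Returns minDiameterPx.size() when the object is smaller than every
// threshold; the caller can treat that as "use an imposter or skip it".
std::size_t selectLod(float diameterPx, const std::vector<float>& minDiameterPx) {
    for (std::size_t i = 0; i < minDiameterPx.size(); ++i) {
        if (diameterPx >= minDiameterPx[i]) return i;
    }
    return minDiameterPx.size();
}
```

Selecting by projected size rather than raw distance keeps the choice tied to how many pixels an object actually covers, which is what the small-triangle concern is about. The extra "past the last LoD" return value is one place an imposter or particle fallback could hook in.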
zokier · 5h ago
Worth noting that the GTX 1070 is a nearly 10-year-old "mainstream" GPU. I'd imagine a 5090 or something could push the numbers a fair bit higher.
rezmason · 6h ago
Ten thousand lights! Your utility bill must be enormous
Flex247A · 6h ago
Lights in games use real electricity :)
amelius · 5h ago
Even the stars use real electricity.
cluckindan · 2h ago
Not really, nuclear fusion doesn’t run on electrons.
Interesting that Sweden explicitly does NOT use it... Not sure where I picked it up! :-)
lacoolj · 6h ago
Learn somethin new every day.
And I would never have known this existed without hackernews
m-schuetz · 53m ago
Apostrophes are nice because they are not ambiguous. I started using them myself after getting used to them from C++ and learning that they are used in Switzerland.
https://www.humus.name/index.php?page=Comments&ID=228
https://www.g-truc.net/post-0662.html
Where’s that from?
Also note C++14 introduced the apostrophe in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal
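A minimal sketch of what the C++14 digit separator looks like in practice. The constant names are made up for illustration; the values echo the figures mentioned at the top of the thread.

```cpp
#include <cstdint>

// C++14 digit separators: the apostrophes are purely visual and are
// ignored by the compiler when it reads the literal.
constexpr std::int64_t kDragons       = 27'000;
constexpr std::int64_t kLights        = 10'000;
constexpr std::int64_t kFrameBudgetUs = 16'000;  // 16 ms expressed in microseconds

// Checked at compile time, so a successful build is the demonstration.
static_assert(kDragons == 27000, "separators do not change the value");
static_assert(0xFF'FF == 65535, "they also work in hex literals");
static_assert(0b1010'1010 == 170, "and in binary literals (also C++14)");
```

This needs `-std=c++14` or later; older standards reject the apostrophes.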
https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...