I think it is a totally different type of table. Yours is real data. Theirs is more like a ballpark. Maybe there could be some use for the latter? Just to help folks reason about performance.
Although, reasoning about performance can be hard anyway.
Liftyee · 1h ago
I agree with this. As someone who's not an expert in assembly and CPU architecture, I found the "simplified" estimates in a condensed log-scale chart much more insightful. The exact data for specific architectures would be useful for more advanced users than me, but it doesn't offer the same quick "big picture" overview.
bee_rider · 1h ago
Did you get a chance to use it? I’ve only just come across this table, so I haven’t had a chance to actually try using it for anything and can’t really evaluate how useful it is.
I have a sneaking suspicion that this table is satisfying for our brains as a vaguely technical and interesting thing, but I’m not sure how useful it really is. In general the compiler will be really creative in reordering instructions, and the CPU will also be creative about which ones it runs in parallel (since it is good at discovering instruction-level parallelism). So I wonder whether the level of study needed to make use of this information also requires the kind of data that the detailed table provides.
I haven’t spent much time caring about individual instructions; it seems very hard. FWIW, I have had some success with reducing the number of trips to memory and making sure the dependencies are obvious to the computer, so I’m not totally naive… but I think caring about instruction timing is mostly for the real hardcore optimization badasses.
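A minimal sketch of what "making sure the dependencies are obvious" can look like, in C (the function names are made up for illustration): the same integer sum written with one accumulator, where every add has to wait for the previous one, and with four independent accumulators whose additions an out-of-order CPU is free to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each addition depends on the result of the previous
   one, so all the adds form a single serial dependency chain. */
uint64_t sum_one_chain(const uint32_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Four accumulators: the four chains don't depend on each other,
   so the CPU can execute their additions in parallel. */
uint64_t sum_four_chains(const uint32_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Pick up the 0-3 leftover elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Integer addition is associative, so both versions return the same result. With floating point the reordering can change the low bits, which is why compilers generally won't do it for you unless you allow it (e.g. with `-ffast-math`). For a plain integer loop like this, an optimizing compiler may well vectorize the one-chain version on its own, which is rather the point about compilers being creative.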
If your CPU runs at 1000 MHz, that's 10^9 cycles per second. On that CPU, the right-hand side of the picture corresponds to 1 ms.
Computers are fast.
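The arithmetic behind that, assuming the chart's right edge sits at roughly 10^6 cycles (which is what the 1 ms figure implies): time is just the cycle count divided by the clock frequency.

```latex
t = \frac{N_{\text{cycles}}}{f_{\text{clock}}}
  = \frac{10^{6}}{10^{9}\ \text{Hz}}
  = 10^{-3}\ \text{s} = 1\ \text{ms},
\qquad
\frac{1}{10^{9}\ \text{Hz}} = 1\ \text{ns per cycle}
```

Real CPUs change clock speed with load and power state, so this is a ballpark conversion rather than an exact one.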
https://uops.info/table.html
Supports most modern and older architectures.