A lot of the performance gains have come from compilers targeting new instruction sets and optimising for the new architectures. Programs and libraries are now designed more around sequential memory access, multicore execution, and awareness of heterogeneous multicore performance. Ignoring hyperthreading, or the difference between E cores and P cores, will hurt performance. Anything running on a server needs to be aware of NUMA and memory locality to avoid risking performance regressions.
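As a rough illustration of why sequential access matters so much (a hypothetical benchmark sketch, not code from the video): summing the same array through a sequential index order versus a shuffled one does identical arithmetic, but the sequential pass lets the hardware prefetcher stream cache lines in and typically runs several times faster.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Minimal sketch: sum the same data through a sequential index order
// and a shuffled one. Same work, very different cache behaviour.
int main() {
    constexpr std::size_t n = 1 << 24;      // ~16M elements, well past L3
    std::vector<std::uint32_t> data(n, 1);
    std::vector<std::uint32_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0u);  // 0, 1, 2, ... (sequential)

    auto time_sum = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (std::uint32_t i : idx) sum += data[i];
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label << ": sum=" << sum << " in "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n";
    };

    time_sum("sequential");                 // prefetcher-friendly
    std::shuffle(idx.begin(), idx.end(), std::mt19937{42});
    time_sum("random");                     // cache-miss bound
    return 0;
}
```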
As this video shows, there are examples of code that went backwards in some respects because it is highly branchy and random-access in nature, which defeats branch prediction and prefetching. Performance has become increasingly complex and multifaceted, and getting more out of today's machines requires more knowledge of what the CPU likes, because CPUs stopped getting faster at basic single-threaded code that wasn't optimised for the hardware about 20 years ago. Clock speeds stopped climbing rapidly, and with them single-threaded gains slowed to instruction-level optimisations, which don't affect every program the same way.
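To make the branch-prediction point concrete, here's a minimal sketch (a generic illustration, not the code from the video): the same conditional sum over identical values runs far faster once the data is sorted, because the branch direction becomes predictable. Note that at high optimisation levels a compiler may rewrite this branchlessly, which is itself part of where the gains have been coming from.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// Minimal sketch: the same conditional sum over identical values,
// first in random order (branch ~50/50, frequently mispredicted),
// then sorted (branch direction becomes almost perfectly predictable).
int main() {
    constexpr std::size_t n = 1 << 24;
    std::vector<std::uint8_t> data(n);
    std::mt19937 rng{42};
    std::uniform_int_distribution<int> dist(0, 255);
    for (auto& v : data) v = static_cast<std::uint8_t>(dist(rng));

    auto time_sum = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (std::uint8_t v : data)
            if (v >= 128) sum += v;         // the branch under test
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label << ": sum=" << sum << " in "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n";
    };

    time_sum("unsorted");                   // unpredictable branch
    std::sort(data.begin(), data.end());
    time_sum("sorted");                     // predictable branch
    return 0;
}
```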