Finally, a processor manufacturer defects from the obfuscatory equilibrium. Granted, Apple’s processor people are not saints—I’ve yet to see even a full table of throughputs, latencies, and port loads from them, let alone an accurate CPU model—but I welcome anything that might maybe, hopefully, pretty please start a race of giving more accurate data to people doing low-level optimization.
touisteur · 13m ago
Intel Processor Trace was already pretty great. Built a MC-DC coverage tool with it. Used it for fine profiling, live program monitoring...
bri3d · 28m ago
What’s your beef with VTune and uProf?
jauntywundrkind · 28m ago
Longer term I sort of dream of doing computing from the inside out, using all this tracing data we've started gathering not just for observability but as a log and engine of compute: the record of what computing has been done as an event-source, for an event sourcing computing architecture.
ip26 · 1m ago
The present opportunity, in my view, is to feed this tracing into the development of superior compilers. This is starting to happen with automated profiling by the compiler, but you can imagine the profiling expanding to an enormous degree, with the compiler tracing the program it is building in great detail.
do_not_redeem · 1h ago
> Instead of statistical sampling like most profilers, you get a complete picture of your app’s execution flow.
Potentially interesting, but it's not really clear whether this is anything new or not. valgrind + kcachegrind does this too.
These screenshots look a lot like kcachegrind with a slightly reimagined UI. Is there actually anything new here, or is this another case of Apple finally catching up to the open source world?
nkurz · 35m ago
As 'GeekyBear' implies in a sibling comment, valgrind works with an emulation of an ideal processor rather than directly on the actual CPU. Sometimes this gives you a good idea of how the program will actually run, and sometimes it doesn't. As processors became more complex, it got farther and farther from the truth. Personally, I started in the Valgrind era and stopped using it as soon as better tools using native instrumentation became available. If Apple's approach works as well as described, it is much better than anything from that era.
do_not_redeem · 24m ago
I've never found cachegrind inaccurate, but maybe I'm not doing hardcore enough performance work. You can also use perf and get your numbers straight from the hardware if that's what you need. Truth be told I mainly use cachegrind because I prefer kcachegrind's UI to hotspot.
(I even prefer cachegrind's approach since the numbers will be less distorted by other random background activity on the machine, but that could just be idealism on my part, who knows.)
If perf or the vendor-specific tools like VTune/uProf aren't sufficient for you, then I'm curious: what do you use?
GeekyBear · 50m ago
> Potentially interesting, but it's not really clear whether this is anything new or not. valgrind + kcachegrind does this too.
Looking at the kcachegrind homepage, it doesn't sound like they are pulling their data directly from the CPU core itself:
> Callgrind uses runtime instrumentation via the Valgrind framework for its cache simulation and call-graph generation.
Apple seems to have modified its core design so that it will stream data to a log file while the code is running.
> Recent Apple silicon devices can capture a processor trace where the CPU stores information about the code it runs, including the branches it takes and the instructions it jumps to. The CPU streams this information to an area on the file system so that you can analyze it with the Processor Trace instrument.
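To make the sampling-versus-tracing distinction concrete, here is a toy sketch in Python. It is not how Instruments, perf, or any real profiler works: the timeline, the function names, and the fixed sampling interval are all made up, and the interval is deliberately aligned with the workload so the sampler hits its worst case. A real sampler would usually catch short functions eventually, just with far less certainty than a complete trace.

```python
from collections import Counter

# Hypothetical execution timeline: (function name, duration in ticks).
# Names and durations are invented for illustration only.
TIMELINE = [("parse", 7), ("tiny_helper", 1), ("render", 11), ("tiny_helper", 1)] * 5


def full_trace(timeline):
    """Count every call, the way a complete control-flow trace would."""
    return Counter(name for name, _ in timeline)


def sampled_profile(timeline, interval):
    """Record only whichever function is running at every `interval`-th tick."""
    samples = Counter()
    tick = 0          # start time of the current function
    next_sample = 0   # time of the next sample to take
    for name, duration in timeline:
        end = tick + duration
        # Take every sample that falls inside [tick, end).
        while next_sample < end:
            samples[name] += 1
            next_sample += interval
        tick = end
    return samples


if __name__ == "__main__":
    print("full trace:", dict(full_trace(TIMELINE)))
    print("sampled   :", dict(sampled_profile(TIMELINE, interval=10)))
```

With this particular timeline the trace records all ten calls to `tiny_helper`, while the sampler with an interval of 10 ticks never lands on it. That is the kind of gap a hardware trace is meant to close.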
jauntywundrkind · 35m ago
Intel has a Performance Monitoring Unit on its core that has significant overlap.
Forgetting this tool-space, but at least some of these tools can make use of that hardware:
If you need data straight from the hardware you can use e.g. perf+hotspot, although I've heard that perf's tracing (not sampling!) supports fewer CPUs (but still more than just 1)
urbandw311er · 57m ago
I feel like it probably would work on older hardware; this very much smacks of forced obsolescence. Just guessing though.
nozzlegear · 26m ago
Is forced obsolescence the right term for a somewhat obscure debug tool built for developers of macOS/iOS software? I don't imagine there are many people who would feel forced to upgrade their machines more quickly just to get access to this.
astrange · 51m ago
It would not. You could port cachegrind I suppose.
(Even if hardware support did exist earlier, you don't want to deal with errata for a new hardware feature. It's kind of amazing anything ever works.)
https://developer.apple.com/documentation/xcode/analyzing-cp...
https://kcachegrind.github.io/html/Home.html
https://github.com/intel/pcm https://github.com/andikleen/pmu-tools