Apple's new Processor Trace instrument is incredible

47 xdevweeknds 14 8/16/2025, 8:42:28 PM victorwynne.com ↗

Comments (14)

mananaysiempre · 34m ago
Finally, a processor manufacturer defects from the obfuscatory equilibrium. Granted, Apple’s processor people are not saints—I’ve yet to see even a full table of throughputs, latencies, and port loads from them, let alone an accurate CPU model—but I welcome anything that might maybe, hopefully, pretty please start a race of giving more accurate data to people doing low-level optimization.
touisteur · 15m ago
Intel Processor Trace was already pretty great. Built an MC/DC coverage tool with it. Used it for fine profiling, live program monitoring...
bri3d · 30m ago
What’s your beef with VTune and uProf?
jauntywundrkind · 30m ago
Longer term I sort of dream of doing computing from the inside out, using all this tracing data we've started gathering not just for observability but as a log and engine of compute: the record of what computing has been done as an event-source, for an event sourcing computing architecture.
ip26 · 3m ago
The present opportunity, in my view, is to feed this tracing into the development of superior compilers. This is starting to happen with automated profiling by the compiler, but you can imagine the profiling expanding to an enormous degree, with the compiler tracing the program it is building in great detail.
do_not_redeem · 1h ago
> Instead of statistical sampling like most profilers, you get a complete picture of your app’s execution flow.

Potentially interesting, but it's not really clear whether this is anything new or not. valgrind + kcachegrind does this too.

https://developer.apple.com/documentation/xcode/analyzing-cp...

These screenshots look a lot like kcachegrind with a slightly reimagined UI. Is there actually anything new here, or is this another case of Apple finally catching up to the open source world?

nkurz · 37m ago
As 'GeekyBear' implies in a sibling comment, valgrind works with an emulation of an ideal processor rather than directly on the actual CPU. Sometimes this gives you a good idea of how the program will actually run, and sometimes it doesn't. As processors became more complex, it got farther and farther from the truth. Personally, I started in the Valgrind era and stopped using it as soon as better tools using native instrumentation became available. If Apple's approach works as well as described, it is much better than anything from that era.
do_not_redeem · 26m ago
I've never found cachegrind inaccurate, but maybe I'm not doing hardcore enough performance work. You can also use perf and get your numbers straight from the hardware if that's what you need. Truth be told I mainly use cachegrind because I prefer kcachegrind's UI to hotspot.

(I even prefer cachegrind's approach since the numbers will be less distorted by other random background activity on the machine, but that could just be idealism on my part, who knows.)

If perf or the vendor-specific tools like vtune/uprof aren't sufficient for you, then I'm curious: what do you use?

GeekyBear · 52m ago
> Potentially interesting, but it's not really clear whether this is anything new or not. valgrind + kcachegrind does this too.

Looking at the kcachegrind homepage, it doesn't sound like they are pulling their data directly from the CPU core itself:

> Callgrind uses runtime instrumentation via the Valgrind framework for its cache simulation and call-graph generation.

https://kcachegrind.github.io/html/Home.html

Apple seems to have modified its core design so that it will stream data to a log file while the code is running.

> Recent Apple silicon devices can capture a processor trace where the CPU stores information about the code it runs, including the branches it takes and the instructions it jumps to. The CPU streams this information to an area on the file system so that you can analyze it with the Processor Trace instrument.

jauntywundrkind · 37m ago
Intel has a Performance Monitoring Unit on its core that has significant overlap.

I'm forgetting this tool-space, but at least some of these tools can make use of that hardware:

https://github.com/intel/pcm https://github.com/andikleen/pmu-tools

do_not_redeem · 39m ago
If you need data straight from the hardware you can use e.g. perf+hotspot, although I've heard that perf's tracing (not sampling!) supports fewer CPUs (but still more than just 1)
urbandw311er · 59m ago
I feel like it probably would work on older hardware; this very much smacks of forced obsolescence. Just guessing though.
nozzlegear · 28m ago
Is forced obsolescence the right term for a somewhat obscure debug tool built for developers of macOS/iOS software? I don't imagine there are many people who would feel forced to upgrade their machines more quickly just to get access to this.
astrange · 53m ago
It would not. You could port cachegrind I suppose.

(Even if hardware support did exist earlier, you don't want to deal with errata for a new hardware feature. It's kind of amazing anything ever works.)