Ask HN: Help with doing statistics over machine code

2 phafu 2 5/12/2025, 8:45:47 AM
I'd like to do some statistics over the machine code gcc generates, such as a histogram of used instructions, average volatile/preserved registers usage of functions etc. For now just x86_64 SysV-ABI would be enough.

However I'm not aware of any pre-existing tool that lets me easily do this. The options I currently see are either make gcc output assembly and write a parser for the GNU Assembler format (possibly by reusing the asm-parser of the compiler-explorer project), or write a tool that reads (disassembles) object files directly using elfutils.

Any hints, prior work, further ideas, links to useful resources, or any other kind of help would be much appreciated.

Comments (2)

baobun · 5h ago
"Static analysis" should be a relevant search term. Assuming you don't need to tie the instructions back to C code then the "gcc" part seems circumstancial for implementation? I guess you might want to parse the ASM into an Abstract Syntax Tree (AST) represenation and work on that?

If you do want to tie it back to the source, this looks relevant: http://icps.u-strasbg.fr/~pop/gcc-ast.html

phafu · 5h ago
For my purpose I don't need to get back to the original source, no.

The gcc part is only relevant with regards to what dialect of assembler I need to parse. If I go that route, I'd write a parser for the GNU assembler, and that would of course work with any code in that dialect, regardless from which compiler it came from (I haven't checked whether other compilers can produce GNU assembler though).