OP here. The post is about a method I derived as part of my work on a Java decompiler. I also have related questions for people in programming language theory. ("Ask HN" doesn't seem to fit submissions with URLs, so I'm asking them here.)
It's frustrating how few online resources cover efficient (de)compiler designs. There are many people doing cheap but low-quality decompilation (e.g. here's someone's post on writing a .NET decompiler: https://news.ycombinator.com/item?id=9952145), and there are even more people writing toy non-optimizing compilers. There are also quite a few folks improving LLVM or Ghidra, or writing new but heavyweight theorem-prover-based decompilers (e.g. DREAM: https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan...). But there's frustratingly little information on the middle ground.
Also, and I think this ties into the previous point, there are next to zero entry-level resources on hard topics. It's like there's an invisible wall: tons of people learn tokenization, parsing, and codegen, and then they stop because their compiler already "works". The resources that do focus on optimization are haphazard: they either describe peephole optimizations exclusively, or explain how higher-level optimizations work in theory without any mention of how to implement them efficiently and fit them together in an LLVM-like backend framework.
There's virtually no specific information on pass ordering and pass design, as if it were something intuitive. But the more I work on the decompiler, the more I realize that I have to cram basically everything into a single pass if I want to avoid the dreaded `do { .. } while (changed);` fixpoint loop (sketched below) and arbitrary heuristics. But not all passes can be merged, obviously, so I end up re-architecting everything all the time.
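To make that concrete, here's a minimal sketch of the kind of pass driver I'm trying to avoid. The names (`Ir`, `Pass`, `FixpointDriver`) are made up for illustration, not from my decompiler; the point is just the shape of the loop:

```java
import java.util.List;

// Stand-in for the decompiler's intermediate representation.
final class Ir {}

interface Pass {
    /** Runs the pass over the IR; returns true if anything changed. */
    boolean run(Ir ir);
}

final class FixpointDriver {
    static void runToFixpoint(Ir ir, List<Pass> passes) {
        boolean changed;
        do {
            changed = false;
            for (Pass p : passes) {
                // One pass's rewrite can expose work for an earlier pass,
                // which is exactly why the whole list has to be re-run.
                changed |= p.run(ir);
            }
        } while (changed);
    }
}
```

The problem with this shape is that you pay for full re-runs even when a change only re-enables one other pass, and the iteration count depends on arbitrary ordering choices.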
Regarding the method in the post, I could easily be reinventing the wheel without knowing it.
So I'm very interested in hearing whether this method is used anywhere else, whether CFGs can be efficiently applied to decompilers without running into the problems described in my post, any advice on pass design and ordering, and maybe information about specific algorithms compilers and decompilers use (I know about dominators, obviously, but I've never heard of much else; I can't imagine there isn't anything).
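For reference, by "dominators" I mean the standard dataflow formulation, dom(i) = {i} ∪ ⋂ dom(p) over predecessors p. A minimal sketch (block indexing and the `preds` representation are my own assumptions, and all blocks are assumed reachable from the entry):

```java
import java.util.BitSet;

final class Dominators {
    /** preds[i] = indices of block i's CFG predecessors; block 0 is the entry. */
    static BitSet[] compute(int n, int[][] preds) {
        BitSet[] dom = new BitSet[n];
        dom[0] = new BitSet(n);
        dom[0].set(0);            // the entry is dominated only by itself
        for (int i = 1; i < n; i++) {
            dom[i] = new BitSet(n);
            dom[i].set(0, n);     // start from "dominated by everything"
        }
        boolean changed;
        do {                      // iterate until the sets stabilize
            changed = false;
            for (int i = 1; i < n; i++) {
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int p : preds[i]) {
                    next.and(dom[p]);  // intersect over all predecessors
                }
                next.set(i);           // every block dominates itself
                if (!next.equals(dom[i])) {
                    dom[i] = next;
                    changed = true;
                }
            }
        } while (changed);
        return dom;
    }
}
```

Fittingly, even this is a `do { .. } while (changed);` fixpoint iteration; smarter formulations like the Cooper–Harvey–Kennedy algorithm avoid most of the redundant work by iterating over an immediate-dominator tree in reverse postorder.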
I also learn best by doing and by following someone else's experience, rather than copying a finished product, so if anyone knows of a blog series where someone develops a full optimizing compiler from scratch, or something similar, that would be very useful.