OP here. The post is about a method I derived as part of my work on a Java decompiler. I also have related questions for people in programming language theory. ("Ask HN" doesn't seem to fit submissions with URLs, so I'm asking them here.)
It's frustrating how few online resources cover efficient (de)compiler designs. There are many people doing cheap but low-quality decompilation (e.g. here's someone's post on writing a .NET decompiler: https://news.ycombinator.com/item?id=9952145), and there are even more people writing toy non-optimizing compilers. There are also quite a few folks improving LLVM or Ghidra, or writing new but heavyweight theorem-prover-based decompilers (e.g. DREAM: https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan...). But there's frustratingly little information on the middle ground.
Also, and I think this ties into the previous point, there are next to zero entry-level resources on hard topics. It's like there's an invisible wall: tons of people learn tokenization, parsing, and codegen, and then they stop because their compiler already "works". The resources that do focus on optimization are haphazard: they either describe peephole optimizations exclusively, or explain how higher-level optimizations work in theory without any mention of how to implement them efficiently and fit them together in an LLVM-like backend framework.
There's virtually no specific information on pass ordering and pass design, as if it were something intuitive. But the more I work on the decompiler, the more I realize that I have to cram basically everything into a single pass if I want to avoid the dreaded `do { .. } while (changed);` fixpoint loop (sketched below) and arbitrary heuristics. But not all passes can be merged, obviously, so I end up re-architecting everything all the time.
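To make that concrete, here's a minimal sketch of the kind of pass driver I'm trying to avoid. The names (`Ir`, `Pass`, `FixpointDriver`) are made up for illustration, not from my decompiler; the point is just the shape of the loop:

```java
import java.util.List;

// Stand-in for the decompiler's intermediate representation.
final class Ir {}

interface Pass {
    /** Runs the pass over the IR; returns true if anything changed. */
    boolean run(Ir ir);
}

final class FixpointDriver {
    static void runToFixpoint(Ir ir, List<Pass> passes) {
        boolean changed;
        do {
            changed = false;
            for (Pass p : passes) {
                // One pass's rewrite can expose work for an earlier pass,
                // which is exactly why the whole list has to be re-run.
                changed |= p.run(ir);
            }
        } while (changed);
    }
}
```

The problem with this shape is that you pay for full re-runs even when a change only re-enables one other pass, and the iteration count depends on arbitrary ordering choices.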
Regarding the method in the post, I could easily be reinventing the wheel without knowing it.
So I'm very interested in hearing whether this method is used anywhere else, whether CFGs can be efficiently applied to decompilers without running into the problems described in my post, any advice on pass design and ordering, and maybe information about specific algorithms compilers and decompilers use (I know about dominators, obviously, but I've never heard of much else; I can't imagine there isn't anything).
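For reference, by "dominators" I mean the standard dataflow formulation, dom(i) = {i} ∪ ⋂ dom(p) over predecessors p. A minimal sketch (block indexing and the `preds` representation are my own assumptions, and all blocks are assumed reachable from the entry):

```java
import java.util.BitSet;

final class Dominators {
    /** preds[i] = indices of block i's CFG predecessors; block 0 is the entry. */
    static BitSet[] compute(int n, int[][] preds) {
        BitSet[] dom = new BitSet[n];
        dom[0] = new BitSet(n);
        dom[0].set(0);            // the entry is dominated only by itself
        for (int i = 1; i < n; i++) {
            dom[i] = new BitSet(n);
            dom[i].set(0, n);     // start from "dominated by everything"
        }
        boolean changed;
        do {                      // iterate until the sets stabilize
            changed = false;
            for (int i = 1; i < n; i++) {
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int p : preds[i]) {
                    next.and(dom[p]);  // intersect over all predecessors
                }
                next.set(i);           // every block dominates itself
                if (!next.equals(dom[i])) {
                    dom[i] = next;
                    changed = true;
                }
            }
        } while (changed);
        return dom;
    }
}
```

Fittingly, even this is a `do { .. } while (changed);` fixpoint iteration; smarter formulations like the Cooper–Harvey–Kennedy algorithm avoid most of the redundant work by iterating over an immediate-dominator tree in reverse postorder.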
I also learn best by doing and by following someone else's experience, rather than copying a finished product, so if anyone knows of a blog series where someone develops a full optimizing compiler from scratch, or something similar, that would be very useful.