Driving Compilers (2023)

49 misonic 9 5/5/2025, 2:17:12 AM fabiensanglard.net ↗

Comments (9)

lynx97 · 5h ago
Nitpick: Almost all Hello World C examples are wrong. printf is for when you need to use a format string. Hello World doesn't. Besides:

> puts() writes the string s and a trailing newline to stdout.

#include <stdio.h>
int main() { puts("Hello World!"); }

indigoabstract · 12m ago
The example is kind of pedantic, but I think a linter might be able to catch it.
unwind · 4h ago
I agree, but I have to point out that if you're gonna be like that, then you should be explicit about your final

    return 0;
PhilipRoman · 5h ago
Eh, it compiles down to the same thing with optimizations enabled:

https://godbolt.org/z/zcqa4Txen

But I agree, using printf for constant strings is one step away from doing printf(x) which is a big no-no.

Joker_vD · 4h ago
Useless bit of compiler optimization trivia: the "this printf() is equivalent to puts()" optimization seems to work by looking for a '%' in the format string, not by checking whether printf() has only one argument. For example, if you add 42 as a second argument to the printf() (which is absolutely legal and required by the standard to Work as Intended™), the resulting binary still uses puts().
Timwi · 4h ago
I share the frustration the author describes. When I started out programming as a child, I used Turbo Pascal, but I was aware of Turbo C and that more people used that than Pascal. Nevertheless, I couldn't really wrap my head around C at the time, and it was partly due to linker errors that I couldn't understand; and it seemed that Turbo Pascal just didn't use a linker, so it was easier to understand and tinker with at age 9.

It's intriguing to think how different my experience could have been if educational material at the time had focused as much on full explanations of the compiler+linker process, including example error conditions, as it did on teaching the language.

30 years later, I like to claim that I have a reasonably workable understanding of how compilers work, but I'm still hazy on how linkers do what they do. I'm much more comfortable with higher-level languages such as C#, which compile to a VM bytecode (IL) and don't make me worry about linkers.

antonvs · 12m ago
C# and Java still do linking, it just happens dynamically at runtime. That’s part of why startup time is slower in those languages, and why performance can be less predictable.

The main difference between linkers for native binaries and linking in IL-based languages is that native binary linking involves resolving memory addresses at build time. In the object files that are being linked, memory addresses are typically 0-relative to whatever section they’re in within that file. When you combine a bunch of object files together, you have to adjust the addresses so they can live together in the same address space. Object files A and B might both use addresses 0-10, but when they’re linked together, the linker will arrange it so that e.g. A uses 0-10 and B uses 11-21. That’s just a bit of simple offset arithmetic. And if both reference the same non-local symbol, it will be arranged so that both refer to the same memory address.

The IL-based languages retain all the relevant symbol information at runtime, which allows for a lot of flexibility at the cost of some performance - i.e. runtime lookups. This is typically optimized by caching the address after the first lookup, or if JIT compilation is occurring, embedding the relocated addresses in generated code.

The linker UX issues you ran into were mostly a function of the state of the art at the time, though. Languages like Go and Rust do native linking nowadays in a way that users barely notice. IL-based languages had a better linking UX partly because they were forced to - linking problems at runtime do still occur, e.g. “class not found”, but if linking in general had been a common problem for users at runtime instead of developers at build time, those languages would have struggled to get adoption.

virgilp · 54m ago
Linkers pretty much map data sections to memory, and in doing so are able to replace symbolic names (like global variables, or goto targets) with numbers. They may also completely drop some things that are not needed (e.g. code/files in a library that is never referenced).

I'm over-simplifying and also it's a bit incorrect, because there's also the loader that does a lot of the same work that linkers do, when loading the program in memory. So linkers don't actually produce the final image - but really, they're rather "simple" things (for some definition of "simple").

The hard-to-understand linker errors are typically caused by the compiler, not the linker. It's the compiler that speculatively chooses to use a symbol with a long and funny name, thinking that it'll later be provided by <somebody>, when in fact the linker later finds out that no library or object file actually provided said symbol. For the linker to then give you a decent error message, it needs a pretty good understanding of what the compiler was actually trying to do, i.e. it needs to know implementation details of the compiler that otherwise would not concern it at all.

stef-13013 · 51m ago
Really nice, thanks!!