Driving Compilers (2023)
93 points by misonic on 5/5/2025, 2:17:12 AM | 30 comments | fabiensanglard.net
It's intriguing to think how different my experience could have been if educational material at the time had focused as much on full explanations of the compiler+linker process, including example error conditions, as it did on teaching the language.
30 years later, I like to claim that I have a reasonably workable understanding of how compilers work, but I'm still hazy on how linkers do what they do. I'm much more comfortable with higher-level languages such as C# that compile to VM bytecode (IL) and don't have to worry about linkers.
Hardware has very peculiar rules for how it loads. The old floppy boot process would only load the first sector (512 bytes), and after that it's that 512-byte block's job to finish loading the rest of the code and run it (the code it loads is often called the 2nd-stage bootloader).
So writing this makes it super obvious what linkers do. At first you hardcode everything to fixed addresses. But then a function grows and no longer fits.
So now you have functions plus their lengths, as well as a few holes for where your global variables go.
And then different .c files may want different global (or static) variables. So now you need to somehow add up the lengths of all the data segments across all your .c files.
And then suddenly you understand linkers, and just use ld / ELF files.
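As a minimal sketch of that last step, here are two hypothetical .c files whose cross-references only the linker can resolve (file and symbol names are made up):

  /* counter.c -- defines a global and a function; compiled on its own */
  int counter = 0;               /* lands somewhere in this file's data segment */

  void bump(void) {              /* lands somewhere in this file's text segment */
      counter++;
  }

  /* main.c -- refers to symbols whose addresses it cannot know by itself */
  extern int counter;            /* address unknown until link time */
  void bump(void);               /* address unknown until link time */

  int main(void) {
      bump();
      return counter;            /* the linker patches in the real addresses */
  }

Each object file only knows offsets within itself; the linker lays the pieces out in one address space and patches every cross-file reference (cc -c counter.c main.c && cc counter.o main.o -o prog).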
--------
It's a bit of a trial by fire. But not really? There are super simple computers out there called microcontrollers, with just 200-page manuals describing everything.
Writing a bootloader for some simple Atmel AVR chip is perfect for this learning experience. The ATmega328P is the classic choice, but there are better, more modern chips.
Still, the ATmega328P was popular 15 years ago and is still manufactured in large numbers today.
The issue is when you have two, three, or four .c files that are compiled as separate units and then need to be combined.
Today, AVR chips and assembly work perfectly fine with ELF objects. But you will likely need to mess with linker scripts to get your bootloader working across different setups.
Especially if you have an element of dynamic boot loading (e.g. a bootloader that later continues loading more application code off a MicroSD card, over UART, or over I2C).
I'm really not sure how far you can get with this toy project without running into immediate linker issues (or linker scripts).
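For a sense of scale, the hand-off itself is tiny. A rough C sketch of a bootloader jumping to application code it has already loaded; APP_START and the housekeeping comments are assumptions, not taken from any particular chip's datasheet:

  /* Hand control from the bootloader to the application.
     APP_START is wherever the linker script placed the application's
     reset vector -- commonly 0x0000 on AVR parts, but treat it as an
     assumption for this sketch. */
  #define APP_START 0x0000

  typedef void (*app_entry_t)(void);

  void jump_to_application(void) {
      /* in a real bootloader: disable interrupts and undo any peripheral
         setup the bootloader did before handing over */
      app_entry_t app = (app_entry_t)APP_START;
      app();                     /* never returns */
  }

The interesting part is not this function; it's convincing the linker script to put the bootloader and the application in the right flash regions so that APP_START is actually where the application lives.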
That C# model you praise, you will find just as easily in Object Pascal, Delphi, Modula-2, Eiffel, Oberon, and many other compiled languages that have their own compiler toolchain, without depending on object files that look like they were generated by a C compiler.
The main difference between linkers for native binaries and linking in IL-based languages is that native binary linking involves resolving memory addresses at build time. In the object files that are being linked, memory addresses are typically 0-relative to whatever section they’re in within that file. When you combine a bunch of object files together, you have to adjust the addresses so they can live together in the same address space. Object file A and B both might use addresses 0-10, but when they’re linked together, the linker will arrange it so that e.g. A uses 0-10 and B uses 11-21. That’s just a bit of simple offset arithmetic. And if both reference the same non-local symbol, it will be arranged so that both refer to the same memory address.
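A toy model of that arithmetic, with made-up symbols and the 0-10 / 11-21 layout from above (this is just the idea, not real ELF relocation records):

  #include <stdio.h>

  /* Each object file records its symbols relative to its own section start. */
  struct symbol {
      const char *name;
      unsigned    local_offset;   /* 0-relative within that object's section */
  };

  /* "Linking", reduced to the offset arithmetic: give each object a base
     address, then every symbol's final address is base + local offset. */
  static unsigned relocate(unsigned base, unsigned local_offset) {
      return base + local_offset;
  }

  int main(void) {
      struct symbol a = {"a_func", 4};   /* from object file A */
      struct symbol b = {"b_func", 4};   /* from object file B, same local offset */

      unsigned a_base = 0;    /* the linker gives A the range 0..10  */
      unsigned b_base = 11;   /* ...and shifts B to the range 11..21 */

      printf("%s -> %u\n", a.name, relocate(a_base, a.local_offset));
      printf("%s -> %u\n", b.name, relocate(b_base, b.local_offset));
      return 0;
  }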
The IL-based languages retain all the relevant symbol information at runtime, which allows for a lot of flexibility at the cost of some performance - i.e. runtime lookups. This is typically optimized by caching the address after the first lookup, or if JIT compilation is occurring, embedding the relocated addresses in generated code.
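In C terms, the cache-after-first-lookup pattern looks roughly like this, using POSIX dlsym as a stand-in for whatever lookup machinery the runtime actually uses:

  #define _GNU_SOURCE           /* for RTLD_DEFAULT on glibc */
  #include <dlfcn.h>
  #include <stdio.h>

  /* Resolve a symbol lazily and cache the result -- the same idea a managed
     runtime (or a PLT stub) uses so the lookup cost is only paid once. */
  static void call_puts_lazily(const char *msg) {
      static int (*cached)(const char *);        /* filled in on first use */
      if (!cached)
          cached = (int (*)(const char *))dlsym(RTLD_DEFAULT, "puts");
      if (cached)
          cached(msg);
  }

  int main(void) {
      call_puts_lazily("resolved via dlsym");
      call_puts_lazily("served from the cache");
      return 0;
  }

(On older glibc you'd link this with -ldl; a real runtime does the equivalent with its own metadata rather than dlsym.)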
The linker UX issues you ran into were mostly a function of the state of the art at the time, though. Languages like Go and Rust do native linking nowadays in a way that users barely notice. IL-based languages had a better linking UX partly because they were forced to - linking problems at runtime do still occur, e.g. “class not found”, but if linking in general had been a common problem for users at runtime instead of developers at build time, those languages would have struggled to get adoption.
Go doesn't add much to the way the Turbo Pascal/Delphi/Ada/Modula-2 (and similar) linkers already work.
The main problem with languages like C and C++ is the prevalence of the UNIX linker model.
.NET assemblies suffer from the same problem, unless you make use of tricks like the InternalsVisibleTo attribute.
During the .NET 1.0 days there was the idea of having components, for a role similar to the one Java modules have since come to fulfill, but it never took off. The idea was also confusing: many developers assumed it related to COM when they heard "components" alongside .NET.
https://learn.microsoft.com/en-us/dotnet/framework/app-domai...
I'm over-simplifying, and it's also a bit incorrect, because there's also the loader, which does a lot of the same work linkers do when loading the program into memory. So linkers don't actually produce the final image. But really, they're rather "simple" things (for some definition of "simple").
The hard-to-understand linker errors are typically caused by the compiler, not the linker. It's the compiler that speculatively chooses to use a symbol with a long and funny name, thinking that it'll later be provided by <somebody>, when in fact the linker later finds out that no library or object file actually provided said symbol. And for the linker to give you a decent error message, it needs a pretty good understanding of what the compiler was actually trying to do, i.e. it needs to know implementation details of the compiler that otherwise would not concern it at all.
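The plain-C shape of that failure (helper() is a made-up name that nothing defines):

  /* main.c -- the compiler happily emits a call to a symbol it assumes
     <somebody> will provide... */
  extern void helper(void);

  int main(void) {
      helper();   /* ...and it's the linker that is left to report
                     "undefined reference to `helper'" when nobody does */
      return 0;
  }

The compile step succeeds without complaint; only the link step fails. In C++ the report is even harder to read, because the missing symbol shows up under its mangled, "long and funny" name.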
So... maybe let's avoid having the linker as another/external tool and just let the compiler perform the linking.
Is this really so important that we can't skip this requirement?
In my experience, the most common use case is C-based programs with some C++ sprinkled in (especially if you need the C code to be compiled exclusively by a C compiler in order to maintain C semantics).
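The usual glue for that mix is a small header that preserves C linkage for the C parts; a sketch with made-up names:

  /* mylib.h -- shared between the C and C++ translation units */
  #ifndef MYLIB_H
  #define MYLIB_H

  #ifdef __cplusplus
  extern "C" {   /* keep C linkage so the C++ side doesn't look for mangled names */
  #endif

  int mylib_frobnicate(int value);

  #ifdef __cplusplus
  }
  #endif

  #endif /* MYLIB_H */

The C side is compiled by a C compiler and keeps C semantics; the C++ side just agrees to call it by its unmangled name, and the linker resolves both against the same symbol.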
Did you not read the manuals that came with Turbo C or Pascal? They explain all those things. They taught both the language and the tools. For example: https://archive.org/details/bitsavers_borlandturVersion5.0Us...
Microsoft tools back then also came with extensive high quality manuals.
> puts() writes the string s and a trailing newline to stdout.
  #include <stdio.h>

  int main() { puts("Hello World!"); }
https://godbolt.org/z/zcqa4Txen
But I agree, using printf for constant strings is one step away from doing printf(x), which is a big no-no.
But if you're still using raw libc in 2025, that's a problem you willingly opted into. I have zero sympathy.
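For anyone wondering what the printf(x) no-no looks like in practice, a small sketch (user_input is a hypothetical attacker-controlled string):

  #include <stdio.h>

  void log_message(const char *user_input) {
      /* BAD: printf(user_input);
         if the string contains conversion specifiers (%s, %x, %n, ...),
         printf will read -- and with %n, write -- memory it shouldn't */

      printf("%s\n", user_input);   /* safe: the input is treated as data, not as a format */
      puts(user_input);             /* or just use puts, as above */
  }

  int main(void) {
      log_message("100% legit input");
      return 0;
  }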