Driving Compilers (2023)
93 points by misonic on 5/5/2025, 2:17:12 AM | 30 comments | fabiensanglard.net
It's intriguing to think how different my experience could have been if educational material at the time had focused as much on full explanations of the compiler+linker process, including example error conditions, as it did on teaching the language.
30 years later, I like to claim that I have a reasonably workable understanding of how compilers work, but I'm still hazy on how linkers do what they do. I'm much more comfortable with higher-level languages such as C# that compile to VM bytecode (IL) and don't have to worry about linkers.
Hardware has very peculiar rules for how it loads. The old floppy boot process would only load the first sector (512 bytes), and after that it's that 512-byte block's job to finish loading the rest of the code and run it (the code it loads is often called the 2nd-stage bootloader).
So writing this makes it super obvious what linkers do. At first you hardcode everything to fixed addresses. But then a function grows and no longer fits.
So now you have functions plus their lengths, as well as a few holes for where your global variables go.
And then different .c files may want different global (or static) variables. So now you need to somehow add up the lengths of all the data segments across all your .c files.
And then suddenly you understand linkers, and just use ld / ELF files.
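As a minimal sketch of that last step, here are two hypothetical .c files whose cross-references only the linker can resolve (file and symbol names are made up):

  /* counter.c -- defines a global and a function; compiled on its own */
  int counter = 0;               /* lands somewhere in this file's data segment */

  void bump(void) {              /* lands somewhere in this file's text segment */
      counter++;
  }

  /* main.c -- refers to symbols whose addresses it cannot know by itself */
  extern int counter;            /* address unknown until link time */
  void bump(void);               /* address unknown until link time */

  int main(void) {
      bump();
      return counter;            /* the linker patches in the real addresses */
  }

Each object file only knows offsets within itself; the linker lays the pieces out in one address space and patches every cross-file reference (cc -c counter.c main.c && cc counter.o main.o -o prog).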
--------
It's a bit of a trial by fire. But not really? There are super simple computers out there called microcontrollers, with just 200-page manuals describing everything.
Writing a bootloader for some simple Atmel AVR chip is perfect for this learning experience. The ATmega328P is the classic choice, but there are better, more modern chips.
Still, the ATmega328P was popular 15 years ago and is still manufactured in large numbers today.
The issue is when you have two, three, or four .c files that are compiled as separate units and then need to be combined.
Today, AVR chips and assembly work perfectly fine with ELF objects. But you will likely need to mess with linker scripts to get your bootloader working across different setups.
Especially if you have an element of dynamic boot loading (e.g. a bootloader that later continues loading more application code off a MicroSD card, over UART, or over I2C).
I'm really not sure how far you can get with this toy project without running into immediate linker issues (or linker scripts).
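For a sense of scale, the hand-off itself is tiny. A rough C sketch of a bootloader jumping to application code it has already loaded; APP_START and the housekeeping comments are assumptions, not taken from any particular chip's datasheet:

  /* Hand control from the bootloader to the application.
     APP_START is wherever the linker script placed the application's
     reset vector -- commonly 0x0000 on AVR parts, but treat it as an
     assumption for this sketch. */
  #define APP_START 0x0000

  typedef void (*app_entry_t)(void);

  void jump_to_application(void) {
      /* in a real bootloader: disable interrupts and undo any peripheral
         setup the bootloader did before handing over */
      app_entry_t app = (app_entry_t)APP_START;
      app();                     /* never returns */
  }

The interesting part is not this function; it's convincing the linker script to put the bootloader and the application in the right flash regions so that APP_START is actually where the application lives.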
That C# model you praise, you will find just as easily in Object Pascal, Delphi, Modula-2, Eiffel, Oberon, and many other compiled languages that have their own compiler toolchain, without depending on object files that look like they were generated by a C compiler.
The main difference between linkers for native binaries and linking in IL-based languages is that native binary linking involves resolving memory addresses at build time. In the object files that are being linked, memory addresses are typically 0-relative to whatever section they’re in within that file. When you combine a bunch of object files together, you have to adjust the addresses so they can live together in the same address space. Object file A and B both might use addresses 0-10, but when they’re linked together, the linker will arrange it so that e.g. A uses 0-10 and B uses 11-21. That’s just a bit of simple offset arithmetic. And if both reference the same non-local symbol, it will be arranged so that both refer to the same memory address.
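A toy model of that arithmetic, with made-up symbols and the 0-10 / 11-21 layout from above (this is just the idea, not real ELF relocation records):

  #include <stdio.h>

  /* Each object file records its symbols relative to its own section start. */
  struct symbol {
      const char *name;
      unsigned    local_offset;   /* 0-relative within that object's section */
  };

  /* "Linking", reduced to the offset arithmetic: give each object a base
     address, then every symbol's final address is base + local offset. */
  static unsigned relocate(unsigned base, unsigned local_offset) {
      return base + local_offset;
  }

  int main(void) {
      struct symbol a = {"a_func", 4};   /* from object file A */
      struct symbol b = {"b_func", 4};   /* from object file B, same local offset */

      unsigned a_base = 0;    /* the linker gives A the range 0..10  */
      unsigned b_base = 11;   /* ...and shifts B to the range 11..21 */

      printf("%s -> %u\n", a.name, relocate(a_base, a.local_offset));
      printf("%s -> %u\n", b.name, relocate(b_base, b.local_offset));
      return 0;
  }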
The IL-based languages retain all the relevant symbol information at runtime, which allows for a lot of flexibility at the cost of some performance - i.e. runtime lookups. This is typically optimized by caching the address after the first lookup, or if JIT compilation is occurring, embedding the relocated addresses in generated code.
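In C terms, the cache-after-first-lookup pattern looks roughly like this, using POSIX dlsym as a stand-in for whatever lookup machinery the runtime actually uses:

  #define _GNU_SOURCE           /* for RTLD_DEFAULT on glibc */
  #include <dlfcn.h>
  #include <stdio.h>

  /* Resolve a symbol lazily and cache the result -- the same idea a managed
     runtime (or a PLT stub) uses so the lookup cost is only paid once. */
  static void call_puts_lazily(const char *msg) {
      static int (*cached)(const char *);        /* filled in on first use */
      if (!cached)
          cached = (int (*)(const char *))dlsym(RTLD_DEFAULT, "puts");
      if (cached)
          cached(msg);
  }

  int main(void) {
      call_puts_lazily("resolved via dlsym");
      call_puts_lazily("served from the cache");
      return 0;
  }

(On older glibc you'd link this with -ldl; a real runtime does the equivalent with its own metadata rather than dlsym.)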
The linker UX issues you ran into were mostly a function of the state of the art at the time, though. Languages like Go and Rust do native linking nowadays in a way that users barely notice. IL-based languages had a better linking UX partly because they were forced to - linking problems at runtime do still occur, e.g. “class not found”, but if linking in general had been a common problem for users at runtime instead of developers at build time, those languages would have struggled to get adoption.
Go doesn't add much to the way the Turbo Pascal/Delphi/Ada/Modula-2 (and similar) linkers already work.
The main problem with languages like C and C++ is the prevalence of the UNIX linker model.
.NET assemblies suffer from the same problem, unless you make use of tricks like the InternalsVisibleTo attribute.
During the .NET 1.0 days there was the idea of having components, for a role similar to the one Java modules have since come to fulfill, but it never took off. The idea was also confusing: many developers assumed it related to COM when they heard "components" alongside .NET.
https://learn.microsoft.com/en-us/dotnet/framework/app-domai...
I'm over-simplifying, and it's also a bit incorrect, because there's also the loader, which does a lot of the same work linkers do when loading the program into memory. So linkers don't actually produce the final image. But really, they're rather "simple" things (for some definition of "simple").
The hard-to-understand linker errors are typically caused by the compiler, not the linker. It's the compiler that speculatively chooses to use a symbol with a long and funny name, thinking that it'll later be provided by <somebody>, when in fact the linker later finds out that no library or object file actually provided said symbol. And for the linker to give you a decent error message, it needs a pretty good understanding of what the compiler was actually trying to do, i.e. it needs to know implementation details of the compiler that otherwise would not concern it at all.
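The plain-C shape of that failure (helper() is a made-up name that nothing defines):

  /* main.c -- the compiler happily emits a call to a symbol it assumes
     <somebody> will provide... */
  extern void helper(void);

  int main(void) {
      helper();   /* ...and it's the linker that is left to report
                     "undefined reference to `helper'" when nobody does */
      return 0;
  }

The compile step succeeds without complaint; only the link step fails. In C++ the report is even harder to read, because the missing symbol shows up under its mangled, "long and funny" name.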
So... maybe let's avoid having the linker as another/external tool and just let the compiler perform the linking.
Is this really so important that we can't skip this requirement?
In my experience, the most common use case is C-based programs with some C++ sprinkled in (especially if you need the C code to be compiled exclusively by a C compiler in order to maintain C semantics).
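The usual glue for that mix is a small header that preserves C linkage for the C parts; a sketch with made-up names:

  /* mylib.h -- shared between the C and C++ translation units */
  #ifndef MYLIB_H
  #define MYLIB_H

  #ifdef __cplusplus
  extern "C" {   /* keep C linkage so the C++ side doesn't look for mangled names */
  #endif

  int mylib_frobnicate(int value);

  #ifdef __cplusplus
  }
  #endif

  #endif /* MYLIB_H */

The C side is compiled by a C compiler and keeps C semantics; the C++ side just agrees to call it by its unmangled name, and the linker resolves both against the same symbol.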
Did you not read the manuals that came with Turbo C or Pascal? They explain all those things. They taught both the language and the tools. For example: https://archive.org/details/bitsavers_borlandturVersion5.0Us...
Microsoft tools back then also came with extensive high quality manuals.
> puts() writes the string s and a trailing newline to stdout.
  #include <stdio.h>

  int main() { puts("Hello World!"); }
https://godbolt.org/z/zcqa4Txen
But I agree, using printf for constant strings is one step away from doing printf(x), which is a big no-no.
But if you're still using raw libc in 2025, that's a problem you willingly opted into. I have zero sympathy.
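For anyone wondering what the printf(x) no-no looks like in practice, a small sketch (user_input is a hypothetical attacker-controlled string):

  #include <stdio.h>

  void log_message(const char *user_input) {
      /* BAD: printf(user_input);
         if the string contains conversion specifiers (%s, %x, %n, ...),
         printf will read -- and with %n, write -- memory it shouldn't */

      printf("%s\n", user_input);   /* safe: the input is treated as data, not as a format */
      puts(user_input);             /* or just use puts, as above */
  }

  int main(void) {
      log_message("100% legit input");
      return 0;
  }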