I find it surprising that a single-pass compiler is easier to implement than a traditional lexer->parser->AST->emitter. (I'm not a compiler expert, though.) I'd have expected that generating an AST would be at least as simple, if not simpler. Plus by generating an AST, doing some simple optimization is a lot easier: one can pattern-match parts of the AST and replace them with more efficient equivalents. Maybe I'm overthinking this, though. I tend to like extensible program designs, even when they don't necessarily make sense for the scale of the program…
Still a really cool article and an impressive project, though. I especially like the StringPool technique; I'll have to keep it in mind if I ever write a compiler!
This article breaks it down well enough to make me feel like I could write my own C compiler targeting AVR. (I probably could... but it would not be easy.)
Never actually looked into how compilers work before, it's surprisingly similar/related to linguistics.
measurablefunc · 1h ago
It's b/c when Chomsky invented the theory of formal grammars he was studying natural languages & the universality of abstract grammar¹. Computer scientists realized later that they could use the same theory as a foundation for formalizing the grammatical structures of programming languages.
I could probably do it - but you wouldn't like it. My dictionaries would be a linked-list, looking for a key becomes a linear search... (if you gave me C++ I'd use std::map) I'm assuming you will allow me to use the C standard library, if I have to implement strlen or malloc in that 500 lines of C I'm not sure I can pull that off. 500 lines is aggressive, but IOCCC gives me plenty of tricks to get the line count down and the language isn't that big. I'm also going to assume 100% valid python code is fed in, if there is a bug or error of any sort that is undefined behavior.
Note that most of what makes python great isn't the language, it is the library. I believe that large parts of the python library are also written in C (for speed), and thus you won't be able to use my 500 line python compiler for anything useful because you won't have any useful libraries.
wyldfire · 2h ago
A python VM that consumes bytecode might be doable in not-ludicrous-amounts of C. Not 500 lines I suppose. But something manageable I think? Especially if you targeted the older releases.
nickpsecurity · 20m ago
Maybe 500 lines of Pythonic, macro-heavy C. If the macros' LOC don't count. Maybe.
TZubiri · 47m ago
Not to be that guy, but Python is an interpreted language.
That said, I guess technically you could make something that compiles python to an executable? This is hacker news after all
That is not a compiler. That is called a wrapper script.
But funny none the less.
amszmidt · 1h ago
The original cc was just a wrapper like this Python example around a bunch of external programs, calling c00, c01, until something could be fed to as and then linked using ld.
Still a really cool article and an impressive project, though. I especially like the StringPool technique; I'll have to keep it in mind if I ever write a compiler!
Writing a C compiler in 500 lines of Python - https://news.ycombinator.com/item?id=37383913 - Sept 2023 (165 comments)
Never actually looked into how compilers work before, it's surprisingly similar/related to linguistics.
¹https://en.wikipedia.org/wiki/Chomsky_hierarchy
Note that most of what makes python great isn't the language, it is the library. I believe that large parts of the python library are also written in C (for speed), and thus you won't be able to use my 500 line python compiler for anything useful because you won't have any useful libraries.
That said, I guess technically you could make something that compiles python to an executable? This is hacker news after all
import sys, subprocess
subprocess.run(["gcc", sys.argv[1], "-o", "a.out"])
GCC does basically the same thing even today,