This was a great post that really gets into a lot of the details. As the author of one of the Emacs "clones" I wrote a detailed comment about some of this over on Reddit that might be of interest to readers here.
That's quite a read! It looks like Emacs has done an amazing job of handling text correctly in the face of quite a few challenges, including cases where there is perhaps no "correct" choice.
Now, I'm certainly not in the "all C code must be rewritten in Rust because security" camp, but it does raise the question: with all this complexity, how do I know that pasting text from a web page into Emacs (or any editor, really) isn't going to trigger an undiscovered vulnerability?
Edit: I guess that's a rhetorical question, because of course the answer is "you don't".
iLemming · 1h ago
IMO the main reason to rewrite the C parts in Rust is to refactor and untangle the code, so that more people have an incentive to work on it. From what I've heard, some C parts of Emacs are of nightmarish complexity, and there are literally only a few people on this planet who even want to deal with them.
At the same time, I don't think there are enough people who can be persuaded that in a decade or two those Rust-rewritten parts wouldn't become problematic too. Who can promise today that Zig, for example, isn't a better choice for that? Or maybe even some close-to-metal Lisp variant?
dlachausse · 1d ago
C-style null-terminated strings were a mistake. They are almost never the right answer. Even C itself should start transitioning to length-prefixed strings instead.
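For anyone who hasn't seen the alternative spelled out, here is a minimal sketch of a length-prefixed string in C. The `lstr` type and the `lstr_from_bytes` / `lstr_free` helpers are hypothetical names made up for illustration, not part of any standard or of Emacs; the point is only that the length travels with the bytes instead of being inferred from a terminator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: the length is stored next to the data. */
typedef struct {
    size_t len;   /* number of bytes in data */
    char  *data;  /* not NUL-terminated; may contain embedded '\0' bytes */
} lstr;

/* Copy n bytes from src into a new lstr.
   On allocation failure the result has data == NULL and len == 0. */
static lstr lstr_from_bytes(const char *src, size_t n) {
    lstr s = { 0, NULL };
    s.data = malloc(n ? n : 1);  /* avoid a zero-size allocation */
    if (s.data != NULL) {
        memcpy(s.data, src, n);
        s.len = n;
    }
    return s;
}

/* Release the buffer and reset the fields. */
static void lstr_free(lstr *s) {
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

With this layout, a buffer such as `"ab\0cd"` keeps all five bytes, whereas `strlen` on the same bytes stops at the embedded NUL and reports 2.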
Suzuran · 1d ago
It's worse than that: null-terminated strings both predate C and were already considered harmful when C was created.
Suzuran · 1d ago
The answer isn't material; you just rewrite it in Rust, and then if anything goes wrong it's not your fault because you did The Right Thing. Rust cannot fail, it can only be failed.
mdaniel · 1d ago
> Because of IRC's scandanavian origin, the characters {}| are considered to be the lower case equivalents of the characters []\, respectively
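To make the quoted rule concrete (it appears in RFC 1459), here is a rough sketch of a nickname comparison that folds `[]\` to `{}|` on top of ordinary ASCII case folding. The function names `irc_tolower` and `irc_nick_equal` are made up for this example, and it covers only the three pairs named in the quote, so treat it as an illustration rather than a complete casemapping implementation.

```c
/* Fold one character the way the quoted rule describes:
   '[' ']' '\' are treated as the upper-case forms of '{' '}' '|'. */
static char irc_tolower(char c) {
    switch (c) {
    case '[':  return '{';
    case ']':  return '}';
    case '\\': return '|';
    default:
        /* Plain ASCII folding, done by hand to stay locale-independent. */
        if (c >= 'A' && c <= 'Z')
            return (char)(c - 'A' + 'a');
        return c;
    }
}

/* Return 1 if two NUL-terminated nicknames are equal under that folding. */
static int irc_nick_equal(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (irc_tolower(*a) != irc_tolower(*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

Under this folding, `Nick[away]` and `nick{AWAY}` count as the same nickname, which is exactly the kind of surprise the quote is pointing at.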
https://www.reddit.com/r/emacs/comments/1n7i586/comment/ncbc...
/me facepalms
Text rendering hates you (2019) - https://news.ycombinator.com/item?id=36478892 - June, 2023 (119 comments)
et al, as there was a follow-up posted in the top comment of that thread
I thought there was a "Falsehoods Programmers Believe About Text" post, but between that link and <https://github.com/kdeldycke/awesome-falsehood#international...> it's close enough for the point