Ask HN: What are your Unicode woes?
4 Rendello 3 6/14/2025, 2:48:43 PM
I've always worked with text, but I only started digging deep into understanding Unicode this year.
What do HN people have to say about Unicode and UTF-{8,16,32}? Are there parts you've never really understood? Have you had unexpected bugs due to misunderstood properties of text?
It turns out that changing case can sometimes change not only the number of bytes (in UTF-8), but also the number of encoded characters! This led to my post "UTF-8 characters that behave oddly when the case is changed" [1], which sparked a long discussion that taught me a lot. After that, I started reading the Unicode documentation in earnest and building up an idea of what a new tool should show. I'm trying to make clear the things I didn't (and sometimes still don't) understand, so I'd love to know what causes pain in the wild and where the gaps in people's understanding are.
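For anyone who wants to see the case-change effect directly, here is a minimal Python sketch. It assumes Python 3's built-in str.upper() and str.lower(), which apply Unicode's full case mappings; the helper name `sizes` and the choice of sample characters are made up for illustration and are not taken from the linked post.

```python
def sizes(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# (description, original, case-changed). Escapes are used so the exact
# code points are unambiguous regardless of font or editor.
examples = [
    ("U+00DF sharp s, uppercased", "\u00df", "\u00df".upper()),
    ("U+0130 dotted capital I, lowercased", "\u0130", "\u0130".lower()),
    ("U+017F long s, uppercased", "\u017f", "\u017f".upper()),
    ("U+023A A with stroke, lowercased", "\u023a", "\u023a".lower()),
]

for description, before, after in examples:
    # Each tuple printed is (code points, UTF-8 bytes).
    print(f"{description}: {sizes(before)} -> {sizes(after)}")
```

The first two cases go from one code point to two ("SS", and "i" followed by U+0307 combining dot above), while the last two keep a single code point but shrink or grow in UTF-8 (2 bytes to 1, and 2 bytes to 3). Results can differ in languages or libraries that only apply simple one-to-one case mappings.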
[1] https://news.ycombinator.com/item?id=42014045