Ask HN: What are your Unicode woes?

4 points by Rendello · 3 comments · 6/14/2025, 2:48:43 PM
I've always worked with text, but I only started digging deep into understanding Unicode this year.

What do HN people have to say about Unicode and UTF-{8,16,32}? Are there parts you've never really understood? Have you had unexpected bugs due to misunderstood properties of text?

Comments (3)

NoahZuniga · 4m ago
I guess it's kind of annoying that letters with diacritics can be represented in multiple different ways (e.g. "é" as a single precomposed code point, or as "e" followed by a combining accent).
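In Unicode terms the two spellings are canonically equivalent, and the usual way to compare them is to normalize both sides first. Here is a minimal sketch using Python's standard `unicodedata.normalize`. The `same_text` helper is a hypothetical name made up for illustration. NFC composes where it can. NFKC would additionally fold compatibility variants like the "ﬁ" ligature, which loses more information.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)          # False: different code point sequences
print(len(precomposed), len(decomposed))  # 1 2


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(precomposed, decomposed))  # True
```

Which form you store is a design choice. Normalizing once at input boundaries tends to be simpler than remembering to normalize at every comparison.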
solardev · 2h ago
I don't understand the difference between a character, a code point, a glyph, and whatever else makes up a single "thing" in Unicode.
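One way to see the layers is to inspect the same string at each level:

* A code point is a number such as U+0065.
* An encoding stores code points as code units: bytes in UTF-8, 16-bit units in UTF-16.
* What a reader sees as one character (a grapheme cluster, in UAX #29 terms) can be several code points.
* A glyph is the shape a font draws, which isn't visible at the string level at all.

The following is a minimal Python 3.9+ sketch. `describe` is a hypothetical helper made up for illustration. It does not attempt grapheme cluster segmentation, which needs a dedicated UAX #29 implementation.

```python
def describe(text: str) -> tuple[list[str], int, int]:
    """Return a string's code points, UTF-8 byte count, and UTF-16 code unit count."""
    code_points = [f"U+{ord(ch):04X}" for ch in text]  # len(text) counts these in Python
    utf8_bytes = len(text.encode("utf-8"))              # UTF-8 code units are bytes
    utf16_units = len(text.encode("utf-16-le")) // 2    # 16-bit units; "-le" avoids a BOM
    return code_points, utf8_bytes, utf16_units


# Each of these displays as ONE user-perceived character (grapheme cluster):
print(describe("e\u0301"))               # (['U+0065', 'U+0301'], 3, 2)
print(describe("\U0001F1E8\U0001F1E6"))  # (['U+1F1E8', 'U+1F1E6'], 8, 4)
```

The second string is two regional indicator symbols (C and A). These are usually drawn as a single flag glyph, yet they are two code points, eight UTF-8 bytes, and four UTF-16 code units (two surrogate pairs).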
Rendello · 12h ago
I (OP) have been working on some Unicode visualization tooling for a while now. The idea started when I had some buggy string-matching code: I was matching case-insensitively, then using the match ranges from the case-converted text to highlight the original text.
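That kind of bug can be reproduced with a minimal Python sketch. All function names here are hypothetical, and this is not the OP's code. Case folding "ß" yields "ss", so offsets found in the folded copy drift relative to the original string. One fix, sketched under the assumption that folding one code point at a time gives the same result as folding the whole string (true for Python's `str.casefold`, which has no context-dependent rules), is to record which original index produced each folded code point. Python offsets are code point indices, but the same drift happens with UTF-8 byte offsets.

```python
from typing import List, Optional, Tuple


def find_ci_naive(haystack: str, needle: str) -> Optional[Tuple[int, int]]:
    """Buggy: returns offsets into the case-folded copy, not into `haystack`."""
    folded = haystack.casefold()
    target = needle.casefold()
    start = folded.find(target)
    if start == -1:
        return None
    return start, start + len(target)


def fold_with_origin(text: str) -> Tuple[str, List[int]]:
    """Case-fold one code point at a time, remembering where each output code point came from."""
    pieces: List[str] = []
    origin: List[int] = []  # origin[k] = index in `text` that produced folded[k]
    for i, ch in enumerate(text):
        folded_ch = ch.casefold()  # may be longer than one code point, e.g. "ß" -> "ss"
        pieces.append(folded_ch)
        origin.extend([i] * len(folded_ch))
    return "".join(pieces), origin


def find_ci(haystack: str, needle: str) -> Optional[Tuple[int, int]]:
    """Case-insensitive search returning a half-open range into the ORIGINAL string."""
    folded, origin = fold_with_origin(haystack)
    target = needle.casefold()
    if not target:
        return None
    start = folded.find(target)
    if start == -1:
        return None
    end = start + len(target)
    # Map both ends back; a match that starts or ends inside an expansion
    # widens to cover the whole original code point.
    return origin[start], origin[end - 1] + 1


text = "Straße Rome"
print(find_ci_naive(text, "rome"))  # (8, 12): text[8:12] == "ome", off by one
print(find_ci(text, "rome"))        # (7, 11): text[7:11] == "Rome"
```

Widening partial matches (e.g. searching "s" inside "ß") to the whole original code point is one choice among several. A highlighter might instead reject such matches.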

Turns out, sometimes changing case changes not only the number of bytes (in UTF-8), but the number of encoded characters (code points)! This led to my post "UTF-8 characters that behave oddly when the case is changed" [1], which sparked a lot of discussion that taught me plenty. After that, I started reading the Unicode documentation in earnest and building up an idea of what a new tool should show. I'm trying to make clear the things I didn't (and sometimes still don't) understand, so I'd love to know what causes pain in the wild and where the gaps in people's understanding are.
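Both effects can be checked with a minimal Python sketch. `case_change` is a hypothetical helper. It uses the built-in `str.upper`/`str.lower`, which apply Unicode's full, locale-independent case mappings, so Turkish-specific tailoring is not applied here.

```python
def case_change(text: str, method: str) -> tuple:
    """Apply str.upper or str.lower; report code point and UTF-8 byte counts before and after."""
    changed = getattr(text, method)()
    return (
        changed,
        len(text), len(changed),                                  # code points before / after
        len(text.encode("utf-8")), len(changed.encode("utf-8")),  # UTF-8 bytes before / after
    )


examples = [
    ("\u00df", "upper"),  # LATIN SMALL LETTER SHARP S -> "SS"
    ("\u0130", "lower"),  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> "i" + U+0307
    ("\u0131", "upper"),  # LATIN SMALL LETTER DOTLESS I -> "I"
    ("\u212a", "lower"),  # KELVIN SIGN -> "k"
]
for text, method in examples:
    changed, cp_before, cp_after, b_before, b_after = case_change(text, method)
    print(f"{text!r}.{method}() -> {changed!r}: "
          f"code points {cp_before}->{cp_after}, bytes {b_before}->{b_after}")
```

Here "ß" goes from 1 to 2 code points while staying at 2 bytes. "İ" goes from 1 to 2 code points and from 2 to 3 bytes. Dotless "ı" shrinks from 2 bytes to 1, and the Kelvin sign shrinks from 3 bytes to 1.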

1. https://news.ycombinator.com/item?id=42014045