Compressing Icelandic name declension patterns into a 3.27 kB trie

137 points · alexharri · 44 comments · 8/2/2025, 11:28:33 AM · alexharri.com

Comments (44)

dmurray · 3h ago

For the 800 names that were missing declension data in the database, it seems like the most straightforward thing to do would be to assign their declensions by hand. It shouldn't take a native speaker more than a couple of hours (if some name they haven't seen before is ambiguous, then whatever they guess at least won't sound obviously wrong to other native speakers). Alternatively, it would be very cheap to ask an LLM to do it.

Encoding them into a trie like this would still be a good way to distribute the result, but you don't have to rely on the trie also being a good way to guess the declensions.

perching_aix · 3h ago

Yeah, that'd be a good idea. That said, it still wouldn't resolve the issue for names that are in use despite not being approved (or foreign names).

I also live in a country with a centrally governed personal name list, but you can request exceptions, and there are people who were born before the list existed, so their names won't necessarily be on the list either. Immigrants can also retain their names during naturalization, I believe, and there can be lots of other complications still. So the ability to sorta-kinda predict the proper declension is still useful.

esafak · 1h ago

I wonder if existing LLMs already know these patterns?

wizzwizz4 · 1h ago

I see no reason that an LLM should be better at guessing than a trie (unless the actual example was in its training data, in which case a web search would be more appropriate).

dmurray · 4m ago

I agree. I just like having the guessing done at compile time on principle. It allows you to change a guess, if you find that it's wrong, and convince yourself that you haven't broken any of the other cases where you were previously accidentally right.

treetalker · 3h ago

I remember that when I was first learning Spanish in high school, I found a piece of (Windows) software that pelted you with a series of pairs of an infinitive and a tense, and you had to conjugate the infinitive accordingly. (Spanish conjugation typically changes the end of the word; irregular verbs tend to involve stem changes). It was fantastic practice and really ingrained the rules; I became a whiz at it.

When I started learning Russian, the declensions (like the ones mentioned in the article) really threw me for a loop. I looked all over for a similar app to explain the patterns and drill rote practice, but never found one.

While slightly off-topic, does anyone know of such an app (web-based or macOS/iOS)?

kashunstva · 2h ago

> … learning Russian… explain the patterns… such an app

Non-native Russian speaker here. In the past, I cobbled together some scripts that use the spaCy Python module with the larger of the two Russian modules to provide context-aware lemmatization and grammatical tag extraction.

On the whole, though, my biggest gains in Russian came from letting go of the need to analytically deconstruct the inflections and instead building up a mental library of patterns (and exceptions) through use.

EDIT: I mean context within a sentence, not a broader meaning.

jeffwass · 2h ago

When I was learning Spanish (on my own) 25 years ago, I had a Spanish/English dictionary. It only translated verbs to the Spanish infinitive, but each had a numerical index mapping it to a class of verbs with the same conjugation pattern.

There was a section at the front of the dictionary with full conjugation patterns over all tenses for one sample verb in each class.

E.g., each type of stem-changing verb fell into one index, full irregulars were singletons in their own class, and some irregulars that behave similarly (iirc tener and detener) shared one class.

So all verbs in Spanish fell neatly into a few dozen unique patterns, and the indexing was already done.

I was going to build quiz software just like the one you mentioned to conjugate any verb in any tense, but “never got around to it”.

I wonder how well the reverse-string trie pattern in the article would work for reconstructing the class mapping.
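
To make the reverse-string trie idea concrete for the Spanish case, here is a minimal JavaScript sketch (not the article's code): each verb is inserted character by character starting from its last letter, every node counts how many training verbs of each class pass through it, and an unseen verb gets the majority class of the deepest node its ending reaches. The class labels (`"ar"`, `"er"`, `"ir"`, `"tener"`) and the tiny training list are made up for illustration; a real version would use the dictionary's own class numbers.

```javascript
// A node in a trie built over reversed words: children are keyed by the
// next character when reading the word from its end.
class SuffixTrieNode {
  constructor() {
    this.children = new Map(); // character -> SuffixTrieNode
    this.counts = new Map();   // class label -> training words passing through
  }
}

class ReverseTrie {
  constructor() {
    this.root = new SuffixTrieNode();
  }

  // Insert a word from its last character to its first, so words that
  // share an ending share a path from the root.
  insert(word, label) {
    let node = this.root;
    increment(node.counts, label);
    for (let i = word.length - 1; i >= 0; i--) {
      const ch = word[i];
      if (!node.children.has(ch)) node.children.set(ch, new SuffixTrieNode());
      node = node.children.get(ch);
      increment(node.counts, label);
    }
  }

  // Follow the word's ending as far as the trie allows, then return the
  // most frequent class seen at the deepest node reached.
  predict(word) {
    let node = this.root;
    for (let i = word.length - 1; i >= 0; i--) {
      const next = node.children.get(word[i]);
      if (next === undefined) break;
      node = next;
    }
    return mostFrequent(node.counts);
  }
}

function increment(map, key) {
  map.set(key, (map.get(key) ?? 0) + 1);
}

// Ties go to the label inserted first (Map preserves insertion order).
function mostFrequent(counts) {
  let best;
  let bestCount = 0;
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Made-up training data: class labels are illustrative, not a real dictionary's.
const trie = new ReverseTrie();
for (const [verb, label] of [
  ["hablar", "ar"], ["cantar", "ar"], ["comer", "er"],
  ["vivir", "ir"], ["tener", "tener"], ["detener", "tener"],
]) {
  trie.insert(verb, label);
}
console.log(trie.predict("mantener")); // tener
console.log(trie.predict("partir"));   // ir
```

Because "mantener" shares the ending "tener" with two training verbs, it lands in the `tener` class. The same walk also shows the weakness: a regular verb such as "beber" stops at the shared "-er" node, where this made-up data happens to contain more `tener`-class verbs than regular ones, so it would be misclassified until more training verbs are added.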

Rendello · 1h ago

There are some Anki (flashcard) decks that use the "KOFI" method:

> KOFI (Konjugation First) is the name I've given to a provocative language-learning approach I've created: to learn all the forms of a language's conjugation before even starting to formally study the language

I used the French one, years after I learned French, because my conjugation was abysmal. You can get by using basic tenses or wrong tenses, and people will understand you, but it's not what you want. The KOFI method is supposed to teach you all the conjugation patterns in a matter of months, before you start learning the language. I'd like to give it a try in earnest someday with a new language. My interest in French has waned, so I didn't stick with it.

https://ankiweb.net/shared/info/1131659186

yorwba · 2h ago

You might be able to build something similar yourself using declension data extracted from Wiktionary using wiktextract: https://github.com/tatuylonen/wiktextract#pre-extracted-data

leobg · 3h ago

https://memrussian.com/?

netsharc · 3h ago

Grandfather talks about classic Windows software. On the Play Store this app says "Contains ads - In-app purchases".

Ah, as a cheap bastard, I hate how software was pay once back then, and for this one I'm just going to ask you what's the monthly subscription price?

GuB-42 · 2h ago

I don't know about this app but many of the "Contains ads - In-app purchases" apps offer to remove the ads for a one-time payment.

mpascale00 · 2h ago

This comes up in so many threads here... How can we change the culture of subscriptions back to pay once???

sgarland · 1h ago

Given how profitable it is, I doubt it’ll be changed.

That said, I very much like Codeweavers’ approach [0], which IMO is the modern equivalent to purchasing software on a physical medium: you buy it, you can re-download it as many times as you’d like, install it on as many machines as you’d like (single-user usage only), and you get 1 year of updates and support. After that, you can still keep using it indefinitely, but you don’t get updates or paid support. You get a discount if you renew before expiry. They also have a lifetime option which, so far, they’ve not indicated they’re going to change.

I have no affiliation with them, I just think it’s a good product, and a good licensing / sales model.

[0]: https://www.codeweavers.com/store

necovek · 2h ago

It's not really about the culture anymore. Software that requires maintenance — and most does — has a continuous development cost. As such, subscription is the most natural way to cover it.

On the other hand, we have software which has low maintenance cost, but sold for peanuts ($0-$10) in small quantities, so authors try to introduce alternative revenue streams.

As in, it's fair to pay continuously (subscription) for continuous work (maintenance), so I don't expect that to go away. Ads, though, yuck...

sneak · 2h ago

Software sold today does not require maintenance. Software to work in the future requires maintenance. I am not buying future software. I am buying today software.

Increasingly I am not buying software at all.

perching_aix · 2h ago

This is a good argument in favor of subscriptions not being mandatory, but not in favor of the abolishment of subscriptions overall, which is what they were talking about.

sgarland · 1h ago

On the contrary, software today is so absurdly buggy that it often does require maintenance to work.

charcircuit · 1h ago

Even ignoring security, bug fixes, new features, etc., it is also not fair that you can get value from the app every month while the developer doesn't get to capture a reward for any of that value. Having people pay monthly for value they get monthly seems reasonable.

nsksl · 2h ago

Find a pirate version if possible…

gametorch · 59m ago

I used Clozemaster effectively to learn Russian. It's not exactly what you describe, but you can fly through many "clozes" to ingrain the patterns into your brain.

ryanjshaw · 23m ago

An interesting article, but I was surprised there was no discussion of what humans do to address this problem.

Zanfa · 4m ago

They stick with the nominative case. That’s the only safe way not to butcher somebody’s name in a language like Estonian that has 14 cases. It’s infinitely easier to update copy to use only nominative than try to apply the cases automatically.

silvestrov · 1h ago

One more optimization idea: instead of having the trie map to the suffix string directly, make an array of unique suffixes and let the trie map to an index into that array, e.g.

    const suffixes = [",,,", "a,u,u,u", ",,i,s", ",,,s", "i,a,a,a", ...];

and then use indices into this array in the

    var serializedInput = "{e:{n:{ein:0_r: ...
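
A minimal sketch of that interning step in JavaScript, assuming a hypothetical nested-object trie keyed by characters read from the end of the name, with pattern strings at the leaves. The article's real serialization is only partially quoted above, so this does not reproduce it; the sample trie and the pattern strings at its leaves (borrowed from the example array above) are placeholders, not real declension assignments.

```javascript
// Replace every string leaf of a nested-object trie with an index into a
// table of unique pattern strings (in first-seen order).
function internLeaves(trie) {
  const suffixes = [];
  const indexOf = new Map(); // pattern string -> index in `suffixes`

  function walk(node) {
    if (typeof node === "string") {
      if (!indexOf.has(node)) {
        indexOf.set(node, suffixes.length);
        suffixes.push(node);
      }
      return indexOf.get(node);
    }
    const out = {};
    for (const [key, child] of Object.entries(node)) {
      out[key] = walk(child);
    }
    return out;
  }

  const compactTrie = walk(trie);
  return { suffixes, trie: compactTrie };
}

// Read the name from its last character, stop at the first leaf, and
// resolve the leaf's index back to its pattern string.
function lookupPattern(compact, name) {
  let node = compact.trie;
  for (let i = name.length - 1; i >= 0 && typeof node !== "number"; i--) {
    node = node[name[i]];
    if (node === undefined) return undefined;
  }
  return typeof node === "number" ? compact.suffixes[node] : undefined;
}

// Hypothetical trie, keyed from the end of the name; leaf strings are placeholders.
const sampleTrie = {
  r: { u: { "ð": "a,u,u,u", d: "a,u,u,u", t: ",,,s" } },
  n: ",,i,s",
};
const compact = internLeaves(sampleTrie);
console.log(JSON.stringify(compact.suffixes)); // ["a,u,u,u",",,,s",",,i,s"]
console.log(JSON.stringify(compact.trie));     // {"r":{"u":{"ð":0,"d":0,"t":1}},"n":2}
console.log(lookupPattern(compact, "Sigurður")); // a,u,u,u
```

The saving comes from writing each repeated pattern string once in `suffixes` while the leaves hold a small integer; it pays off when many leaves share a handful of patterns.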

radpanda · 2h ago

> There are, in fact, 88 approved Icelandic names with this exact pattern of declension, and they all end with “dur”, “tur” or “ður”.

…

> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension

My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.

Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?

dmit · 1h ago

> Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix?

Careful, this is how you fall down the Are Dependent Types The Answer?? hole.

perching_aix · 1h ago

Not sure what that's supposed to mean, but if Icelandic is anything like my native language in this, then it is indeed a pronunciation-based thing. Which should make sense, since languages are (historically) spoken first, written second.

dmit · 1h ago

Heheh, it was mostly a reference to my [and mostly others'!] experiments with encoding human languages in a programming language. There are some pretty neat ideas there to explore, like the difference between Subject-Object-Verb (SOV) and Object-Subject-Verb (OSV). Or postfix languages (e.g. Forth) mapping to some human languages.

In this particular example, having a subsequent part of an expression rely on prior parts would usually be accomplished at runtime in most languages. But some (like Idris) might allow you to encode the rules in the type system. Thus the rabbit hole.

perching_aix · 1h ago

Ah okay. That's a journey I'm currently also preparing to embark on, though from the other direction: I'm trying to generate "natural" language from program code. I already know it's pretty hopeless, but increasingly I feel like it's not really a choice anyhow, so I may as well finally have a go at it. Let's see :)

dmit · 1h ago

Godspeed!

jedimastert · 3h ago

It's like an interview question from hell. Reversing a trie is one of those things that I might use once in my life, but that one time I will look like an absolute wizard.

robin_reala · 3h ago

No idea if Rails copes with this automatically, but it feels like the sort of magic it’s historically been really good at. I remember reading the source code for `pluralize` and finding that someone had encoded the pluralisation rules, including irregular cases, for Welsh.

Alifatisk · 40m ago

Love Rails; there's a method for everything.

kmmbvnr_ · 3h ago

Doesn't that look like an interesting approach for highly optimized embeddings?

lifthrasiir · 3h ago

A possible alternative, especially for beygla/strict, would be perfect hashing.
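
For comparison, here is a rough JavaScript sketch of one way to build a minimal perfect hash over a fixed key set. It is a simplified hash-and-displace scheme, not anything from the article or from beygla: keys are first grouped into buckets, and for each bucket a seed is searched until all its keys land in distinct, still-free slots of a table with exactly n entries. The hash function (FNV-1a with a murmur3-style finalizer), the bucket count of n/2, and the sample names and values are all illustrative assumptions.

```javascript
// Seeded 32-bit string hash: FNV-1a over UTF-16 code units, then a
// murmur3-style finalizer to spread the bits. Illustrative choice only.
function seededHash(seed, key) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Build a minimal perfect hash for `entries` ([key, value] pairs, unique keys).
function buildPerfectHash(entries) {
  const n = entries.length;
  const keys = entries.map(([key]) => key);
  if (new Set(keys).size !== n) throw new Error("keys must be unique");

  // Step 1: group key indices into buckets using seed 0.
  const bucketCount = Math.max(1, Math.ceil(n / 2));
  const buckets = Array.from({ length: bucketCount }, () => []);
  keys.forEach((key, i) => buckets[seededHash(0, key) % bucketCount].push(i));

  // Step 2: place the largest buckets first, searching for a seed that
  // sends every key in the bucket to a distinct, still-empty slot.
  const seeds = new Array(bucketCount).fill(0);
  const slotKeys = new Array(n).fill(undefined);
  const slotValues = new Array(n).fill(undefined);
  const order = buckets
    .map((_, b) => b)
    .sort((a, b) => buckets[b].length - buckets[a].length);

  for (const b of order) {
    const members = buckets[b];
    if (members.length === 0) continue;
    let found = false;
    for (let seed = 1; seed <= 1_000_000 && !found; seed++) {
      const slots = members.map((i) => seededHash(seed, keys[i]) % n);
      const distinct = new Set(slots).size === slots.length;
      if (distinct && slots.every((s) => slotKeys[s] === undefined)) {
        slots.forEach((s, j) => {
          slotKeys[s] = keys[members[j]];
          slotValues[s] = entries[members[j]][1];
        });
        seeds[b] = seed;
        found = true;
      }
    }
    if (!found) throw new Error("no seed found for bucket " + b);
  }
  return { seeds, slotKeys, slotValues };
}

// Look up a key: bucket -> seed -> slot, then confirm the slot holds this key.
function perfectLookup(table, key) {
  const seed = table.seeds[seededHash(0, key) % table.seeds.length];
  const slot = seededHash(seed, key) % table.slotKeys.length;
  return table.slotKeys[slot] === key ? table.slotValues[slot] : undefined;
}

// Illustrative data: the numbers stand in for indices into a pattern table.
const table = buildPerfectHash([
  ["Jón", 0], ["Guðrún", 1], ["Sigurður", 2],
  ["Anna", 3], ["Helga", 3], ["Ólafur", 4],
]);
console.log(perfectLookup(table, "Sigurður")); // 2
console.log(perfectLookup(table, "Kim"));      // undefined
```

Only the `seeds` array is needed to map a known name to its slot. `slotKeys` is kept here so that names outside the set are rejected rather than mapped to an arbitrary slot; dropping it saves the space of storing every key, but then every input returns some value.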

sneak · 2h ago

This seems complicated.

Why not just reuse the existing standard and change everyone’s last names to Kim, Lee, or Park?

dmit · 1h ago

> everyone’s last names

*surnames. Not last in that case, whatever the case is you're trying to make.

alucardo · 3h ago

Hmm, is this lib GDPR compliant?

bot403 · 3h ago

If this isn't compliant then neither are name day calendars or baby name websites.

It's not a privacy issue if it's just "someone's" name.

shagie · 1h ago

There are a relatively finite number of Icelandic names. https://en.wikipedia.org/wiki/Icelandic_Naming_Committee

> A name not already on the official list of approved names must be submitted to the naming committee for approval. A new name is considered for its compatibility with Icelandic tradition and for the likelihood that it might cause the bearer embarrassment. Under Article 5 of the Personal Names Act, names must be compatible with Icelandic grammar (in which all nouns, including proper names, have grammatical gender and change their forms in an orderly fashion according to the language's case system).

A database of those names is no more interesting or personal than a dictionary or list of names ( https://www.insee.fr/en/statistiques/6536067 ) in another language... which is where they got the data.

> Iceland has a publicly run institution, Árnastofnun, that manages the Database of Icelandic Morphology (DIM). The database was created, amongst other reasons, to support Icelandic language technology.

https://bin.arnastofnun.is/DMII/aboutDMII/

There is no more personal information being presented than saying John or providing https://en.wikipedia.org/wiki/John_(given_name) or https://www.wolframalpha.com/input?i=John

John may be your given name, but that data isn't personal data. One of the numbers 1969, 1978, 1987, 1996 might be your birth year... but https://oeis.org/A101039 isn't personal information either. Combining John with Smith and 1978 as the year of someone's birth... now you've got personal information that would be covered by the GDPR.

detaro · 3h ago

Why wouldn't it be?

kiicia · 1h ago

GDPR is about accountability for handling identifiers like the full name of an actual person. Using parts of names, where each part does not identify any particular person, in a generalized list like the one described here does not fall under GDPR.

yujzgzc · 1h ago

Valiant effort at old-school engineering applied to a niche problem. (Iceland has a population of only around 400,000 people!) As much as I love the geekery of this stuff though, isn't it already a better ROI to get an LLM to generate the strings you need? It has its own other problems (not claiming it'll be perfect) but for something so language related, it makes a lot of sense. Would also work for other languages that have the same problem with declension of proper nouns like Russian or Finnish.
