> There are, in fact, 88 approved Icelandic names with this exact pattern of declension, and they all end with “dur”, “tur” or “ður”.
…
> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension
My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.
Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?
dmurray · 1h ago
For the 800 names that were missing declension data in the database, it seems like the most straightforward thing to do would be to assign their declensions by hand. It shouldn't take a native speaker more than a couple of hours (if some name they haven't seen before is ambiguous, then whatever they guess at least won't sound obviously wrong to other native speakers). Alternatively, very cheap to ask an LLM to do it.
Encoding them into a trie like this would still be a good way to distribute the result, but you don't have to rely on the trie also being a good way to guess the declensions.
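For anyone who wants to see the idea concretely, here is a minimal sketch of a reversed-suffix trie in Python. It is not the article's implementation: the rule encoding, the function names, and the three-name sample table are assumptions made for illustration. Names are inserted back to front, and a guess is only returned when every stored name under the longest matching suffix agrees on one rule.

```python
# Hypothetical rule encoding: (characters to strip from the nominative,
# endings for nominative, accusative, dative, genitive).
MASC_UR = (2, ("ur", "", "i", "ar"))   # Sigurður, Sigurð, Sigurði, Sigurðar
FEM_UR = (2, ("ur", "i", "i", "ar"))   # Auður, Auði, Auði, Auðar

# Tiny illustrative sample, not the approved-names database.
KNOWN = {"Sigurður": MASC_UR, "Guðmundur": MASC_UR, "Auður": FEM_UR}
END = "$"  # marks the end of a stored name


def build_suffix_trie(known):
    """Insert each name back to front, so shared endings share a path."""
    root = {}
    for name, rule in known.items():
        node = root
        for ch in reversed(name.lower()):
            node = node.setdefault(ch, {})
        node[END] = rule
    return root


def rules_below(node):
    """Collect every rule stored at or beneath this node."""
    rules = set()
    for key, child in node.items():
        if key == END:
            rules.add(child)
        else:
            rules |= rules_below(child)
    return rules


def guess_rule(trie, name):
    """Follow the longest known suffix; answer only if it is unambiguous."""
    node = trie
    for ch in reversed(name.lower()):
        if ch not in node:
            break
        node = node[ch]
    if node is trie:
        return None  # not even the last letter matched
    rules = rules_below(node)
    return next(iter(rules)) if len(rules) == 1 else None


def decline(name, rule):
    """Apply a rule to the nominative form of a name."""
    strip, endings = rule
    stem = name[: len(name) - strip]
    return [stem + ending for ending in endings]


TRIE = build_suffix_trie(KNOWN)
```

With this sample, `guess_rule(TRIE, "Þórður")` follows the suffix "rður" shared with Sigurður and returns the masculine rule, while a string that only matches "ður" gets `None`, because Sigurður and Auður disagree at that depth. That is the "quickly breaks down" case from the article quote, and it is why a hand-checked table is still the safer source of truth.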
perching_aix · 1h ago
Yeah, that'd be a good idea. That said, it still wouldn't resolve the issue for names that are in-use despite not being approved.
I also live in a country with a centrally governed personal name list, but you can request exceptions, and there are people who were born before the list existed, so their names won't necessarily be on the list either. Immigrants can also retain their names during naturalization I believe, and there can be lots of other complications still. So the ability to sorta-kinda predict the proper declension is still useful.
treetalker · 1h ago
I remember that when I was first learning Spanish in high school, I found a piece of (Windows) software that pelted you with a series of pairs of an infinitive and a tense, and you had to conjugate the infinitive accordingly. (Spanish conjugation typically changes the end of the word; irregular verbs tend to involve stem changes). It was fantastic practice and really ingrained the rules; I became a whiz at it.
When I started learning Russian, the declensions (like the ones mentioned in the article) really threw me for a loop. I looked all over for a similar app to explain the patterns and drill rote practice, but never found one.
While slightly off-topic, does anyone know of such an app (web-based or macOS/iOS)?
jeffwass · 24m ago
When I was learning Spanish (on my own) 25 years ago I had a Spanish/English dictionary. It only translated verbs to Spanish infinitive, but each had a numerical index mapping it to a class of verbs with the same conjugation pattern.
There was a section at the front of the dictionary with full conjugation patterns over all tenses for one sample verb in each class.
Eg, each type of stem-changing verb fell into one index, full irregulars were singletons in their own class, some irregulars that behave similarly (iirc tener and detener) shared one class.
So all verbs in Spanish fell neatly into a few dozen unique patterns, and the indexing was already done.
I was going to build a quiz software just like you mentioned to conjugate any verb in any tense, but “never got around to it”.
I wonder how the reverse-string trie pattern in the article would be for reconstructing the class mapping.
kashunstva · 22m ago
> … learning Russian… explain the patterns… such an app
Non-native Russian speaker here. In the past, I cobbled together some scripts that use the spaCy Python module with the larger of the two Russian modules to provide context-aware lemmatization and grammatical tag extraction.
On the whole, though, my biggest gains in Russian were in letting go of the need to analytically deconstruct the inflections and instead build up a mental library of patterns (and exceptions) in my head through use.
EDIT: I mean context within a sentence, not a broader meaning.
Grandfather talks about classical Windows software. On the Play Store this app says "Contains ads - In-app purchases".
Ah, as a cheap bastard, I hate how software was pay once back then, and for this one I'm just going to ask you what's the monthly subscription price?
GuB-42 · 14m ago
I don't know about this app but many of the "Contains ads - In-app purchases" apps offer to remove the ads for a one-time payment.
nsksl · 16m ago
Find a pirate version if possible…
mpascale00 · 35m ago
This comes up in so many threads here... How can we change the culture of subscriptions back to pay once???
necovek · 19m ago
It's not really about the culture anymore. Software that requires maintenance — and most does — has a continuous development cost. As such, subscription is the most natural way to cover it.
On the other hand, we have software which has low maintenance cost, but sold for peanuts ($0-$10) in small quantities, so authors try to introduce alternative revenue streams.
As in, it's fair to pay continuously (subscription) for continuous work (maintenance), so I don't expect that to go away. Ads, though, yuck...
sneak · 1m ago
Software sold today does not require maintenance. Software to work in the future requires maintenance. I am not buying future software. I am buying today software.
Increasingly I am not buying software at all.
jedimastert · 1h ago
It's like an interview question from hell. Reversing a trie is one of those things that I might only ever use once in my life, but that one time I will look like an absolute wizard.
robin_reala · 1h ago
No idea if Rails copes with this automatically, but it feels like the sort of magic it’s historically been really good at. I remember reading the source code for `pluralize` and finding that someone had encoded the pluralisation rules including irregular cases for Welsh.
kmmbvnr_ · 1h ago
Doesn't that look like an interesting approach for highly optimized embeddings?
lifthrasiir · 1h ago
A possible alternative, especially for beygla/strict, would be perfect hashing.
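A rough sketch of what that could look like, in Python. This assumes strict mode only needs to answer for names in a fixed list, which is my reading of the comment rather than something checked against the library. The rule strings, function names, and brute-force seed search are all made up for illustration; a real build would use a proper perfect-hash generator.

```python
import hashlib

# Hypothetical name -> rule table; the rule strings are an invented encoding.
TABLE = {
    "Sigurður": "2;ur,,i,ar",
    "Guðmundur": "2;ur,,i,ar",
    "Auður": "2;ur,i,i,ar",
}


def slot(key, seed, size):
    """Deterministic seeded hash of a name into the range [0, size)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_perfect_hash(table, max_seed=1_000_000):
    """Try seeds until every key lands in its own slot (minimal perfect hash)."""
    keys = list(table)
    size = len(keys)
    for seed in range(max_seed):
        if len({slot(k, seed, size) for k in keys}) == size:
            entries = [None] * size
            for k in keys:
                entries[slot(k, seed, size)] = (k, table[k])
            return seed, entries
    raise ValueError("no collision-free seed found")


def lookup(name, seed, entries):
    """One hash, one comparison; unknown names return None."""
    if not entries:
        return None
    entry = entries[slot(name, seed, len(entries))]
    if entry is not None and entry[0] == name:
        return entry[1]
    return None


SEED, ENTRIES = build_perfect_hash(TABLE)
```

The key is stored next to its rule so that a name outside the list is rejected instead of silently getting whatever rule sits in its slot. The trade-off against the trie is that this cannot guess anything for unseen names, and brute-force seed search only scales to small tables.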
alucardo · 1h ago
Hmm, is this lib GDPR compliant?
bot403 · 1h ago
If this isn't compliant then neither are name day calendars or baby name websites.
It's not a privacy issue if it's just "someone's" name.