Show HN: Semantic Calculator (king-man+woman=?)
68 nxa 91 5/14/2025, 7:54:31 PM calc.datova.ai ↗
I've been playing with embeddings and wanted to see what results the embedding layer produces from just word-by-word input with addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches by distance / cosine similarity. I polished it a bit so that others can try it out, too.
For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.
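For anyone who hasn't seen the ranking idea before, here is a minimal sketch of it: add and subtract word vectors, then sort the whole vocabulary by cosine similarity to the result. The 3-dimensional vectors and the names `cosine`, `combine`, `rank` and `TOY` are made up for illustration; the actual tool uses precomputed model embeddings, and this is not its code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def combine(terms, vectors):
    """Sum signed word vectors; terms is a list of (sign, word) pairs."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for sign, word in terms:
        for i, x in enumerate(vectors[word]):
            out[i] += sign * x
    return out

def rank(terms, vectors, top_k=3):
    """Return the top_k (word, similarity) pairs, best match first."""
    target = combine(terms, vectors)
    scored = sorted(
        ((cosine(target, vec), word) for word, vec in vectors.items()),
        reverse=True,
    )
    return [(word, round(score, 3)) for score, word in scored[:top_k]]

# Hypothetical toy vectors with axes (royalty, male, female).
TOY = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

result = rank([(+1, "king"), (-1, "man"), (+1, "woman")], TOY)
print(result)
```

With these hand-picked vectors "queen" comes out on top, followed by "woman" and "king"; real embeddings are far noisier, which is why showing a ranked list is more informative than a single answer.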
king-man+woman=queen is the famous example everyone uses when talking about word vectors, but is it actually just very cherry-picked?
I.e. are there a great number of other "meaningful" examples like this, or do you actually end up with some kind of vaguely, tangentially related word the majority of the time when adding and subtracting word vectors?
(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)
(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)
actor - man + woman = actress
garden + person = gardener
rat - sewer + tree = squirrel
toe - leg + arm = digit
100%
And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.
(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")
The prompt I used:
> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
The more I think about it the less surprised I am, but my initial thoughts were quite simply "no way" - surely an approximation of an NLP model made by another NLP model can't beat the original, but the LLM training process (and data volume) is just so much more powerful I guess...
Curious tool but not what I would call accurate.
Are you using word2vec for these, or embeddings from another model?
I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).
It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].
[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
[2] https://arxiv.org/abs/1905.09866
[3] https://arxiv.org/abs/1903.03862
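To make the "exclude the query words" trick concrete, here is a small sketch with made-up vectors (the numbers, `TOY` and the `nearest` helper are assumptions for illustration, not taken from any real model). Without the exclusion, the nearest neighbour of king - man + woman is just "king" again; with it, "queen" wins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(target, vectors, exclude=()):
    """Return the word closest to target, skipping any word in exclude."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

# Hypothetical toy vectors; "king" is deliberately dominated by one axis,
# so subtracting "man" and adding "woman" barely moves it.
TOY = {
    "king": [3.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "princess": [0.8, 0.0, 0.9],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

target = [k - m + w for k, m, w in zip(TOY["king"], TOY["man"], TOY["woman"])]

unfiltered = nearest(target, TOY)
filtered = nearest(target, TOY, exclude={"king", "man", "woman"})
print(unfiltered)  # king
print(filtered)    # queen
```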
The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query because.
It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).
Other stuff that works: key, door, lock, smooth
Some words that result in "flintlock": violence, anger, swing, hit, impact
Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.
hacker - code = professional golf
https://neal.fun/infinite-craft/
It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.
And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.
[0] https://youtu.be/8-ytx84lUK8
> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions
https://www.merriam-webster.com/dictionary/narcotic
So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.
I built a game[0] along similar lines, inspired by infinite craft[1].
The idea is that you combine (or subtract) “elements” until you find the goal element.
I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.
[0] https://alchemy.magicloops.app/
[1] https://neal.fun/infinite-craft/
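The "use the second (third, etc.) choice" idea could look roughly like this: walk a ranked candidate list and take the first element that hasn't been produced yet. This is only a sketch; `pick_new_element` and the sample data are hypothetical, and I don't know how the game actually generates its elements.

```python
def pick_new_element(ranked_candidates, seen):
    """Return the best-ranked candidate not produced before, or None."""
    for word in ranked_candidates:
        if word not in seen:
            return word
    return None

seen = {"steam"}                    # elements the player already has
ranked = ["steam", "cloud", "fog"]  # made-up ranking, best match first

result = pick_new_element(ranked, seen)
print(result)  # cloud
```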
paleolith + cat = Paleolithic Age
paleolith + dog = Paleolithic Age
paleolith - cat = neolith
paleolith - dog = hand ax
cat - dog = meow
Wonder if some of the math is off or I am not using this properly
Also, if it gets buried in comments, proper nouns need to be capitalized (Paris-France+Germany).
I am planning on patching up the UI based on your feedback.
Getting to cornbread elegantly has been challenging.
I’ve been unable to find it since. Does anyone know which site I’m thinking of?
I personally can't find the connection here; I was expecting father or something.
High-dimensional vectors are always hard to explain. This is an example.
great idea, but I find the results unamusing
Edit: these must be capitalized to be recognized.
I think you need to disable auto-capitalisation because on mobile the first word becomes uppercase and triggers a validation error.
(Goshawks are very intense, gyrs tend to be leisurely in flight.)
man+vagina=woman (ok that is boring)
https://en.m.wikipedia.org/wiki/Isle_of_Man
Accurate.
queen - woman + man = drone
Navratilova - woman + man = Lendl
female + age = male
rice + fish + raw = meat
hahaha... I JUST WANT SUSHI!
this is pretty fun
hmm...
six (84%)
Close enough I suppose