The example given for inverting an embedding back to text doesn't help the idea that this effect is reflecting some "shared statistical model of reality": What would be the plausible whalesong mapping of "Mage (foaled April 18, 2020) is an American Thoroughbred racehorse who won the 2023 Kentucky Derby"?
There isn't anything core to reality about Kentucky, its Derby, the Gregorian calendar, America, horse breeds, etc. These are all cultural inventions that happen to have particular importance in global human culture because of accidents of history, and are well-attested in training sets. At best we are seeing some statistical convergence on training sets because everyone is training on the same pile and scraping the barrel for any differences.
benlivengood · 4m ago
You also can't translate "Mage (foaled April 18, 2020) is an American Thoroughbred racehorse who won the 2023 Kentucky Derby" into Hellenistic Greek or some modern indigenous languages because there isn't enough shared context; you'd need to give humans speaking those languages a glossary for any of the translation to make sense, or allow them to interrogate an LLM to act as the glossary.
I'd say our current largest LLMs probably contain sufficient detail to explain a concept like a named race horse starting from QCD+gravity and ending up at cultural human events, given a foothold of some common ground to translate into a new unknown language. In a sense, that's what a model of reality is. I think it's possible because LLMs figure out translation between human languages by default with enough pretraining.
IAmNotACellist · 48m ago
I agree LLMs are converging on a current representation of reality based on the collective works of humanity. What we need to do is provide AIs with realtime sensory input, simulated hormones each with their own half-lives based on metabolic conditions and energy usage, a constant thinking loop, and discover a synthetic psilocybin that's capable of causing creative, cross-neural connections similar to human brains. We have the stoned ape theory, we need the stoned AI theory.
buffet_overflow · 38m ago
Or perhaps we make them attractions at a theme park, but let Anthony Hopkins have admin access to the source code. What could go wrong?
coffeecoders · 2h ago
I think “we might decode whale speech or ancient languages” is a huge stretch. Context is the most important part of what makes language useful.
There are billions of human-written texts, grounded in shared experience, and that's what makes our AI good at language. We don't have that for a whale.
kindkang2024 · 36m ago
If we could help gorillas or elephants (both highly intelligent) learn to name things and use symbols — in a form they can comprehend and create to express their will — enabling them to pass down their experiences and wisdom across generations, I believe they could quietly be as smart as we are.
Ps. I am excited about Google’s Gemma dolphin project (https://blog.google/technology/ai/dolphingemma/), but I would prefer if they chose elephants instead of dolphins as the subject, since we live on land, not in water. This way, more emphasis could be placed on core research, and immediate communication feedback would be possible.
bbarnett · 6m ago
I don't know why, but I just had a horrible vision of traveling 200 years hence, and elephants are now the ruling class.
I don't care that it might be better, and I shall hold you personally responsible should this come to pass.
klank · 1h ago
If a lion could speak, would we understand it?
godelski · 30m ago
I don't know about a Lion, but I think Wittgenstein could have benefited from having a pet.
I train my cat, and while I can't always understand her, I think one of the most impressive features of the human mind is the ability to have such great understanding of others. We have theory of mind, joint attention, triadic awareness, and much more. My cat can understand me a bit, but it's definitely asymmetric.
It's definitely not easy to understand other animals. As Wittgenstein suggests, their minds are alien to us. But we seem to be able to adapt. I'm much better at understanding my cat than my girlfriend (all the local street cats love me, and I teach many of them tricks) but I'm also nothing compared to experts I've seen.
Honestly, I think everyone studying AI could benefit from spending some more time studying animal cognition. While not like computer minds, these are testable "alien minds" that can help us better understand the general nature of intelligence.
ecocentrik · 1h ago
That was a philosophical position on the difficulty of understanding alien concepts and language, not a hard technological limit.
klank · 1h ago
I'm missing why that distinction matters given the thread of conversation.
Would you care to expound?
eddythompson80 · 1h ago
There is nothing really special about speech as a form of communication. All animals communicate with each other and with other animals. Informational density and, uhhhhh, cyclomatic complexity might be different between speech and a dance or a grunt or whatever.
klank · 1h ago
I was referencing Wittgenstein's "If a lion could speak, we would not understand it." Wittgenstein believed (and I am strongly inclined to agree with him) that our ability to convey meaning through communication was intrinsically tied to (or, rather, sprang forth from) our physical, lived experiences.
Thus, to your point: assuming communication happens because "there's nothing really special about speech", does that mean we would be able to understand a lion, if the lion could speak? Wittgenstein would say probably not. At least not initially, and not until we had built shared lived experiences.
Isamu · 1h ago
If we had a sufficiently large corpus of lion-speech we could build an LLM (Lion Language Model) that would “understand” as well as any model could.
Which isn't saying much: it still couldn't explain Lion Language to us; it could just generate statistically plausible examples or recognize them.
To translate Lion speech you’d need to train a transformer on a parallel corpus of Lion to English, the existence of which would require that you already understand Lion.
cdrini · 46m ago
Hmm I don't think we'd need a rosetta stone. In the same way LLMs learn the meaning of words purely from contextual usage, two separate data sets of lion and English, encoded into the same vector space, might pick up patterns of contextual usage at a high enough level to allow mapping between the two languages.
For example, given thousands of English sentences with the word "sun", the vector embedding encodes the meaning. Assuming the lion word for "sun" is used in much the same context (near lion words for "hot", "heat", etc.), it would likely end up in a similar spot near the English word for sun. And because of our shared context living on earth and being animals, I reckon many words will likely be used in similar contexts.
That's my guess though, note I don't know a ton about the internals of LLMs.
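To make the "sun" example concrete, here's a toy sketch of the distributional idea, with raw co-occurrence counts standing in for a real embedding model (the corpus and word choices are made up):

    # Toy sketch: words that appear in similar contexts end up with similar
    # vectors, even though nothing tells the model what "sun" means.
    import numpy as np
    from itertools import combinations

    corpus = [
        "the sun is hot in the sky",
        "the sun brings heat and light",
        "rain is cold and wet",
        "snow is cold in winter",
    ]

    vocab = sorted({w for s in corpus for w in s.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in corpus:
        for a, b in combinations(s.split(), 2):
            counts[idx[a], idx[b]] += 1
            counts[idx[b], idx[a]] += 1

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

    # "sun" shares contexts with "heat", so their count vectors are closer
    # than "sun" and "cold" are.
    print(cos(counts[idx["sun"]], counts[idx["heat"]]))
    print(cos(counts[idx["sun"]], counts[idx["cold"]]))

If a lion corpus had a token that showed up in the lion equivalents of those contexts, the hope above is that, once both corpora are embedded and aligned, it would land near "sun".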
zos_kia · 34m ago
Someone more knowledgeable might chime in, but I don't think two corpuses can be mapped to the same vector space. Wouldn't each vector space be derived from its corpus?
godelski · 22m ago
It depends how you define the vector space but I'm inclined to agree.
The reason I think this comes from evidence in human language. Spend time with any translator and they'll tell you that some things just don't really translate. The main concepts might, but there are subtleties and nuances that really change the feel. You probably notice this with friends whose native language differs from yours.
Even same-language communication is noisy. You even misunderstand your friends and partners, right? The people who have the greatest chance of understanding you. It's because the words you say don't convey all the things in your head. It's heavily compressed. Then the listener has to decompress from those lossy words. You can go to any Internet forum and see this in action: there's more than one way to interpret anything, and most internet fights seem to start this way. So it's good to remember that there isn't an objective communication. We improperly encode as well as improperly decode. It's on us to try to find out what the speaker means, which may be very different from the words they say (take any story or song to see the more extreme versions of this; the feature is heavily used in art).
Really, that comes down to the idea of universal language[0]. I'm not a linguist (I'm an AI researcher), but my understanding is most people don't believe it exists and I buy the arguments. Hard to decouple due to shared origins and experiences.

[0] https://en.wikipedia.org/wiki/Universal_language
Hmm I don't think a universal language is implied by being able to translate without a rosetta stone. I agree, I don't think there is such a thing as a universal language, per se, but I do wonder if there is a notion of a universal language at a certain level of abstraction.
But I think those ambiguous cases can still be understood/defined. You can describe how this one word in lion doesn't neatly map to a single word in English, and is used in a few different ways, some of which we might not have a word for in English, in which case we would likely adopt the lion word.
Although note I do think I was wrong about embedding a multilingual corpus into a single space. The example I was thinking of was word2vec, and that appears to only work with one language. I did find some papers showing that you can do unsupervised alignment between the two spaces, but I don't know how successful that is, or how it would treat these ambiguous cases.
cdrini · 18m ago
That's a very good point! I hadn't thought of that. And that makes sense, since the encoding of the word "sun" arises from its linguistic context, and there's no such shared context between the English word sun and any lion word in this imaginary multilingual corpus, so I don't think they'd go to the same point.
Apparently one thing you could do is train a word2vec on each corpus and then align them based on proximity/distances. Apparently this is called "unsupervised" alignment and there's a tool by Facebook called MUSE to do it. (TIL, Thanks ChatGPT!) https://github.com/facebookresearch/MUSE?tab=readme-ov-file
Although I wonder if there are better embedding approaches now as well. Word2Vec is what I've played around with from a few years ago, I'm sure it's ancient now!
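For anyone curious, the refinement step in that kind of alignment is essentially orthogonal Procrustes. A minimal numpy sketch, using synthetic vectors and a made-up seed dictionary (MUSE's fully unsupervised mode bootstraps the dictionary adversarially rather than from known translations):

    # Two "independently trained" embedding spaces, simulated here as a noisy
    # rotation of one another, aligned with orthogonal Procrustes.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 50
    X = rng.normal(size=(n, d))                    # stand-in for English vectors
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # unknown "true" rotation
    Y = X @ Q + 0.01 * rng.normal(size=(n, d))     # stand-in for the other language

    # Seed dictionary: indices assumed to be translations of each other.
    seed = np.arange(100)

    # W = argmin ||X_seed W - Y_seed|| over orthogonal W (closed form via SVD).
    U, _, Vt = np.linalg.svd(X[seed].T @ Y[seed])
    W = U @ Vt

    # Map everything across and check that the spaces now line up.
    print(np.linalg.norm(X @ W - Y, axis=1).mean())   # small residual

Whether anything like that clean rotation exists between a lion corpus and an English one is, of course, the whole open question.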
klank · 1h ago
And even, assuming the existence of a Lion to English corpus, it would only give us Human word approximations. We experience how lossy that type of translation is already between Human->Human languages. Or sometimes between dialects within the same language.
Who knows, we don't really have good insight into how this information loss, or disparity, grows. Is it linear? Exponential? Presumably there is a threshold beyond which we simply have no ability to translate while retaining a meaningful amount of original meaning.
Would we know it when we tried to go over that threshold?
Sorry, I know I'm rambling. But it has always been regularly on my mind and it's easy for me to get on a roll. All this LLM stuff only kicked it all into overdrive.
cdrini · 1h ago
Hmm I'm not convinced we don't have a lot of shared experience. We live on the same planet. We both hunger, eat, and drink. We see the sun, the grass, the sky. We both have muscles that stretch and compress. We both sleep and yawn.
I mean who knows, maybe their perception of these shared experiences would be different enough to make communication difficult, but still, I think it's undeniably shared experience.
klank · 1h ago
That's fair. To me, though, the point of Wittgenstein's lion thought experiment was not necessarily to say that _any_ communication would be impossible, but to ask what it truly means to be a lion, not just what it means to be an animal. We have no shared lion experiences, nor does a lion have human experiences. So would we be able to have human-to-lion communication even if we could both speak human speech?
I think that's the core question being asked and that's the one I have a hard time seeing how it'd work.
cdrini · 57m ago
Hmm, I'm finding the premise a bit confusing, "understand what it truly meant to be a lion". I think that's quite different than having meaningful communication. One could make the same argument for "truly understanding" what it means to be someone else.
My thinking is that if something is capable of human-style speech, then we'd be able to communicate with them. We'd be able to talk about our shared experiences of the planet, and, if we're capable of human-style speech, likely also talk about more abstract concepts of what it means to be a human or lion. And potentially create new words for concepts that don't exist in each language.
I think the fact that human speech is capable of abstract concepts, not just concrete concepts, means that shared experience isn't necessary to have meaningful communication? It's a bit handwavy, depends a bit on how we're defining "understand" and "communicate".
klank · 49m ago
> I think the fact that human speech is capable of abstract concepts, not just concrete concepts, means that shared experience isn't necessary to have meaningful communication?
I don't follow that line of reasoning. To me, in that example, you're still communicating with a human who, regardless of culture or geographic location, still shares an immense amount of life experience with you.
Or, they're not. For an intentionally extreme example, I bet we'd have a super hard time talking about homotopy type theory with a member of an isolated Amazon rainforest tribe. Similarly, I'd bet they have their own abstract concepts that they would not be able to easily explain to us.
cdrini · 33m ago
I would say there's a difference between abstract and complex. A complex topic would take a lot of time to communicate mainly because you have to go through all the prerequisites. By abstract I mean something like "communicate" or "loss" or "zero"! The primitives of complex thought.
And if we're saying the lion can speak human, then I think it follows that they're capable of this abstract thought, which is what I think is making the premise confusing for me. Maybe if I change my thinking and let's just say the lion is speaking... But if they're speaking a "language" that's capable of communicating concrete and abstract concepts, then that's a human-style language! And because we share many concrete concepts in our shared life experience, I think we would be able to communicate concrete concepts, and then use those as proxies to communicate abstract concepts and hence all concepts?
kouru225 · 1h ago
Knowing lions I bet all they’d talk about is being straight up dicks to anyone and everyone around them so yea I think we probably could ngl
UltraSane · 1h ago
We should understand common concepts like hungry, tired, horny, pain, etc.
groby_b · 20m ago
That's not necessarily what matters.
What matters is whether there is a shared representation space across languages. If there is, you can then (theoretically; there might be a PhD and a Nobel or two to be had :) separate the underlying structure from the translation between that structure and a given language.
The latter - what they call the universal embedding inverter - is likely much more easily trainable. There's a good chance that certain structures are unique enough you can map them to underlying representation, and then lever that. But even if that's not viable, you can certainly run unsupervised training on raw material, and see if that same underlying "universal" structure pops out.
There's a lot of hope and conjecture in that last paragraph, but the whole point of the article is that maybe, just maybe, you don't need context to translate.
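One cheap way to probe for that kind of shared structure, loosely in the spirit of the platonic-representation work: embed the same inputs with two different models and measure how much their nearest-neighbour graphs agree. A rough sketch (emb_a and emb_b are placeholders for two models' embedding matrices, not any particular API):

    # Nearest-neighbour overlap between two embedding spaces of the same N
    # inputs. High overlap suggests a shared underlying geometry.
    import numpy as np

    def knn_overlap(emb_a, emb_b, k=10):
        def knn(E):
            E = E / np.linalg.norm(E, axis=1, keepdims=True)
            sims = E @ E.T
            np.fill_diagonal(sims, -np.inf)           # ignore self-matches
            return np.argsort(-sims, axis=1)[:, :k]   # indices of k nearest
        na, nb = knn(emb_a), knn(emb_b)
        # fraction of shared neighbours, averaged over inputs
        return np.mean([len(set(na[i]) & set(nb[i])) / k for i in range(len(na))])

    # Toy usage with random data; real embeddings would come from two models.
    rng = np.random.default_rng(0)
    print(knn_overlap(rng.normal(size=(200, 64)), rng.normal(size=(200, 32))))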
stillpointlab · 26m ago
I have to be careful of confirmation bias when I read stuff like this because I have the intuition that we are uncovering a single intelligence with each of the different LLMs. I even feel, when switching between the big three (OpenAI, Google, Anthropic) that there is a lot of similarity in how they speak and think - but I am aware of my bias so I try not to let it cloud my judgement.
On the topic of compression, I am reminded of an anecdote about Heidegger. Apparently he had a bias towards German and Greek, claiming that these languages were the only suitable forms for philosophy. His claim was based on the "puns" in language, or homonyms. He had some intuition that deep truths about reality were hidden in these over-loaded words, and that the particular puns in German and Greek were essential to understand the most fundamental philosophical ideas. This feels similar to the idea of shared embeddings being a critical aspect of LLM emergent intelligence.
This "superposition" of meaning in representation space again aligns with my intuitions. I'm glad there are people seriously studying this.
bbarnett · 10m ago
LLMs don't think, nor are they intelligent or exhibiting intelligence.
Language does have constraints, yet it evolves via its users to encompass new meanings.
Thus those constraints are artificial, unless you artificially enforce static language use. And of course, for an LLM to use those new concepts, it needs to be retokenized by being trained on new data.
For example, an LLM trained only on books, encyclopedias, newspapers, and personal letters from 1850 would have zero capacity to speak comprehensibly about, or even seem cogent on, much of the modern world.
And it would forever remain in that disconnected position.
LLMs do not think, understand anything, or learn. If you wish to call tokenization "learning", then you'd better say a clock "learns" from the gears and cogs that enable its function.
LLMs do not think, learn, or exhibit intelligence. (I feel this is not said enough).
We will never, ever get AGI from an LLM. Ever.
I am sympathetic to the wonder of LLMs. To seeing them as such. But I see some art as wondrous too. Some machinery is beautiful in execution and to use.
But that doesn't change truths.
TheSaifurRahman · 3h ago
This only works when different sources share similar feature distributions and semantic relationships.
The "Mussolini or bread" (M or B) game breaks down when you play with someone who knows obscure people you've never heard of. Either you can't recognize their references, or your sense of "semantic distance" differs from theirs.
The solution is to match knowledge levels: experts play with experts, generalists with generalists.
The same applies to decoding ancient texts, if ancient civilizations focused on completely different concepts than we do today, our modern semantic models won't help us understand their writing.
npinsker · 3h ago
I've played this game with friends occasionally and -- when it's a person -- don't think I've ever completed a game.
streptomycin · 2h ago
> Is it closer to Mussolini or bread? Mussolini.
> Is it closer to Mussolini or David Beckham? Uhh, I guess Mussolini. (Ok, they’re definitely thinking of a person.)
That reasoning doesn't follow. Many things besides people would have the same answers, for instance any animal that seems more like Mussolini than Beckham.
pjio · 2h ago
I believe the joke is about David Beckham not really being (perceived as) human, even when compared to personified evil
jxmorris12 · 2h ago
Whoops. I hope you can overlook this minor logical error.
streptomycin · 1h ago
Oh yeah it's absolutely an interesting article!
Fomite · 2h ago
Oswald Mosley
kindkang2024 · 52m ago
The Dao can be spoken of, yet what is spoken is not the eternal Dao.
So, what is the Dao? Personally, I see it as will — something we humans could express through words. For any given will, even though we use different words in different languages — Chinese, Japanese, English — these are simply different representations of the same will.
Large language models learn from word tokens and begin to grasp these wills — and in doing so, they become the Dao.
In that sense, I agree: “All AI models might be the same.”
somethingsome · 2h ago
Mmmh I'm deeply skeptical of some parts.
> One explanation for why this game works is that there is only one way in which things are related
There is not; this is a completely non-transitive relationship.
On another point: suppose you keep the same vocabulary but permute the significations of the words. The neural network will still learn relationships, completely different ones, and its representation may converge toward a better compression for that set of words, but I'm dubious that this new compression scheme will resemble the previous one (?)
I would say that given an optimal encoding of the relationships, we can achieve an extreme compression, but not all encodings lead to the same compression at the end.
If I add 'bla' between every word in a text, that is easy to compress. But now, if I add an increasing sequence of words between each word, the meaning is still there, but the compression will not be the same, as the network will try to generate the words in between.
(thinking out loud)
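On the permutation point above: relabelling which word means what permutes the co-occurrence statistics, but their relational structure is preserved exactly under that same relabelling, so in principle the learned geometry should be isomorphic; whether a particular training run actually converges to it is a separate empirical question. A toy check:

    # Permuting which word means what permutes the co-occurrence matrix, but
    # the pairwise structure is preserved exactly under that relabelling.
    import numpy as np

    rng = np.random.default_rng(1)
    V = 20
    C = rng.integers(0, 5, size=(V, V))
    C = C + C.T                                   # symmetric co-occurrence counts

    perm = rng.permutation(V)                     # the "permuted signification"
    P = np.eye(V)[perm]
    C_perm = P @ C @ P.T

    print(np.allclose(C_perm, C[np.ix_(perm, perm)]))   # True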
ieie3366 · 1h ago
LLMs are brute-force reverse-engineered human brains. Think about it. Any written text out there is written by human brains. The "function" that outputs this is whatever happens inside the brain, insanely complex.
LLM "training" is just brute-forcing the same function into existence: "Human brain outputs X, LLM outputs Y, mutate it a billion times until X and Y start matching."
tgsovlerkhgsel · 1h ago
I've noticed that many of the large, separately developed AIs often answer with remarkably similar wording to the same question.
empath75 · 3h ago
This is kind of fascinating because I just tried to play Mussolini or bread with ChatGPT and it is absolutely _awful_ at it, even with reasoning models.
It just assumes that your answers are going to be reasonably bread-like or reasonably Mussolini-like, and doesn't think laterally at all.
It just kept asking me about varieties of baked goods.
edit: It did much better after I added some extra explanation: that it could be anything, that it may be very unlike either choice, and not to try to narrow down too quickly.
fsmv · 3h ago
I think an LLM is a bit too high level for this game or maybe it just would need a lengthy prompt to explain the game.
If you used word2vec directly, that's exactly the right thing to play this game with. Those embeddings exist in an LLM, but the LLM is trained to respond like text found online, not to play this game.
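A sketch of what that looks like with off-the-shelf vectors, assuming gensim is installed and can download the pretrained GloVe model on first use (the word choices are only illustrative, and some may be missing from the vocabulary):

    # Answer "is it closer to A or B?" with plain word-vector similarity.
    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")    # pretrained GloVe vectors

    def closer_to(secret, a, b):
        return a if wv.similarity(secret, a) >= wv.similarity(secret, b) else b

    secret = "croissant"
    print(closer_to(secret, "mussolini", "bread"))   # presumably: bread
    print(closer_to(secret, "bread", "cake"))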
TheSaifurRahman · 3h ago
Has there been research on using this to make models smaller? If models converge on similar representations, we should be able to build more efficient architectures around those core features.
yorwba · 3h ago
It's more likely that such an architecture would be bigger rather than smaller. https://arxiv.org/abs/2412.20292 demonstrated that score-matching diffusion models approximate a process that combines patches from different training images. To build a model that makes use of this fact, all you need to do is look up the right patch in the training data. Of course a model the size of its training data would typically be rather unwieldy to use. If you want something smaller, we're back to approximations created by training the old-fashioned way.
samsartor · 43m ago
I have mixed feelings about this interpretation: that diffusion models approximately produce mosaics from patches of training data. It does a good job helping people understand why diffusion models are able to work. I used it myself in a talk almost 3 years ago! And it isn't a lie exactly, the linked paper is totally sound. It's just that it only works if you assume your model is an absolute optimal minimization of the loss (under some inductive biases). It isn't. No machine learning more complicated than OLS holds up to that standard.
_And that's the actual reason they work._ Underfit models don't just approximate, they interpolate, extrapolate, generalize a bit, and ideally smooth out the occasional total garbage mixed in with your data. In fact, diffusion models work so well because they can correct their own garbage! If extra fingers start to show up in step 5, then steps 6 and 7 still have a chance to reinterpret that as noise and correct back into distribution.
And then there's all the stuff you can do with diffusion models. In my research I hack into the model and use it to decompose images into the surface material properties and lighting! That doesn't make much sense as averaging of memorized patches.
Given all that, it is a very useful interpretation. But I wouldn't take it too literally.
giancarlostoro · 3h ago
I've been thinking about this a lot. I want to know the smallest a model needs to be before letting it browse search engines, or files you host locally, becomes a viable avenue for giving you more informed answers. Is it 2GB? 8GB? Would love to know.
dr_dshiv · 2h ago
What about the platonic bits? Any other articles that give more details there?
foxes · 50m ago
So in the limit the model's representation space has one dimension per "concept" or something, but making it couple things together is what actually makes it useful?
An infinite dimensional model with just one dim per concept would be sorta useless, but you need things tied together?
tyronehed · 4h ago
Especially if they are all me-too copies of a Transformer.
When we arrive at AGI, you can be certain it will not contain a Transformer.
jxmorris12 · 3h ago
I don't think architecture matters. It seems to be more a function of the data somehow.
I once saw a LessWrong post claiming that the Platonic Representation Hypothesis doesn't hold when you only embed random noise, as opposed to natural images: http://lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-p...
> I don't think architecture matters. It seems to be more a function of the data somehow.
of course it matters
if I supply the ants in my garden with instructions on how to build tanks and stealth bombers they're still not going to be able to conquer my front room
Xcelerate · 3h ago
Edit: I wrote my comment a bit too early before finishing the whole article. I'll leave my comment below, but it's actually not very closely related to the topic at hand or the author's paper.
I agree with the gist of the article (which IMO is basically that universal computation is universal regardless of how you perform it), but there are two big issues that prevent this observation from helping us in a practical sense:
1. Not all models are equally efficient. We already have many methods to perform universal search (e.g., Levin's, Hutter's, and Schmidhuber's versions), but they are painfully slow despite being optimal in a narrow sense that doesn't extrapolate well to real world performance.
2. Solomonoff induction is only optimal for infinite data (i.e., it can be used to create a predictor that asymptotically dominates any other algorithmic predictor). As far as I can tell, the problem remains totally unsolved for finite data, due to the additive constant that results from the question: which universal model of computation should be applied to finite data? You can easily construct a Turing machine that is universal and perfectly reproduces the training data, yet nevertheless dramatically fails to generalize. No one has made a strong case for any specific natural prior over universal Turing machines (and if you try to define some measure to quantify the "size" of a Turing machine you realize this method starts to fail once the number of transition tables becomes large enough to start exhibiting redundancy).
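For reference, the additive constant in point 2 is the one from the invariance theorem: for any two universal machines U and V there is a constant c_{UV}, independent of the string x, such that

    K_U(x) \le K_V(x) + c_{UV}

and the Solomonoff prior is likewise defined only relative to a chosen reference machine U (the sum is over programs whose output starts with x, for a monotone U):

    M_U(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

On finite data that constant, i.e. the choice of U, can dominate everything, which is exactly the unsolved part described above.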
im3w1l · 2h ago
Regarding your second point I think there are two cases here that should be kept separate. The first is that you are teleported into a parallel dimension where literally everything works differently from here. In that case I do agree that there are several reasonable choices of models of computation. You simply have to pick one and hope it wasn't too bad.
But the second case is that you encounter some phenomenon here in our ordinary world. And in that case I think you can do way better by reasoning about the phenomenon and trying to guess at plausible mechanics based on your preexisting knowledge of how the world works. In particular, I think guessing that "there is some short natural language description of how the phenomenon works, based on a language grounded in the corpus of human writing" is a very reasonable prior.
gerdesj · 1h ago
The devil is in the details.
I recently gave the "Veeam Intelligence" a spin.
Veeam is a backup system spanning quite a lot of IT systems with a lot of options - it is quite complicated but it is also a bounded domain - the app does as the app does. It is very mature and has extremely good technical documentation and a massive amount of technical information docs (TIDs) and a vibrant and very well informed set of web forums, staffed by ... staff and even the likes of Anton Gostev - https://www.veeam.com/company/management-team.html
Surely they have close to the perfect data set to train on?
I asked a question about moving existing VMware replicas from one datastore to another and how to keep my replication jobs working correctly. If you're not in this field you may not be familiar with my particular requirements, but this is not a niche issue.
The "VI" came up with a reasonable sounding answer involving a wizard. I hunted around the GUI looking for it (I had actually used that wizard a while back). So I asked where it was and was given directions. It wasn't there. The wizard was genuine but its usage here was a hallucination.
A human might have done the same thing with some half remembered knowledge but would soon fix that with the docs or the app itself.
I will stick to reading the docs. They are really well written and I am reasonably proficient in this field so actually - a decent index is all I need to get a job done. I might get some of my staff to play with this thing when given a few tasks that they are unfamiliar with and see what it comes up with.
I am sure that domain specific LLMs are where it is at but we need some sort of efficient "fact checker" system.