From tokens to thoughts: How LLMs and humans trade compression for meaning
108 points by ggirelli 6/5/2025, 7:59:21 AM | 22 comments | arxiv.org ↗
They're analyzing input embedding models, not LLMs. I'm not sure how the authors justify making claims about the inner workings of LLMs when they haven't actually computed a forward pass. The embedding matrix is not an LLM; it's a lookup table.
Just to highlight the ridiculousness of this research, no attention was computed! Not a single dot product between keys and queries. All of their conclusions are drawn from the output of an embedding lookup table.
The figure correlating their alignment score with model size is particularly egregious. Model size is meaningless when you never activate any model parameters. If BERT is outperforming Qwen and Gemma, something is wrong with your methodology.
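To make the objection concrete, here is a minimal sketch of the distinction, using Hugging Face transformers with GPT-2 purely as a stand-in (the paper's models and code are not reproduced here): an embedding lookup touches only the E matrix, while a forward pass runs the token through every attention block.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

ids = tokenizer("robin", return_tensors="pt").input_ids  # a single word, no context

# (1) Static input embedding: a row lookup in the E matrix.
#     No attention, no key/query dot products, no MLPs are executed.
static_emb = model.get_input_embeddings()(ids)

# (2) A forward pass: the token goes through every transformer block,
#     so attention and the rest of the model's parameters are actually used.
with torch.no_grad():
    contextual = model(ids, output_hidden_states=True).hidden_states[-1]

print(static_emb.shape, contextual.shape)  # same shape, very different computation
```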
They used token embeddings directly rather than intermediate representations because the latter depend on the specific sentence the model is processing. The human judgment data, however, was collected without any context surrounding each word, so using the token embeddings seems to be the fairest comparison.
Otherwise, what sentence(s) would you have used to compute the intermediate representations? And how would you make sure that the results aren't biased by these sentences?
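One hedged way around that (my sketch, not something the paper does) would be to average the target word's contextual representation over several neutral carrier sentences so no single sentence dominates; the templates below are arbitrary choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Arbitrary carrier sentences; the choice itself introduces bias,
# which is exactly the concern raised above.
templates = ["I saw a {} today.", "The word {} appeared in the text.", "{}"]

def mean_contextual_embedding(word: str) -> torch.Tensor:
    vecs = []
    for template in templates:
        enc = tokenizer(template.format(word), return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc, output_hidden_states=True).hidden_states[-1][0]
        # Crudely average over all positions instead of locating the
        # target word's sub-tokens exactly.
        vecs.append(hidden.mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

print(mean_contextual_embedding("robin").shape)  # (hidden_size,)
```

Even then, each template leaks into the representation, which is presumably why the authors stuck with context-free token embeddings.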
Though it sounds odd, there is no problem with feeding a single word through the model: it would indeed return the model's representation of that word, as seen by the model, without any additional context.
And like the other commenter said, you can absolutely feed single tokens through the model. Regardless, your point doesn't make sense. How about priming the model with “You’re a helpful assistant”, just like everyone else does?
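For what it's worth, the priming suggestion is easy to sketch as well (again just an illustration, not the paper's method): prepend a generic assistant-style prefix and read off the hidden state at the word's position.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

prompt = "You're a helpful assistant. The word is: robin"
enc = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc, output_hidden_states=True).hidden_states[-1]

word_vec = hidden[0, -1]  # hidden state at the final position (the word's last sub-token)
print(word_vec.shape)
```

Of course, that prefix is itself a context, which is exactly the bias the parent comment worries about.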
I would expect model size to correlate with alignment score because model size usually correlates with hidden dimension. But the opposite can also be true: bigger models might shift more of the basic token classification logic into the layers, so embedding alignment can go down. Regardless, this feels like pretty useless research…
I have never understood broad statements that models are just (or mostly) statistical tools.
Certainly statistics apply: minimizing mismatches results in mean (or similar-measure) predictions of the target.
But the architecture of a model is the difference between compressed statistics and forcing the model to translate information in a highly organized way, one that reflects the actual shape of the problem, before it can achieve any accuracy at all.
In both cases, statistics are relevant, but in the latter it's not a particularly insightful way to talk about what a model has learned.
Statistical accuracy, prediction, etc. are the basic problems to solve: the training criteria being optimized. But they don't limit the nature of the solutions; they leave both problem difficulty and solution sophistication unbounded.
From what I can tell, this is limited in scope to categorizing nouns (a robin is a bird).
Words are a tricky thing to handle.
As I see it, "Open your heart", "Open a can" and "Open to new experiences" have very similar meanings for "Open", being essentially "make a container available for external I/O", similar to the definition of an "open system" in thermodynamics. "Open a bank account" is a bit different, as it creates an entity that didn't exist before, but even then the focus is on having something that allows for external I/O - in this case deposits and withdrawals.
Other languages have similar but fundamentally different oddities, which do not translate cleanly.
Try explaining why "tough" and "rough" rhyme but "bough" doesn't.
You know? Language has a ton of idiosyncrasies.
My favorite thing is a "square." That's my name for an enumeration that lets me compare and contrast things along two different qualities, each expressed by two extremes.
One such square is "One can (not) do (not do) something." Each "not" can be present or absent, just like in a truth table.
"One can do something", "one can not do something", "one can do not do something" and, finally, "one can not help but do something."
Why should we use "help but" instead of "do not"?
While this does not prevent one from enumerating the possibilities while thinking in English, it makes that enumeration harder than it is in other languages. For example, in Russian the "square" is expressible directly.
Also, "help but" is not shorter than "do not," it is longer. Useful idioms usually expressed in shorter forms, thus, apparently, "one can not help but do something" is considered by Englishmen as not useful.
https://aclanthology.org/2020.blackboxnlp-1.15/
This isn’t talking about that.