Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations (masonyarbrough.com)

I remain skeptical of emergent properties in LLMs in the way that people have used that term. There was a belief 3-4 years ago that if you just make the models big enough, they magically acquire intelligence. But since then, we’ve seen that the models are actually still pretty limited by the training data: like other ML models, they interpolate well between the data they’ve been trained on, but they don’t generalize well beyond it. Also, we have seen models that are 50-100x smaller now exhibit the same “emergent” capabilities that were once thought to require hundreds of billions of parameters. I personally think the emergent properties really belong to the data instead.

andy99 · 32d ago

Yes, deep learning models only interpolate, and essentially represent an effective way of storing data labeling effort. Doesn't mean they're not useful, just not what tech adjacent promoters want people to think.

john-h-k · 32d ago

> Yes, deep learning models only interpolate

What do you mean by this? I don’t think the understanding of LLMs is sufficient to make this claim

albertzeyer · 26d ago

Just by the mathematical definition of interpolation (https://en.wikipedia.org/wiki/Interpolation), any function which approximates the given regression points (training data), and is defined for values in between (unseed data), will interpolate it. Maybe you think about linear interpolation specifically, but there are many types of interpolation, and any mathematical function, any neural network is just another form of interpolation.

Interpolation is also related to extrapolation. In higher dimensional spaces, the distinction is not so clear. In terms of machine learning, you would call this generalization.

The question is more, is it a good type of interpolation/extrapolation/generalization. You measure that on a test set.

And mathematically speaking, your brain also is just doing another type of interpolation/extrapolation.

andy99 · 32d ago

An LLM is a classifier, there is lots of research into how deep learning classifiers work, that I haven't seen contradicted when applied to LLMs.

drdeca · 32d ago

I still think it seems unclear what you mean by “interpolate” in this context? If your NN takes in several numbers and assigns logits to each class based on those numbers, then if you consider the n dimensional space of possible inputs, and if the new input is in the convex hull of the inputs that appear in training samples, then the meaning of “interpolate” is fairly clear.

But when the inputs are sequences of tokens…

Granted, each token gets embedded as some vector, and you can concatenate those vectors to represent the sequence of tokens as one big vector, but, are these vectors for novel strings in the convex hull of such vectors for the strings in the training set?

bunderbunder · 32d ago

The answer is kind of right there in the start of your last sentence. From the transformer model's perspective, the input is just a time series of vectors. It ultimately isn't any different from any other time series of vectors.

Way back in the day when I was working with latent Dirichlet allocation models, I had a minor enlightenment moment when I realized that the models really weren't capturing any semantically meaningful relationships. They were only capturing meaningless statistical correlations to which I would then assign semantic value so effortlessly and automatically that I didn't even realize it was always me doing it, never the model.

I'm pretty sure LLMs exist on that same continuum. And if you travel down it in the other direction, you get to simple truisms such as "correlation does not equal causation."

drdeca · 32d ago

The part about “is it in the convex hull?” was an important part of the question.

It seems to me that if it isn’t in the convex hull, it could be more fitting to describe it as extrapolation, rather than interpolation?

In general, my question does apply to the task of predicting how a time series of vectors continues: Given a dataset of time series, where the dimension of each vector in the series is such and such, the length of each series is yea long, and there are N series in the training set, should we expect series in the test set or validation set to be in the convex hull of the ones in the training set?

I would think that the number of series in the training set, N, while large, might not be all that large compared to the dimensionality of a whole series?

Hm, are there efficient techniques for evaluating whether a high dimensional vector is in the convex hull of a large number of other high dimensional vectors?

bunderbunder · 32d ago

Just shooting from the hip, LLMs operate out on a frontier where the curse of dimensionality removes a large chunk of the practical value from the concept of a convex hull. Especially in a case like this where the vector embedding process places hard limits on the range of possible magnitudes and directions for any single vector.

drdeca · 31d ago

Outside of the context of a convex hull, I don’t know how to make a distinction between interpolation and extrapolation. This is the core of my question.

What precisely is it that you mean when you say that it is interpolating rather than extrapolating? In the only definition that I know, the one based on convex hulls, I believe it would be extrapolating rather than interpolating. But people often say it is interpolating rather than extrapolating, and I don’t know what they mean.

bunderbunder · 29d ago

I doubt they're really thinking about it in a mathematical sense when they say that. I'm guessing, for example, that "extrapolate" is meant in the more colloquial sense, which is maybe closer to "deduce" in practice.

kevinsync · 32d ago

My hot take is that what some people are labeling as "emergent" is actually just "incidental encoding" or "implicit signal" -- latent properties that get embedded just by nature of what's being looked at.

For instance, if you have a massive tome of English text, a rather high percentage of it will be grammatically-correct (or close), syntactic and understandable, because humans who speak good English took the time to write it and wrote it how other humans would expect to read or hear it. This, by its very nature, embeds "English language" knowledge due to sequence, word choice, normally-hard-to-quantify expressions (colloquial or otherwise), etc.

When you consider source data from many modes, there's all kinds of implicit stuff that gets incidentally written.. for instance, real photographs of outer space or deep sea would only show humans in protective gear, not swimming next to the Titanic. Conversely, you won't see polar bears eating at Chipotle, or giant humans standing on top of mountains.

There's a statistical probability of "this showed up enough in the training data to loosely confirm its existence" / "can't say I ever saw that, so let's just synthesize it" aspect of the embeddings that one person could interpret as "emergent intelligence", while another could just-as-convincingly say it's probabilistic output that is mostly in line with what we expect to receive. Train the LLM on absolute nonsense instead and you'll receive exactly that back.

atoav · 32d ago

Emergent as I have known and used it before is when more complex behavior emerges from simple rules.

My goto example for this was Game of Life, where from very simple rules, a very organically behaving (turing complete) system emerges. Now Game of Life is a deterministic system, meaning that the same rules and the same start-configurarion will play out in exactly the same way each time — but given the simplicity of the logic and the rules the resulting complexity is what I'd call emergent.

So maybe this is more about the definition of what we'd call emergent and what not.

As someone who has programmed markov chains where the stochastic interpolation really shines through, transformer-based LLMS definitly show some emergent behavior one wouldn't have immediately suspected just from the rules. Emergent does not mean "conscious" or "self-reflective" or anything like that. But the things a LLM can infer from its training data is already quite impressive.

gond · 32d ago

Interesting. Is there a quantitative threshold to emergence anyone could point at with these smaller models? Tracing the thoughts of a large language model is probably the only way to be sure, or is it?

gond · 32d ago

Disregarding the downvotes, I mean this as a serious question.

From the liked article: “We don’t know an “algorithm” for this, and we can’t even begin to guess the required parameter budget or the training data needed.”

Why not, at least the external ones? The computational resources and the size of the training dataset is quantifiable from an input point of view. What gets used is not, but the input size should.

zmmmmm · 32d ago

This seems superficial and doesn't really get to the heart of the question. To me it's not so much about bits and parameters but a more interesting fundamental question of whether pure language itself is enough to encompass and encode higher level thinking.

Empirically we observe that an LLM trained purely to predict a next token can do things like solve complex logic puzzles that it has never seen before. Skeptics claim that actually the network has seen at least analogous puzzles before and all it is doing is translating between them. However the novelty of what can be solved is very surprising.

Intuitively it makes sense that at some level, that intelligence itself becomes a compression algorithm. For example, you can learn separately how to solve every puzzle ever presented to mankind, but that would take a lot of space. At some point it's more efficient to just learn "intelligence" itself and then apply that to the problem of predicting the next token. Once you do that you can stop trying to store an infinite database of parallel heuristics and just focus the parameter space on learning "common heuristics" that apply broadly across the problem space, and then apply that to every problem.

The question is, at what parameter count and volume of training data does the situation flip to favoring "learning intelligence" rather than storing redundant domain specialised heuristics? And is it really happening? I would have thought just looking at the activation patterns could tell you a lot, because if common activations happen for entirely different problem spaces then you can argue that the network has to be learning common abstractions. If not, maybe it's just doing really large scale redundant storage of heurstics.

disambiguation · 32d ago

Good take, but while we're invoking intuition, something is clearly missing in the fundamental design given real brains don't need to consume all the worlds literature before demonstrating intelligence. There's some missing piece w.r.t self learning and sense making. The path to emergent reasoning you lay out is interesting and might happen anyway as we scale up, but the original idea was to model these algorithms in our own image in the first place - I wonder if we won't discover that missing piece first.

adampwells · 32d ago

> However the novelty of what can be solved is very surprising.

I've read that the 'surprise' factor is much reduced when you actually see just how much data these things are trained on - far more than a human mind can possibly hold and (almost) endlessly varied. I.e. there is 'probably' something in the training set close to what 'surprised' you.

cratermoon · 32d ago

Alternate view: Are Emergent Abilities of Large Language Models a Mirage? https://arxiv.org/abs/2304.15004

"Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance."

foobarqux · 32d ago

The author himself explicitly acknowledges the paper but the incomprehensibly ignores it ("Even so, many would like to understand, predict, and even facilitate the emergence of these capabilities."). It's like saying "some say [foo] doesn't exist but even so many would like to understand [foo]". It's incoherent.

autoexec · 32d ago

No point in letting facts get in the way of an entire article I guess.

K0balt · 32d ago

A decent thought-proxy for this : powered flight.

An aircraft can approach powered flight without achieving it. With a given amount of thrust or aerodynamic characteristics, the aircraft will weigh dynamic_weight=(static_weight - x) where x is a combination of the aerodynamic characteristics and the amount of thrust applied.

In no case where dynamic_weight>0 will the aircraft fly, even though it exhibits characteristics of flight, I.e the transfer of aerodynamic forces to counteract gravity.

So while it progressively exhibits characteristics of flight, it is not capable of any kind of flight at all until the critical point of dynamic_weight<0. So the enabling characteristics are not “emergent”, but the behavior is.

I think this boils down to a matter of semantics.

scopemouthwash · 32d ago

“Thought-proxy”?

I think the word you’re looking for is “analogy”.

K0balt · 32d ago

Analogy is a great word proxy for thought proxy.

Yeah, I’m pretty sure analogy would have been fine there, I think maybe it fell off the edge of my vocabulary for a moment? Not really sure, but I really can’t think of any reason why “thought proxy” would have been more descriptive, informative, or accurate ¯_(ツ)_/¯

jebarker · 32d ago

Yes, this paper is under-appreciated. The point is that we as humans decide what constitutes a given task we're going to set as a bar and it turns out that statistical pattern matching can solve many of those tasks to a reasonable level (we also get to define "reasonable") when there's sufficient scale of parameters and data, but that tip-over point is entirely arbitrary.

Al-Khwarizmi · 32d ago

The continuous metrics the paper uses are largely irrelevant in practice, though. The sudden changes appear when you use metrics people actually care about.

To me the paper is overhyped. Knowing how neural networks work, it's clear that there are going to be underlying properties that vary smoothly. This doesn't preclude the existence of emergent abilities.

moffkalast · 32d ago

That has been a problem with most LLM benchmarks. Any test that's rated in percentages tends to be logarithmic, since getting from say 90% to 95% is not a linear 5% improvement but probably more like a 2x or 10x improvement in practical terms, since the metric is already nearly maxed out and only the extreme edge cases remain that are much harder to master.

lordnacho · 32d ago

What seems a bit miraculous to me is, how did the researchers who put us on this path come to suspect that you could just throw more data and more parameters at the problem? If the emergent behavior doesn't appear for moderate sized models, how do you convince management to let you build a huge model?

TheCoreh · 32d ago

This is perhaps why it took us this long to get to LLMs, the underlying math and ideas were (mostly) there, and even if the Transformer as an architecture wasn't ready yet, it wouldn't surprise me if throwing sufficient data/compute at a worse architecture wouldn't also produce comparable emergent behavior

There needed to be someone willing to try going big at an organization with sufficient idle compute/data just sitting there, not a surprise it first happened at Google.

hibikir · 32d ago

But we got here step by step, as other interesting use cases came up by using somewhat less compute. Image recognition, early forms of image generation, AlphaGo, AlphaZero for chess. All earlier forms of deep neural networks that are much more reasonable than training a top of the line LLM today, but seemed expensive at the time. And ultimately a lot of this also comes from the hardware advancements and the math advancements. If you took classes neural networks in the 1990s, you'd notice that they mostly talked about 1 or 2 hidden layers, and not all that much focus on the math to train large networks, precisely because of how daunting the compute costs were for anything that wasn't a toy. But then came video card hardware, and improvements to use it to do gradient descent, making going past silly 3 layer networks somewhat reasonable.

Every bet makes perfect sense after you consider how promising the previous one looked, and how much cheaper the compute was getting. Imagine being tasked to train an LLM in 1995: All the architectural knowledge we have today and a state-level mandate would not have gotten all that far. Just the amount of fast memory that we put to bear wouldn't have been viable until relatively recently.

pixl97 · 32d ago

> and how much cheaper the compute was getting.

I remember back in the 90s how scientists/data analysts were saying that we'd need exaflop scale systems to tackle certain problems. I remember thinking how foreign that number was when small systems were running maybe tens of megaFLOPS. Now we have systems starting to zettaflops (FP8 so not exact comparison).

educasean · 32d ago

You might appreciate this article: https://www.quantamagazine.org/when-chatgpt-broke-an-entire-...

creer · 32d ago

They didn't. Not LLM people specifically. Google a long time ago figured out that you get far better results on a very wide range of problems just by going bigger. (Which then must have become frustrating for some people because most of the effort seems to have gone to scaling? See for example as-opposed-to-symbolic.)

Al-Khwarizmi · 32d ago

While GPT-2 didn't show emergent abilities, it did show improved accuracy on various tasks with respect to GPT-1. At that point, it was clear that scaling made sense.

In other words, no one expected GPT-3 to suddenly start solving tasks without training as it did, but it was expected to be useful as an incremental improvement to what GPT-2 did. At the time, GPT-2 was seeing practical use, mainly in text generation from some initial words - at that point the big scare was about massive generation of fake news - and also as a model that one could fine-tune for specific tasks. It made sense to train a larger model that would do all that better. The rest is history.

prats226 · 32d ago

I don't think model sizes increased suddenly, there might not be emergent properties for certain tasks at smaller scales but there was improvement at slower rate for sure. Competition to improve that metric albeit at lower pace led to slow increase in model sizes and by chance led to emergence the way its defined in paper?

gessha · 32d ago

There’s that Sinclair quote:

It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It

gyomu · 32d ago

The reasoning in the article is interesting, but this struck me as a weird example to choose:

> “The real question is how can we predict when a new LLM will achieve some new capability X. For example, X = “Write a short story that resonates with the social mood of the present time and is a runaway hit”

Framing a capability as something that is objectively measurable (“able to perform math on the 12th grade level”, “able to write a coherent, novel text without spelling/grammar mistakes”) makes sense within the context of what the author is trying to demonstrate.

But the social proof aspect (“is a runaway hit”) feels orthogonal to it? Things can be runaway hits for social factors independently of the capability they actually represent.

Retric · 32d ago

It’s not about being “a runaway hit” as an objective measurement it’s about the things an LLM would need to achieve before that was possible. At first AI scores on existing tests seemed like a useful metric. However, tests designed for humans make specific assumptions that don’t apply to these systems making such tests useless.

AI is very good at gaming metrics so it’s difficult to list some criteria where achieving it is meaningful. A hypothetical coherent novel without spelling/grammar mistakes could in effect be a copy of some existing work that shows up in its corpus, however a hit requires more than a reskinned story.

chii · 32d ago

> however a hit requires more than a reskinned story.

demonstrably false with a lot of hits in the past that is a reskinned story of existing stories!

Retric · 32d ago

While not a technical term of art, copyright applies to a reskinned story. “the series is not available in English translation, because of the first book having been judged a breach of copyright.” https://en.wikipedia.org/wiki/Tanya_Grotter

There’s plenty of room to take inspiration and go in another direction aka Pride and Prejudice and Zombies.

creer · 32d ago

That it seems hard (impossible) or not clear intuitively how to go about it, to us humans, is what makes the question interesting. In a way. The other questions are interesting but a different class of interesting. At any rate, both good for this question. Either way this becomes "what would we need to estimate this emergence threshold?".

eutropia · 32d ago

I often find that people using the word emergent to describe properties of a system tend to ascribe quasi magical properties to the system. Things tend to get vague and hand wavy when that term comes up.

Just call them properties with unknown provenance.

gond · 32d ago

> Just call them properties with unknown provenance.

They would if it would be the correct designation, however, it is not.

Emergence does not equal non-understanding or some spooky-hooky force coming from the unknown.Reductionism does not lead to an explaining-away of emergence.

juancn · 32d ago

I always wondered if the specific dimensionality of the layers and tensors has a specific effect on the model.

It's hard to explain, but higher dimensional spaces have weird topological properties, not all behave the same way and some things are perfectly doable in one set of dimensions while for others it just plain doesn't work (e.g. applying surgery on to turn a shape into another).

etrautmann · 32d ago

How is topology specifically related to emergent capabilities in AI?

interstice · 32d ago

The bag of heuristics thing is interesting to me, is it not conceivable that a NN of a certain size trained only on math problems would be able to wire up what amounts to a calculator? And if so, could that form part of a wider network, or is I/O from completely different modalities not really possible in this way?

andy99 · 32d ago

I didn't follow entirely on a fast read, but this confused me especially:

  The parameter count of an LLM defines a certain bit budget. This bit budget must be spread across many, many tasks

I'm pretty sure that LLMs, like all big neural networks, are massively under-specified, as in there are way more parameters than data to fit (understanding the training data set is bigger than the size of the model, but the point is the same loss can be achieved with many different combinations of parameters).

And I think of this underspecification as the reason neural networks extrapolate cleanly and this generalize.

vonneumannstan · 32d ago

This doesn't seem right and most people recognize that 'neurons' encode for multiple activations. https://transformer-circuits.pub/2022/toy_model/index.html

Der_Einzige · 32d ago

They’re 1000% right on the idea that most models are hilariously undertrained

vonneumannstan · 32d ago

Pretty sure that since the Chinchilla paper this probably isn't the case. https://arxiv.org/pdf/2203.15556

kazinator · 32d ago

> the same loss can be achieved with many different combinations of parameters

Perhaps it can be, but it isn't. The loss in a given model was achieved with that particular combination of parameters that the model has, and there exists no other combination of parameters which which that model can appeal to for more information.

To have the other combinations, we would need to train more models and then somehow combine them so that the combinations are available; but that's just conceptually the same as making one larger model with more parameters, and lower loss.

Michelangelo11 · 32d ago

How could they not?

Emergent properties are unavoidable for any complex system and probably exponentially scale with complexity or something (I'm sure there's an entire literature about this somewhere).

One good instance are spandrels in evolutionary biology. The wikipedia article is a good explanation of the subject: https://en.m.wikipedia.org/wiki/Spandrel_(biology)

creer · 32d ago

This is also my impression "how could they not" but it goes a bit further: Can we predict it? Can we estimate the size of a system that will achieve X? Can we build systems that are pre-disposed to emergent behaviors? Can we build systems that preclude some class of emergent behavior (relevant to AGI safety perhaps)? And then of course many systems will not achieve anything because even when "large", they are "uselessly large" - as in, you can define more points on a line and it's still a dumb line.

To me the "how could they not" comes from the idea that if LLMs somehow encapsulate/ contain/ exploit all human writings, then they most likely cover a large part of human intelligence. For damn sure much more than the "basic human". The question is more of how we can get this behavior back out of it - than whether it's there.

trash_cat · 30d ago

I think the better question is to answer why do emergent properties exist in the first place.

I disagree with the premise that emergence is binary. It's not. What we determine "emergent behaviour" is partly a social concept. We decide when an LLM is good enough for us and when it "solved" something through emergent properties.

waynecochran · 32d ago

Since gradient descent converges on a local minima, would we expect different emergent properties with different initialization of the weights?

jebarker · 32d ago

Not significantly, as I understand it. There's certainly variation in LLM abilities with different initializations but the volume and content of the data is a far bigger determinant of what an LLM will learn.

waynecochran · 32d ago

So there is an "attractor" that different initializations end up converging on?

andy99 · 32d ago

Different initialization converge to different places, e.g https://arxiv.org/abs/1912.02757

For LLMs (as with other models), many local optima appear to support roughly the same behavior. This is the idea of the problem being under-specified ie many more equations than unknowns so there are many ways to get the same result.

recursivecaveat · 32d ago

You end up with different weights when using different random initialization, but with modern techniques the behavior of the resulting model is not really distinct. Back in the image-recognition days it was like +/- 0.5% accuracy. If you imagine you're descending in a billion-parameter space, you will always have a negative gradient to follow in some dimension: local minima frequency goes down rapidly with (independent) dimension count.

me3meme · 32d ago

Metaphor: finding a path from a initial point to a destination in a graph. As the number of parameters increases one can expect the LLM to be able to remember how to go from one place to another and in the end it should be able to find a long path. This can be an emergent property since with less parameters the LLM could not be able to find the correct path. Now one has to find what kind of problems this metaphor is a good model of.

No comments yet

nthingtohide · 32d ago

What do you think about this analogy?

A simple process produces a Mandelbrot set. A simple process (loss minimization through gradient descent) produces LLMs. So what plays the role of 2D-plane or dense point grid in the case of LLMs? It is the embeddings, (or ordered combinations of embeddings ) which are generated after pre-training. In case of a 2D plan, the closeness between two points is determined by our numerical representation schema. But in case of embeddings, we learn the 2D-grid of words (playing the role of points) by looking at how the words are getting used in corpus

The following is a quote from Yuri Manin, an eminent Mathematician.

https://www.youtube.com/watch?v=BNzZt0QHj9U Of the properties of mathematics, as a language, the most peculiar one is that by playing formal games with an input mathematical text, one can get an output text which seemingly carries new knowledge. The basic examples are furnished by scientific or technological calculations: general laws plus initial conditions produce predictions, often only after time-consuming and computer-aided work. One can say that the input contains an implicit knowledge which is thereby made explicit.

I have a related idea which I picked up from somewhere which mirrors the above observation.

When we see beautiful fractals generated by simple equations and iterative processes, we give importance to only the equations, not to the cartesian grid on which that process operates.

pixl97 · 32d ago

> the most peculiar one is that by playing formal games with an input mathematical text, one can get an output text which seemingly carries new knowledge.

Or biologically, DNA/RNA behaves in a similar manner.

samirillian · 32d ago

*Do

OtherShrezzing · 32d ago

It feels like this can be tracked with addition. Humans expect “can do addition” is a binary skill, because humans either can or cannot add.

LLMs approximate addition. For a long time they would produce hot garbage. Then after a lot of training, they could sum 2 digit numbers correctly.

At this point we’d say “they can do addition”, and the property has emerged. They have passed a binary skill threshold.

skydhash · 32d ago

Or you could cobble up a small electronic circuit or a mechanical apparatus and have something that can add numbers.

lgas · 32d ago

Sure, but then what you're doing would be irrelevant to this discussion.

unsupp0rted · 32d ago

Isn't "emergent properties" another way to say "we're not very good at understanding the capabilities of complex systems"?

bunderbunder · 32d ago

I've always understood it more to mean, "phenomena that happen due to the interactions of a system's parts without being explicitly encoded into their individual behavior." Fractal patterns in nature are a great example of emergent phenomena. A single water molecule contains no explicit plan for how to get together with its buddies and make spiky hexagon shapes when they get cold.

And I've always understood talking about emergence as if it were some sort of quasi-magical and unprecedented new feature of LLMs to mean, "I don't have a deep understanding of how machine learning works." Emergent behavior is the entire point of artificial neural networks, from the latest SOTA foundation model all the way back to the very first tiny little multilayer perceptron.

andy99 · 32d ago

Emergence in the context of LLMs is really just us learning that "hey, you don't actually need intelligence to do <task>, turns out it can be done using a good enough next token predictor. We're basically learning what intelligence isn't as we see some of the things these models can do.

I always understood this to be the initial framing, e.g. in the Language Models are Few Shot Learners paper but then it got flipped around.

dmd · 32d ago

Or maybe you need intelligence to be a good enough next token predictor. Maybe the thing that “just” predicts the next token can be called “intelligence”.

bunderbunder · 32d ago

Maybe?

Mostly I just think that "Intelligence" and "AI" go together like "life, the universe and everything" and "42".

Workaccount2 · 32d ago

The challenge there would be showing that humans have this thing called intelligence. You yourself are just outputting ephemeral actions that rise out of your subconscious. We have no idea what that system feeding our output looks like (except it's some kind of organic neural net) and hence there isn't really a basis for discriminating what is and isn't intelligent besides "if it solves problems, it has some degree of intelligence"

PaulDavisThe1st · 32d ago

To return an old but still good analogy ...

If you want to understand how birds fly, the fact that planes also fly is near useless. While a few common aerodynamic principles apply, both types of flight are so different from each other that you do not learn very much about one from the other.

On the other hand, if your goal is just "humans moving through the air for extended distances", it doesn't matter at all that airplanes do not fly the way birds do.

And then, on the generated third hand, if you need the kind of tight quarters maneuverability that birds can do in forests and other tangled spaces, then the way our current airplanes fly is of little to no use at all, and you're going to need a very different sort of technology than the one used in current aircraft.

And on the accidentally generated fourth hand, if your goal is "moving very large mass over very long distance", the the mechanisms of bird flight are likely to be of little utility.

The fact that two different systems can be described in a similar way (e.g. "flying") doesn't by itself tell you that they are working in remotely the same way or capable of the same sorts of things.

pixl97 · 32d ago

doesn't by itself tell you that they are working in remotely the same way or capable of the same sorts of things.

I believe any intelligence that reaches 'human level' should be capable of nearly the same things with tool use, the fact it accomplishes the goal in a different way doesn't matter because the systems behavior is generalized. Hence the term (artificial) general intelligence. Two different general intelligences built on different architectures should be able to converge on similar solutions (for example solutions based on lowest energy states) because they are operating in the same physical realm.

An AGI and an HGI should be able to have convergent solutions for fast air travel, ornithopters, and drones.

PaulDavisThe1st · 32d ago

There is no "human level" because we don't even understand what we mean by "human level". We don't know what metrics to use, we don't even know what to measure.

> Two different general intelligences built on different architectures should be able to converge on similar solutions (for example solutions based on lowest energy states) because they are operating in the same physical realm.

Lots of things connected to intelligence do not operate (much) in any physical realm.

Also, you've really missed the point of the analogy. It's not a question of whether AGI would pick the same solution for fast air travel as HGI. It is that there are least two solutions to the challenge of moving things through the air in a controlled way, and they don't really work in the same way at all. Consequently, we should be ready for the possibility that there is more than one way to do the things LLMs (and to some degree) humans do with text/language, and that they may not be related to each very much. This is a counter to the claim that "since LLMs get so close to human language behavior, it seems quite likely human language behavior arises from a system like an LLM".

Workaccount2 · 32d ago

I think that many birds gets too sensitive when discussing what "flight" means, heh

andy99 · 32d ago

A better bird analogy would be if we didn't understand at all how flight worked, and then started throwing rocks and had pseudo-intellectuals saying "how do we know that isn't all that flight is, we've clearly invented artificial flight".

Jensson · 32d ago

> "we've clearly invented artificial flight"

Scaling laws shows that the harder we throw the rock the further we fly, we just have to throw them hard enough and we have invented flying rocks!

And for the naysayers out there, lemme throw this rock at your head and then tell me it isn't real!

unsupp0rted · 32d ago

If your goal is to get to stable orbit, even never having learned to fly, then the brute force approach works too

prats226 · 32d ago

If we use some metric as proxy for intelligence, emergence simply means a non-linear sudden change in that metric?

HPsquared · 32d ago

Or more generally "fitting a model to data".

esafak · 32d ago

Not quite. Complex systems can exhibit macroscopic properties not evident at microscopic scales. For example, birds self organize into flocks, an emergent phenomenon, visible to the untrained eye. Our understanding of how it happens does not change the fact that it does.

There is a field of study for this called statistical mechanics.

https://ganguli-gang.stanford.edu/pdf/20.StatMechDeep.pdf

HPsquared · 32d ago

Very interesting crossover!

anonymars · 32d ago

Show HN: Chili3d – A open-source, browser-based 3D CAD application

Show HN: PyDoll – Async Python scraping engine with native CAPTCHA bypass (github.com)

Show HN: High End Color Quantizer (github.com)

Show HN: Munal OS: a graphical experimental OS with WASM sandboxing (github.com)

Show HN: Most users won't report bugs unless you make it stupidly easy

Show HN: An open-source rhythm dungeon crawler in 16 x 9 pixels (github.com)

Show HN: A MCP server and client implementing the latest spec (github.com)

Show HN: Viberunner – build personal desktop apps in seconds (viberunner.me)

Show HN: Glowstick – type level tensor shapes in stable rust (github.com)

Show HN: macOS app PhotoSort could help reduce your monthly iCloud bill

Show HN: Somo – a human friendly alternative to netstat (github.com)

Show HN: I am making an app to rival "Everything" (drimiteros.github.io)

Show HN: Let’s Bend – Open-Source Harmonica Bending Trainer (letsbend.de)

Show HN: Air Lab – A portable and open air quality measuring device (networkedartifacts.com)

Show HN: Zymo.tv (github.com)

Show HN: I made CSS-only glitch effect (muffinman.io)

Show HN: Pyleak – Detect asyncio issues causing AI agent latency (github.com)

Show HN: AI-Powered Music Creation Starts Here – Vibe Musicing (vibemusicing.com)

Show HN: An AI tutors that teaches AI (vivaverbalis.com)

Show HN: Altstack.jp, like European-alternatives.eu, for Japan (altstack.jp)

Show HN: I made an open-source alternative to LangChain (itz.am)

Show HN: I made a Markdown summariser for obsidian (and Markdown in general) (github.com)

Show HN: I made a mobile app that turns your step count into a race (stepracers.com)

Show HN: RenderDay: A GPU-only render farm for Blender (renderday.com)

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations (masonyarbrough.com)

Show HN: GPT image editing, but for 3D models (adamcad.com)

Show HN: I made a 3D networked open world automation game with no game engine (store.steampowered.com)

Show HN: AI game animation sprite generator (godmodeai.cloud)

Show HN: Wrist/off – launch button for your Apple Watch (wristoff.com)

Show HN: Open source API for meeting transcripts and recordings (github.com)

Show HN: Mcp-hacker-news – An MCP server for accessing Hacker News data via LLMs (github.com)

Show HN: Interactive Enigma Machine Simulator (enigmasimulator.com)

Show HN: Lambduck, a Functional Programming Brainfuck (imjakingit.github.io)

Show HN: iOS Screen Time from a REST API (thescreentimenetwork.com)

Show HN: MuJS Running on TempleOS (git.checksum.fail)

Show HN: Autonomous AI Agent for e-signatures – chat/voice assistant (docendorse.com)

Show HN: Claude Composer (github.com)

Show HN: Container Use for Agents (github.com)

Show HN: Visualize nested list properties in the CodeLLDB debugger (github.com)

Show HN: SDFY: Fast 2D Vector Shapes and Shadows with Signed Distance Functions (github.com)

Show HN: ClickStack – Open-source Datadog alternative by ClickHouse and HyperDX (github.com)

Show HN: SelfDB – Ditch Supabase and Firebase Lock-In, Self-Host Simply (selfdb.io)

Show HN: FlowHawk – ultra fast eBPF network security monitor with ML (github.com)

Show HN: Javi: An AI-Boyfriend that understands you. It chats and sexts (chatwithjavi.com)

Show HN: I wrote a Java decompiler in pure C language (github.com)

Show HN: NanoTS – Fast, embeddable, tiny time series database (github.com)

Show HN: I made a 3D SVG Renderer that projects textures without rasterization (seve.blog)

Show HN: Nowa – No-signup event planning app for scheduling and bill splitting (nowa.one)

Show HN: Arxivlens – Save Hours Researching Scientific Literature on ArXiv (arxivlens.com)

Show HN: Controlling 3D models with voice and hand gestures (github.com)

Why do LLMs have emergent properties?

Comments (110)