The Timmy Trap

77 points | by metadat | 8/15/2025, 2:10:45 PM | jenson.org ↗

Comments (58)

sobiolite · 1h ago
The article says that LLMs don't summarize, only shorten, because...

"A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text."

Then later says...

"LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with"

So, they can't summarize, because they lack context... but they also have an almost ungraspably large amount of context?

jchw · 1h ago
I think the real takeaway is that LLMs are very good at tasks that closely resemble examples in their training data. A lot of written material (code, movies/TV shows, etc.) is actually pretty repetitive, so you don't really need super intelligence to be able to summarize it and break it down, just good pattern matching. But this can fall apart pretty wildly when you have something genuinely novel...
gus_massa · 47m ago
Humans too. If I were too creative writing the midterm, most of my students would fail and everyone would be very unhappy.
BobaFloutist · 9m ago
That's because midterms are specifically supposed to assess how well you learned the material presented (or at least pointed to), not your overall ability to reason. If you teach a general reasoning class, getting creative with the midterm is one thing. But if you're teaching someone how to solve differential equations, they're learning right up to the edge of their ability in a limited amount of time, and if you then present them with an example outside of what's been covered, it kind of makes sense that they can't just already solve it. I mean, that's kind of the whole premise of education: you can't just present someone with something completely outside of their experience and expect them to derive from first principles how it works.
card_zero · 41m ago
That's exams, not humanity.
btown · 1h ago
It's an interesting philosophical question.

Imagine an oracle that could judge/decide, with human levels of intelligence, how relevant a given memory or piece of information is to any given situation, and that could verbosely describe which way it's relevant (spatially, conditionally, etc.).

Would such an oracle, sufficiently parallelized, be sufficient for AGI? If so, then we could genuinely describe its output as "context," and phrase our problem as "there is still a gap in needed context, despite how much context there already is."

And an LLM that simply "shortens" that context could reach a level of AGI, because the context preparation is doing the heavy lifting.

The point I think the article is trying to make is that LLMs cannot add any information beyond the context they are given - they can only "shorten" that context.

If the lived experience necessary for human-level judgment could be encoded into that context, though... that would be an entirely different ball game.

tovej · 1h ago
You can reconcile these points by considering what specific context is necessary. The author specifies "outside" context, and I would agree. The human context that's necessary for useful summaries is a model of semantic or "actual" relationships between concepts, while the LLM context is a model of a single kind of fuzzy relationship between concepts.

In other words the LLM does not contain the knowledge of what the words represent.

ratelimitsteve · 1h ago
I think the differentiator here might not be the context it has, but the context it has the ability to use effectively in order to derive more information about a given request.
kayodelycaon · 1h ago
They can’t summarize something that hasn’t been summarized before.
timmg · 1h ago
About a year ago, I gave a film script to an LLM and asked for a summary. It was written by a friend and there was no chance it or its summary was in the training data.

It did a really good -- surprisingly good -- job. That incident has been a reference point for me. Even if it is anecdotal.

pc86 · 50m ago
I'm not as cynical as others about LLMs but it's extremely unlikely that script had multiple truly novel things in it. Broken down into sufficient small pieces it's very likely every story element was present multiple times in the LLM's training data.
Spivak · 44m ago
I'm not sure I understand the philosophical point being made here. The LLM has "watched" a lot of movies and so understands the important parts of the original script it's presented with. Are we not describing how human media literacy works?
BobaFloutist · 4m ago
The point is that if you made a point to write a completely novel script, with (content-wise, not semantically) 0 DNA in it from previous movie scripts, with an unambiguous but incoherent and unstructured plot, your average literate human would be able to summarize what happened on the page, for all that they'd be annoyed and likely distressed by how unusual it was; but an LLM would do a disproportionately bad job compared to how well it does at other things, which makes us reevaluate what these models are actually doing and how they do it.

It feels like they've mastered language, but it's looking more and more like they've actually mastered canon. Which is still impressive, but very different.

pc86 · 35m ago
I'm not making a philosophical point. The earlier comment is "I uploaded a new script and it summarized it"; I was simply saying the odds of that script actually being new are very slim. Even though that script or summaries of it obviously do not exist in their entirety in the training data, its individual elements almost certainly do. So it's not really a novel (pun unintended?) summarization.
naikrovek · 1h ago
they can, they just can't do it well. at no point does any LLM understand what it's doing.
kblissett · 1h ago
If you think they can't do this task well I encourage you to try feeding ChatGPT some long documents outside of its training cutoff and examining the results. I expect you'll be surprised!
kayodelycaon · 1h ago
It can produce something that looks like a summarization by matching against similar texts.

How unique the text is determines how accurate the summarization is likely to be.

nojs · 18m ago
Even stronger than our need to anthropomorphize seems to be our innate desire to believe our species is special, and that “real intelligence” couldn’t ever be replicated.

If you keep redefining real intelligence as the set of things machines can’t do, then it’s always going to be true.

vcarrico · 11m ago
I might be mixing up the concepts of intelligence and consciousness etc., but the human mind is more than language and data; it's also experience. LLMs have all the data and can express anything around that context, but they will never experience anything, which is singular for each of us and is part of what makes up what we call intelligence (?). So they will never replicate the human mind; they can only mimic it.

I heard from Miguel Nicolelis that language is a filter for the human mind, so you can never build a mind from language. I interpreted this as being like trying to build an orange from its juice.

hackyhacky · 3m ago
> LLMs have all the data and can express anything around that context,

On the contrary, all their training data is their "experience".

hackyhacky · 1h ago
> LLMs mimic intelligence, but they aren’t intelligent.

I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?

I see two possibilities:

1. We define "intelligence" as definitionally unique to humans. For example, maybe intelligence depends on the existence of a human soul, or specific to the physical structure of the human brain. In this case, a machine (perhaps an LLM) could achieve "quacks like a duck" behavioral equality to a human mind, and yet would still be excluded from the definition of "intelligent." This definition is therefore not useful if we're interested in the ability of the machine, which it seems to me we are. LLMs are often dismissed as not "intelligent" because they work by inferring output based on learned input, but that alone cannot be a distinguishing characteristic, because that's how humans work as well.

2. We define "intelligence" in a results-oriented way. This means there must be some specific test or behavioral standard that a machine must meet in order to become intelligent. This has been the default definition for a long time, but the goal posts have shifted. Nevertheless, if you're going to disparage LLMs by calling them unintelligent, you should be able to cite a specific results-oriented failure that distinguishes them from "intelligent" humans. Note that this argument cannot refer to the LLMs' implementation or learning model.

card_zero · 11m ago
It may be the case that the failures of the ability of the machine (2) are best expressed by reference to the shortcomings of its internal workings (1), and not by contrived tests.
hackyhacky · 4m ago
It might be the case, but if those shortcomings are not visible in the results of the machine (and therefore not interpretable by a test), why do its internal workings even matter?
dkdcio · 37m ago
> I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?

the ability for long-term planning and, more cogently, actually living in the real world where time passes

hackyhacky · 26m ago
> the ability for long-term planning and, more cogently, actually living in the real world where time passes

1. LLMs seem to be able to plan just fine.

2. LLMs clearly cannot be "actually living" but I fail to see how that's related to intelligence per se.

ArnavAgrawal03 · 18m ago
> They had known him for only 15 seconds, yet they still perceived the act of snapping him in half as violent.

This is right out of Community

Isamu · 23m ago
You can compare the current state of LLMs to the days of chess machines when they first approached grandmaster-level play. The machine approach was very brute force, and a lot of work went into increasing the sheer amount of look-ahead required to compete at the grandmaster level.

As opposed to what grandmasters actually did, which was less look ahead and more pattern matching to strengthen the position.

Now LLMs successfully leverage pattern matching, but interestingly it is still a kind of brute force pattern matching, requiring the statistical absorption of all available texts, far more than a human absorbs in a lifetime.

This enables the LLM to interpolate an answer from the structure of the absorbed texts with reasonable statistical relevance. This is still not quite "what humans do," as it requires brute-force statistical analysis of vast amounts of text to achieve pretty good results. For example, training on all available Python sources on GitHub and elsewhere (curated to avoid bad examples) yields pretty good results: not how a human would do it, but statistically likely to be pertinent and correct.

umanwizard · 1h ago
The article claims (without any evidence, argument or reason) that LLMs are not intelligent, then simply refuses to define intelligence.

How do you know LLMs aren't intelligent, if you can't define what that means?

energy123 · 1h ago
It's strange seeing so many takes like this two weeks after LLMs won gold medals at IMO and IOI. The cognitive dissonance is going to be wild when it all comes to a head in two years.
aprilthird2021 · 1h ago
IBM Watson won Jeopardy years ago, was it intelligent?
perching_aix · 54m ago
> Rather than being given questions, contestants are instead given general knowledge clues in the form of answers and they must identify the person, place, thing, or idea that the clue describes, phrasing each response in the form of a question. [0]

Doesn't sound like a test of intelligence to me, so no.

[0] https://en.wikipedia.org/wiki/Jeopardy!

aprilthird2021 · 49m ago
Why? Computers also won chess years ago, but they're not intelligent either? Why is winning a math competition intelligent but a trivia competition or a chess competition not intelligent?
umanwizard · 35m ago
Math and chess are similar in the sense that for humans, both require creativity, logical problem solving, etc.

But they are not at all similar for computers. Chess has a constrained small set of rules and it is pretty straightforward to make a machine that beats humans by brute force computation. Pre-Leela chess programs were just tree search, a hardcoded evaluation function, and lots of pruning heuristics. So those programs are really approaching the game in a fundamentally different way from strong humans, who rely much more on intuition and pattern-recognition rather than calculation. It just turns out the computer approach is actually better than the human one. Sort of like how a car can move faster than a human even though cars don’t do anything much like walking.
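To make that concrete, here's a toy sketch of that kind of program, with tic-tac-toe standing in for chess and a made-up evaluation function (real engines cut the search off at a depth limit and call a much richer hand-tuned heuristic there, plus move ordering, transposition tables, and so on):

    # Toy version of the pre-neural approach: tree search + a hand-written
    # evaluation function + alpha-beta pruning. Tic-tac-toe stands in for chess.
    from typing import List, Optional

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(b: List[str]) -> Optional[str]:
        for i, j, k in LINES:
            if b[i] != " " and b[i] == b[j] == b[k]:
                return b[i]
        return None

    def evaluate(b: List[str]) -> int:
        # the "hardcoded evaluation function": +10 if X has won, -10 if O has won
        w = winner(b)
        return 10 if w == "X" else -10 if w == "O" else 0

    def alphabeta(b: List[str], player: str, alpha: int, beta: int) -> int:
        score = evaluate(b)
        if score != 0 or " " not in b:       # game over: won or drawn
            return score
        for i in range(9):
            if b[i] != " ":
                continue
            b[i] = player                    # make the move
            val = alphabeta(b, "O" if player == "X" else "X", alpha, beta)
            b[i] = " "                       # undo it
            if player == "X":
                alpha = max(alpha, val)
            else:
                beta = min(beta, val)
            if beta <= alpha:                # prune: this branch can't change the result
                break
        return alpha if player == "X" else beta

    print(alphabeta([" "] * 9, "X", -100, 100))  # prints 0: perfect play is a draw

Nothing in there resembles intuition or pattern recognition; it's exhaustive calculation plus a rule for skipping branches that can't matter, and that turned out to be enough to beat humans.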

Math is not analogous: there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.

perching_aix · 40m ago
I don't wish to join you in framing intelligence as a step function.

I think winning a Go or a chess competition does demonstrate intelligence. And winning a math competition does even more so.

I do not think a trivia competition like Jeopardy demonstrates intelligence much at all, however. Specifically because it reads like it's not about intelligence, but about knowledge: it tests for association and recall, not for performing complex logical transformations.

This isn't to say I consider these completely independent. Most smart people are both knowledgeable and intelligent. It's just that they are distinct dimensions in my opinion.

You wouldn't say something tastes bad because its texture feels weird in your mouth, would you?

im3w1l · 39m ago
None of these things are enough by itself. It's rather that they have now solved so many things that the sum total has (arguably) crossed the threshold.

As for solving math problems, that is an important part of recursive self-improvement. If it can come up with better algorithms and turn them into code, that will translate into raising its own intelligence.

krapp · 56m ago
Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?
hackyhacky · 21m ago
> Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?

Because advocates of LLMs don't use their alleged intelligence as a defense; but opponents of LLMs do use their alleged non-intelligence as an attack.

Really, whether or not the machine is "intelligent", by whatever definition, shouldn't matter. What matters is whether it is a useful tool.

xg15 · 21m ago
I feel this article should be paired with this other one [1] that was on the frontpage a few days ago.

My impression is, there is currently one tendency to "over-anthropomorphize" LLMs and treat them like conscious or even superhuman entities (encouraged by AI tech leaders and AGI/Singularity folks) and another to oversimplify them and view them as literal Markov chains that just got lots of training data.

Maybe those articles could help guard against both extremes.

[1] https://www.verysane.ai/p/do-we-understand-how-neural-networ...

mattgreenrocks · 16m ago
Previously when someone called out the tendency to over-anthropomorphize LLMs, a lot of the answers amounted to, “but I like doing it, therefore we should!”

I’ll be the first to say one should pick their battles. But hearing that over and over from a crowd like this that can be quite pedantic is very telling.

kbaker · 32m ago
Seems like this is close to the Uncanny Valley effect.

LLM intelligence is in the spot where it is simultaneously genius-level but also just misses the mark a tiny bit, which really sticks out for those who have been around humans their whole lives.

I feel that, just like more modern CGI, this will slowly fade with certain techniques and you just won't notice it when talking to or interacting with AI.

Just like in his post, during the whole Matrix discussion:

> "When I asked for examples, it suggested the Matrix and even gave me the “Summary” and “Shortening” text, which I then used here word for word. "

He switches in AI-written text and I bet you were reading along just the same until he pointed it out.

This is our future now I guess.

pbw · 1h ago
LLMs can shorten, and maybe tend to if you just say "summarize this," but you can trivially ask them to do more. I asked for a summary of Jenson's post and then for a reflection. GPT-5 said, "It's similar to the Plato's Cave analogy: humans see shadows (the input text) and infer deeper reality (context, intent), while LLMs either just recite shadows (shorten) or imagine creatures behind them that aren't there (hallucinate). The “hallucination” behavior is like adding “ghosts”—false constructs that feel real but aren’t grounded."

That ain't shortening because none of that was in his post.

pitpatagain · 40m ago
I can't decide how to read your last sentence.

That reflection seems totally off to me: fluent, and flavored with elements of the article, but also not really what the article is about and a pretty weird/tortured use of the elements of the allegory of the cave, like it doesn't seem anything like Plato's Cave to me. Ironically demonstrates the actual main gist of the article if you ask me.

But maybe you meant that you think that summary is good and not textually similar to that post so demonstrating something more sophisticated than "shortening".

pbw · 7m ago
Yes, GPT-5's response above was not shortening because there was nothing in the OP about Plato's Cave. I agree that Plato's cave analogy was confusing here. Here's a better one from GPT-5, which is deeply ironic:

A New Yorker book review often does the opposite of mere shortening. The reviewer:

* Places the book in a broader cultural, historical, or intellectual context.

* Brings in other works—sometimes reviewing two or three books together.

* Builds a thesis that connects them, so the review becomes a commentary on a whole idea-space, not just the book’s pages.

This is exactly the kind of externalized, integrative thinking Jenson says LLMs lack. The New Yorker style uses the book as a jumping-off point for an argument; an LLM “shortening” is more like reading only the blurbs and rephrasing them. In Jenson’s framing, a human summary—like a rich, multi-book New Yorker review—operates on multiple layers: it compresses, but also expands meaning by bringing in outside information and weaving a narrative. The LLM’s output is more like a stripped-down plot synopsis—it can sound polished, but it isn’t about anything beyond what’s already in the text.

stefanv · 1h ago
What if the problem is not that we overestimate LLMs, but that we overestimate intelligence? Or to express the same idea for a more philosophically inclined audience, what if the real mistake isn’t in overestimating LLMs, but in overestimating intelligence itself by imagining it as something more than a web of patterns learned from past experiences and echoed back into the world?
foobarian · 32m ago
LLMs are like a Huffman codec, except the context is infinite and the compression is lossy
AndrewKemendo · 40m ago
Who are you going to lodge your complaint to that the set of systems and machines that just took your job isn’t “intelligent?”

Humans seem to get wrapped up in concepts like intelligence, consciousness, etc., because they seem to be the only things differentiating us from every other animal, when in fact it's all a mirage.

ChrisMarshallNY · 1h ago
That's a great article.

Scott Jenson is one of my favorite authors.

He's really big on integrating an understanding of basic human nature into design.

codeulike · 1h ago
Well I, for one, can't believe what that guy did to poor Timmy
beezle · 4m ago
When I saw the post title I immediately thought of Timmy from South Park lol
snozolli · 1h ago
Regarding Timmy, the Companion Cube from the game Portal is the greatest example of induced anthropomorphism that I've ever experienced. If you know, you know, and if you don't, you should really play the game, since it's brilliant.
generationP · 1h ago
The cube doesn't work, or at least it didn't for me. The googly eyes really do make a difference.
ChrisMarshallNY · 1h ago
I'm on a Mac, and would love to see Portal 2 (at least) ported to M-chips.

I would love Portal 3, even more.

bitwize · 32m ago
That's a matter of informed anthropomorphism. A lot of people don't become attached to the Companion Cube, but are informed that their player character is so attached.
tovej · 1h ago
Good article, it's been told before but it bears repeating.

Also, I got caught on one kind of irrelevant point regarding the characterization of the Matrix: I would say the Matrix is not just disguised as a story about escaping systems of control; it's quite clearly about oppressive systems in society, with specific reference to gender expression. Lilly Wachowski has explicitly stated that it was supposed to be an allegory for gender transition.

xg15 · 34m ago
It wasn't. Switch was intended to be genderfluid, but the Matrix itself, or "logging out" of it, was apparently not meant as an allegory for transitioning (though she doesn't mind the interpretation):

https://www.them.us/story/lilly-wachowski-work-in-progress-s...

kayodelycaon · 1h ago
The character Switch was supposed to have a different gender in the matrix vs real life. It’s really a shame that didn’t happen.
altruios · 1h ago
There is nothing more free than the freedom to be who you really are.

Going to rewatch the Matrix tonight.

naikrovek · 1h ago
I've mentioned this to colleagues at work before.

LLMs give a very strong appearance of intelligence, because humans are super receptive to information provided via our native language. We often have to deal with imperfect speakers and writers, and we must infer context and missing information on our own. We do this so well that we don't know we're doing it. LLMs have perfect grammar, and we subtly feel that they are extremely smart because subconsciously we recognize that we don't have to do any of that work on what they say; it is all syntactically perfect.

So, LLMs sort of trick us into overlooking their true limitations and believing that they are truly thinking. There are even models that call themselves thinking models, but they don't think; they just predict what the user is going to complain about and say that to themselves as an additional, dynamic prompt on top of the one you actually enter.

LLMs are very good at fooling us into the idea that they know anything at all; they don't. And humans are very bad at being discriminate about the source of the information presented to them if it is presented in a friendly way. The combination of those things is what has resulted in the insanely huge AI hype cycle that we are currently living in the middle of. Nearly everyone is overreacting to what LLMs actually are, and the few of us that believe that we sort of see what's actually happening are ignored for being naysayers, buzz-kills, and luddites. Shunned for not drinking the Kool-Aid.