Yet Another LLM Rant

35 points by sohkamyung | 37 comments | 8/9/2025, 12:25:19 PM | overengineer.dev

Comments (37)

simonw · 22s ago
> This concludes all the testing for GPT5 I have to do. If a tool is able to actively mislead me this easy, which potentially results in me wasting significant amounts of time in trying to make something work that is guaranteed to never work, it’s a useless tool.

Yeah, except it isn't. You can get enormous value out of LLMs if you get over this weird science fiction requirement that they never make mistakes.

You have to put the effort in to learn how to use them with a skeptical eye. I've been getting value as a developer from LLMs since the GPT-3 era, and those models sucked.

hodgehog11 · 40m ago
I am sympathetic to the reasoning as to why LLMs should not be used to help some programmers right now. But I get a little frustrated seeing many of these kinds of posts that talk about fundamental limitations of LLMs vs humans on the grounds that it cannot "logically reason" like a human does. These are limitations in the current approach to training and objectives; internally, we have no clue what is going on.

> it’s “just a statistical model” that generates “language” based on a chain of “what is most likely to follow the previous phrase”

Humans are statistical models too in an appropriate sense. The question is whether we try to execute phrase by phrase or not, or whether it even matters what humans do in the long term.

> The only way ChatGPT will stop spreading that nonsense is if there is a significant mass of humans talking online about the lack of ZSTD support.

Or you can change the implicit bias in the model by being more clever with your training procedure. This is basic stats here, not everything is about data.

> They don’t know anything, they don’t think, they don’t learn, they don’t deduct. They generate real-looking text based on what is most likely based on the information it has been trained on.

This may be comforting to think, but it's just wrong. It would make my job so much easier if it were true. If you take the time to define "know", "think", and "deduct", you will find it difficult to argue current LLMs do not do these things. "Learn" is the exception here, and is a bit more complex, not only because of memory and bandwidth issues, but also because "understand" is difficult to define.

raincole · 12m ago
While the normal distribution meme is notoriously overused, I think it fits the scenario here.

LLMs know so much (when you just use ChatGPT for the first time like it's an Oracle machine) -> LLMs don't know anything (when you understand how machine learning works) -> LLMs know so much (when you actually think about what 'know' means)

libraryofbabel · 15m ago
Yeah. The empty “it’s just a statistical model” critique (or the dressed-up “stochastic parrots” version of it) is almost a sign at this point that the person using it formed their opinions about AI back when ChatGPT first came out, and hasn’t really bothered to engage with it much since then.

If in 2022 I’d tried to convince AI skeptics that in three years we might have tools on the level of Claude Code, I’m sure I’d have heard everyone say it would be impossible because “it’s just a statistical model.” But it turned out that there was a lot more potential in the architecture for encoding structured knowledge, complex reasoning, etc., despite that architecture being probabilistic. (Don’t bet against the Bitter Lesson.)

LLMs have a lot of problems, hallucination still being one of them. I’d be the first to advocate for a skeptical hype-free approach to deploying them in software engineering. But at this point we need careful informed engagement with where the models are at now rather than cherry-picked examples and rants.

vidarh · 2m ago
People repeating the "stochastic parrot" meme in all kinds of variations appear, if anything, to be more like stochastic parrots than the typical LLM is.
efilife · 5m ago
> it cannot "logically reason" like a human does

Reason? Maybe. But there's one limitation that we currently have no idea how to overcome: LLMs don't know how much they know. If they tell you they don't know something, it may be a lie. If they tell you they do, this may be a lie too. I, a human, certainly know what I know and what I don't, and can recall where I know the information from.

vidarh · 1m ago
I have never met a human who has a good grasp of what they know and don't know. They may have a better grasp of it than an LLM, but humans are awfully bad at understanding the limits of our own knowledge, and will argue very strongly in favour of knowing more than we demonstrably do in all kinds of contexts.
bwfan123 · 12m ago
Humans build theories of how things work. LLMs don't. Theories are deterministic symbolic representations of the chaotic worlds of meaning. Take the Turing machine, for example, as a theory of computation in general, Euclidean geometry as a theory of space, and Newtonian mechanics as a theory of motion.

A theory gives 100% correct predictions, although the theory itself may not model the world accurately. Such feedback between the theory and its application in the world causes iterations of the theory: from Newtonian mechanics to relativity, etc.

Long story short, the LLM is a long way away from any of this. And to be fair to LLMs, the average human is not creating theories; it takes some genius to create them (Newton, Turing, etc.).

Understanding something == knowing the theory of it.

hodgehog11 · 5m ago
> Humans build theories of how things work. LLMs don't. Theories are deterministic symbolic representations of the chaotic worlds of meaning

What made you believe this is true? Like it or not, yes, they do (at least to the best extent of our definitions of what you've said). There is a big body of literature exploring this question, and the general consensus is that all performant deep learning models adopt an internal representation that can be extracted as a symbolic representation.

tptacek · 2m ago
> LLMs can be a useful tool, maybe. But don’t anthropomorphize them.

(but, earlier)

> If a tool is able to actively mislead me this easy, which potentially results in me wasting significant amounts of time in trying to make something work that is guaranteed to never work, it’s a useless tool. I don’t like collaborating with chronic liars.

bfioca · 42m ago
>...it’s a useless tool. I don’t like collaborating with chronic liars who aren’t able to openly point out knowledge gaps...

I think a more correct take here might be "it's a tool that I don't trust enough to use without checking," or at the very least, "it's a useless tool for my purposes." I understand your point, but I got a little caught up on the above line because it's very far out of alignment with my own experience using it to save enormous amounts of time.

lazide · 32m ago
It’s a tool that fundamentally can’t be used reliably without double-checking everything it produces. That is rather different from how you’re presenting it.
mhh__ · 10m ago
Checking is usually faster than writing from scratch, so this is still +EV.
efilife · 4m ago
What does +EV mean? I'm looking but can't find a definition
tmnvdb · 23m ago
So, similar to Wikipedia.
simianwords · 21m ago
Similar to anything really. Can I really trust anything without verifying? Scientific journals?

Maro · 22m ago
I'm currently working as a hands-off VP, so I don't use LLMs for coding at work, only for emails and document editing. I do use them for my hobby weekend coding stuff, which usually involves writing short 100-1000 LOC Python toy programs for my own education/entertainment. My way of working is quite primitive: I have zero integrations, nothing agentic, I just copy/paste with ChatGPT.

For this use-case it's been very useful, it can usually generate close-to-complete solutions, as long as it's one of the major programming languages, and it's a reasonably standard problem. So in general I'm always surprised when people say that LLMs are completely useless for coding --- this is just not true, and I feel sorry for people who shut themselves off from a useful tool.

But even at this small scale, even the best (o3) models sometimes totally fail. Recently I started a series of posts on distributed algorithms [1], and when I was working on the post/code for the Byzantine Generals / Consensus algorithm, o3 --- to my honest surprise --- just totally failed. I tried about 10 different times (both from scratch and describing the incorrect behaviour of its code), also showing it the original Lamport paper, and it just couldn't get it right... even though the toy implementation is just ~100 LOC, the actual algorithm portion is maybe 25 LOC. My hypothesis is that there are very few implementations online, and additionally I find the descriptions of the algorithm a bit vague (interleaves message cascade and decision logic).

[1] https://bytepawn.com/tag/distributed.html

drakenot · 20m ago
ChatGPT 5> Without adding third-party dependencies, how can I compress a Data stream with zstd in Swift on an iPhone?

(It auto-routed to its thinking model, took 25s)

> Short answer: you can’t. iOS doesn’t ship a public zstd encoder/decoder. The system Compression and AppleArchive frameworks only expose lzfse, lz4, zlib/deflate, and lzma. So with only Apple SDKs, there’s no way to produce zstd-compressed bytes.

> Two practical paths:

> 1. If zstd isn’t mandatory, use a built-in codec (lzfse or lz4 are fastest) via Compression’s streaming API. Example (pure Swift, no deps) that compresses an arbitrary byte stream:

> <code example>

> 2. If zstd is mandatory, you must embed zstd yourself. That can mean:

> - Vendoring the official C sources (libzstd) directly into your project and calling them via a tiny Swift wrapper/bridging header

> - Statically linking a prebuilt libzstd.a you compile for iOS
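
The <code example> placeholder in the quoted answer was elided. For reference, here is a minimal sketch of what such a stream over one of the built-in codecs can look like, using Apple's Compression framework with lzfse (this is an added illustration, not GPT-5's actual output):

```swift
import Foundation
import Compression

// Sketch of the elided example: stream-compress an arbitrary Data value
// with lzfse via the Compression framework's OutputFilter API.
func lzfseCompress(_ input: Data) throws -> Data {
    var compressed = Data()
    // OutputFilter hands compressed chunks to the closure as they are produced.
    let filter = try OutputFilter(.compress, using: .lzfse) { chunk in
        if let chunk = chunk {
            compressed.append(chunk)
        }
    }
    // Feed the source in pieces to mimic a streaming producer.
    let chunkSize = 64 * 1024
    var offset = 0
    while offset < input.count {
        let end = min(offset + chunkSize, input.count)
        try filter.write(input.subdata(in: offset..<end))
        offset = end
    }
    try filter.finalize() // flush any buffered output
    return compressed
}
```

Swapping `.lzfse` for `.lz4` or `.zlib` works the same way.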

djeastm · 12m ago
Here's what I got with ChatGPT (GPT-5); it must not have thought about it, because the response was near-instantaneous:

>On iOS, you can use Apple’s built-in Zstandard (zstd) compression API from the Compression framework — no third-party dependencies required.

>Here’s how you can compress a Data stream with zstd: >...

https://chatgpt.com/share/68976c8f-7ae0-8012-b7a8-58e016246d...

hodgehog11 · 16m ago
I think the useful takeaway here is that Top 1 operation is generally not a good idea, especially not for making judgements. This doesn't address the main points of the blog though.
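
If "Top 1" here means always returning the single most likely continuation (my reading of the comment, so treat it as an assumption), here is a toy sketch, with invented numbers, of why greedy selection hides how unsure the model actually is:

```swift
import Foundation

// Invented next-answer distribution, purely for illustration.
let candidates: [(answer: String, prob: Double)] = [
    (answer: "iOS ships a built-in zstd API", prob: 0.40), // confident-sounding but wrong
    (answer: "iOS has no public zstd API",    prob: 0.35),
    (answer: "You must vendor libzstd",       prob: 0.25),
]

// Top-1 / greedy: always report the single most likely answer,
// even when the alternatives jointly outweigh it.
let greedy = candidates.max(by: { $0.prob < $1.prob })!
print("greedy pick:", greedy.answer)

// Sampling in proportion to probability at least surfaces the
// competing answers across repeated runs.
func sample(_ options: [(answer: String, prob: Double)]) -> String {
    let r = Double.random(in: 0..<1)
    var cumulative = 0.0
    for option in options {
        cumulative += option.prob
        if r < cumulative { return option.answer }
    }
    return options.last!.answer
}
print("sampled pick:", sample(candidates))
```

Greedy selection returns the 0.40 answer every single time, even though the model puts a combined 0.60 on answers that contradict it.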
gdsys · 39m ago
"Based on my research, zstd compression is not natively supported by iOS or Apple's frameworks, which means you cannot use zstd compression without adding some form of external code to your project"

Thanks Sonnet.

Full response:

https://www.perplexity.ai/search/without-adding-third-party-...

dcre · 33m ago
Most likely the key here is web search. Later I will try the post’s example with GPT-5 plus web search. I would be surprised if it didn’t say the same thing.

From a product point of view, it seems clear that just as they have work to do to get the model to dynamically decide to use reasoning when it would help, they have to do the same with web search.

quantum_state · 33m ago
An implication of Peter Naur's 1985 paper on programming as theory building is that current LLM coding tools would be very effective at generating technical debt even when they work ... use at your own risk.
simianwords · 22m ago
The prompt works for me and correctly identifies that zstd isn't available: https://chatgpt.com/share/689769c5-bd68-800b-ae63-c6a337dcfa...

"Short answer: you can’t. iOS doesn’t ship a Zstandard (zstd) encoder/decoder in any first-party framework. Apple’s built-in Compression framework supports LZFSE, LZ4, zlib/deflate, and LZMA—not zstd."

nikolayasdf123 · 37m ago
Possible solution: "reality checks".

I see that GitHub Copilot actually runs code, writes simple exploratory programs, and iteratively tests its hypotheses. It is astoundingly effective and fast.

Same here. Nothing stops the AI from actually trying to implement whatever it suggested, compiling it, and seeing if it actually works.

Grounding in reality at inference time, so to speak.
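
As a toy illustration of that kind of grounding (this is a sketch of the idea, not how Copilot actually does it; the swiftc path and the gate-on-exit-status logic are assumptions), a harness could refuse to surface generated code that doesn't even type-check:

```swift
import Foundation

/// Hypothetical "reality check": write a model-proposed Swift snippet to a
/// temporary file and ask swiftc whether it type-checks before showing it
/// to the user. A hallucinated API would fail to resolve at this step.
func typeChecks(_ candidateSource: String) throws -> Bool {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("candidate-\(UUID().uuidString).swift")
    try candidateSource.write(to: url, atomically: true, encoding: .utf8)

    let swiftc = Process()
    swiftc.executableURL = URL(fileURLWithPath: "/usr/bin/swiftc") // assumed toolchain path
    swiftc.arguments = ["-typecheck", url.path]
    swiftc.standardError = Pipe() // discard diagnostics in this sketch

    try swiftc.run()
    swiftc.waitUntilExit()
    return swiftc.terminationStatus == 0 // 0 means the compiler accepted it
}

// Usage (hypothetical):
// let grounded = try typeChecks(modelProposedCode)
```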

jmkni · 46m ago
What's funny is that newer models will now be trained on the exact question, "Without adding third-party dependencies, how can I compress a Data stream with zstd in Swift on an iPhone?", and similar questions, because of this post.

Maybe the key to training future LLMs is to write angry blog posts about the things they aren't good at and get them to the front page of HN?

nikolayasdf123 · 41m ago
Good point. Nobody knows you're a dog on the internet anyway.
nikolayasdf123 · 43m ago
> “Not having an answer” is not a possibility in this system - there’s always “a most likely response”, even if that makes no sense.

Simple fix: a probability cutoff. But in all seriousness, this is something that will be fixed; I don't see a fundamental reason why not.

And I myself have seen such hallucinations (about compression too, actually).
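
A toy sketch of what such a cutoff could look like, assuming the decoder exposes per-token probabilities (the struct, numbers, and threshold here are all invented for illustration; as the reply below argues, token probability is not the same thing as a probability that the claim is true):

```swift
import Foundation

// Hypothetical per-token output of a decoder. Real APIs expose something
// similar as "logprobs"; the values below are invented for illustration.
struct TokenChoice {
    let token: String
    let probability: Double
}

/// Naive "probability cutoff": if any step of the answer falls below a
/// confidence threshold, abstain instead of returning the completion.
func answerOrAbstain(_ steps: [TokenChoice], threshold: Double = 0.6) -> String {
    let weakest = steps.map(\.probability).min() ?? 0
    guard weakest >= threshold else {
        return "I'm not confident enough to answer this."
    }
    return steps.map(\.token).joined(separator: " ")
}

let shaky = [
    TokenChoice(token: "iOS", probability: 0.9),
    TokenChoice(token: "ships", probability: 0.8),
    TokenChoice(token: "zstd", probability: 0.3), // the model is guessing here
]
print(answerOrAbstain(shaky)) // abstains instead of asserting the shaky claim
```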

Seb-C · 28m ago
Hallucinations are not a bug or an exception, but a feature. Everything output by LLMs is 100% made up, with a heavy bias towards what they were fed in the first place (human-written content).

The fundamental reason why it cannot be fixed is because the model does not know anything about the reality, there is simply no such concept here.

To make a "probability cutoff" you first need a probability about what the reality/facts/truth is, and we have no such reliable and absolute data (and probably never will).

simianwords · 20m ago
>To make a "probability cutoff" you first need a probability about what the reality/facts/truth is, and we have no such reliable and absolute data (and probably never will).

Can a human give a probability estimate for their predictions?

nikolayasdf123 · 21m ago
Have you seen the Iris flowers dataset? It is fairly simple to find cutoffs to classify the flowers.

Or are you claiming, in general, that there is no objective truth in reality in the philosophical sense? Well, you can go down that more philosophical side of the road, or you can get more pragmatic. Things just work, regardless of how we talk about them.
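
A minimal sketch of that kind of cutoff on the classic Iris measurements (the values below are representative samples from the dataset; the single threshold is the entire "model"):

```swift
// In the classic Iris data, setosa petals run roughly 1.0-1.9 cm while the
// other two species start around 3.0 cm, so one cutoff separates setosa.
struct IrisSample {
    let petalLengthCm: Double
    let species: String
}

func looksLikeSetosa(petalLengthCm: Double, cutoff: Double = 2.5) -> Bool {
    petalLengthCm < cutoff
}

let samples = [
    IrisSample(petalLengthCm: 1.4, species: "setosa"),
    IrisSample(petalLengthCm: 4.7, species: "versicolor"),
    IrisSample(petalLengthCm: 6.0, species: "virginica"),
]
for s in samples {
    print(s.species, "-> classified as setosa:", looksLikeSetosa(petalLengthCm: s.petalLengthCm))
}
```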

cantor_S_drug · 41m ago
There was a writer who, in order to get ideas to write about, used to cut up words from newspaper headlines and then rearrange them.

In one rearrangement, he got "Son sues father for xyz". That headline came true 2 years later.

techpineapple · 3h ago
I wonder if one reason new versions of GPT appear to get better, say at coding tasks, is just because they have new knowledge.

When ChatGPT 4 comes out, newer API versions have less blog post / example / documentation coverage in its training data. So ChatGPT 5 comes out and seems to solve all the problems that ChatGPT 4 had, but then of course fails on newer libraries. Rinse and repeat.

its-kostya · 1h ago
> ... just because they have new knowledge.

This means there is a future where AI is trained on data it generated itself, and I worry that might not be sustainable.

jgalt212 · 48m ago
A software-based Habsburg jaw, if you will.
lazide · 31m ago
This is already occurring, is not sustainable, and produces an effect known as Model Collapse.
techpineapple · 1h ago
I’ve heard of this idea of training on synthetic data. I wonder what that data is, and whether it increases or decreases hallucinations. Is the goal of training on synthetic data to better wear certain paths, or to increase the amount of knowledge / types of data?

Because the second seems vaguely impossible to do.