My AI skeptic friends are all nuts

1568 points by tabletcorry | 6/2/2025, 9:09:53 PM | fly.io ↗

Comments (1963)

matthewsinclair · 5h ago
I think this article is pretty spot on — it articulates something I’ve come to appreciate about LLM-assisted coding over the past few months.

I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.

Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.

It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.

But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.

I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.

What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.

Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.

What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.

[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...

raxxorraxor · 2h ago
The better I am at solving a problem, the less I use AI assistants. I use them if I try a new language or framework.

Busywork code I need to generate is difficult to do with AI too, because you then need to formalize the necessary context for the assistant, which is exhausting and the result is uncertain. So perhaps it is just simpler to write it yourself quickly.

I understand comments being negative, because there is so much AI hype without too many practical applications yet. Or at least good practical applications. Some of that hype is justified, some of it is not. I enjoyed the image/video/audio synthesis hype more, tbh.

Test cases are quite helpful and comments are decent too. But often prompting is more complex than programming something. And you can never be sure if any answer is usable.

avemuri · 19m ago
I agree with your points, but I'm also reminded of one of my bigger learnings as a manager - the stuff I'm best at is the hardest, but most important, to delegate.

Sure, it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor, etc. ultimately taught me the skills needed to delegate effectively and multiply the team's output massively as we added people.

It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.

brulard · 25m ago
> But often prompting is more complex than programming something.

It may be more complex, but it is in my opinion better long term. We need to get good at communicating with AIs to get the results we want. Forgive me for assuming that you probably haven't used these assistants long enough to get good at using them. I've been a web developer for 20 years already, and AI tools are multiplying my output even on problems I'm very good at. And they are getting better very quickly.
0points · 4h ago
> Then I actually read the code.

This is my experience in general. People seem to be impressed by the LLM output until they actually comprehend it.

The fastest way to have someone break out of this illusion is tell them to chat with the LLM about their own expertise. They will quickly start to notice errors in the output.

wiseowise · 25m ago
You know who does that also? Humans. I read shitty, broken, amazing, useful code every day, but you don't see me complaining online that people who earn a 100-200k salary don't produce ideal output right away. And believe me, I spend way more time fixing their shit than LLMs'.

If I can reduce this even by 10% for 20 dollars it’s a bargain.

KoolKat23 · 43m ago
That proves nothing with respect to the LLM's usefulness; all it means is that you are still useful.
tptacek · 4h ago
That has not been my experience at all with networking and cryptography.
0points · 4h ago
That's expected given you are not a skeptic.

But you should try to interview the LLM properly about your expertise then, maybe you will snap out of it too.

Best of luck!

illiac786 · 2h ago
You put people into neat little boxes: the skeptics and the non-skeptics. It is reductive and, most of all, it's polarizing. This is what US politics has become, and we should avoid that here.
luffy-taro · 1h ago
Yeah, putting labels on people is not very nice.
Xmd5a · 3h ago
What a condescending tone
jrvarela56 · 1h ago
10 month old account talking like that to the village elder
foldr · 13m ago
In fairness, the article is a lot more condescending and insulting to its readers than the comment you're replying to.
rvnx · 3h ago
An LLM is essentially the world's information packed into a very compact format. It is the modern equivalent of the Library of Alexandria.

Claiming that your own knowledge is better than all the compressed consensus of the books of the universe is very optimistic.

If you are not sure about the result given by a LLM, it is your task as a human to cross-verify the information. The exact same way that information in books is not 100% accurate, and that Google results are not always telling the truth.

fennecfoxy · 2h ago
>LLM is essentially the world information packed into a very compact format.

No, it's world information distilled to various parts and details that training deemed important. Do not pretend for one second that it's not an incredibly lossy compression method, which is why LLMs hallucinate constantly.

This is why training is only useful for teaching the LLM how to string words together to convey hard data. That hard data should always be retrieved via RAG with an independent model/code verifying that the contents of the response are correct as per the hard data. Even 4o hallucinates constantly if it doesn't do a web search and sometimes even when it does.
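
To make that pipeline concrete, here's a minimal sketch of retrieve-then-verify. Everything in it is hypothetical: retrieveDocs, callModel, and the prompt wording are stand-ins for whatever store and model you actually use, not any particular vendor's API.

    // Hypothetical RAG-with-verification pipeline; no real vendor API is assumed.
    type Doc = { id: string; text: string };

    // Stand-ins for a vector search and a model call; wire these to real services.
    async function retrieveDocs(query: string, topK: number): Promise<Doc[]> {
      return []; // placeholder
    }
    async function callModel(prompt: string): Promise<string> {
      return ''; // placeholder
    }

    async function answerWithVerification(question: string): Promise<string> {
      // 1. Pull the hard data from a store instead of trusting model weights.
      const docs = await retrieveDocs(question, 5);
      const context = docs.map(d => `[${d.id}] ${d.text}`).join('\n');

      // 2. Generate an answer grounded in the retrieved context.
      const answer = await callModel(
        `Answer using only the sources below and cite their ids.\n${context}\n\nQ: ${question}`
      );

      // 3. Independent check: a second pass verifies the answer against the sources.
      const verdict = await callModel(
        `Do the sources support this answer? Reply SUPPORTED or UNSUPPORTED.\n` +
        `Sources:\n${context}\n\nAnswer:\n${answer}`
      );

      return verdict.trim().startsWith('SUPPORTED')
        ? answer
        : 'Could not verify the answer against the retrieved sources.';
    }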

sethammons · 48m ago
No, don't think libraries, think "the Internet."

The Internet thinks all kinds of things that are not true.

mavhc · 11m ago
Just like books then, except the internet can be updated
rvnx · 5m ago
We all remember those teachers who said that the internet cannot be trusted, and that the only source of truth is in books.
TheEdonian · 3h ago
Well, let's not forget that it's an opinionated source. There is also the point that if you ask it about a topic, it will (often) give you the answer that has the most content about it (or the most easily accessible information).
illiac786 · 2h ago
Agree.

I find that, for many, LLMs are addictive, a magnet, because they offer to do your work for you, or so it appears. Resisting this temptation is impossibly hard for children, for example, and many adults succumb.

A good way to maintain a healthy dose of skepticism about its output, and to keep checking that output, is to ask the LLM about something that happened after the training cutoff.

For example, I asked if lidar could damage phone lenses, and the LLM very convincingly argued it was highly improbable, because that only recently made the news as a danger for phone lenses and wasn't part of the training data.

This helps me stay sane and resist the temptation of just accepting LLM output =)

On a side note, the kagi assistant is nice for kids I feel because it links to its sources.

dale_glass · 1h ago
LIDAR damaging the lens is extremely unlikely. A lens is mostly glass.

What it can damage is the sensor, which is actually not at all the same thing as a lens.

When asking questions it's important to ask the right question.

illiac786 · 35m ago
Sorry, I meant the sensor
criley2 · 9m ago
I asked ChatGPT o3 if lidar could damage phone sensors and it said yes https://chatgpt.com/share/683ee007-7338-800e-a6a4-cebc293c46...

I also asked Gemini 2.5 pro preview and it said yes. https://g.co/gemini/share/0aeded9b8220

I find it interesting to always test for myself when someone suggests to me that a "LLM" failed at a task.

Pamar · 2h ago
Partially OT:

Yesterday I asked ChatGPT which is the Japanese twin city of Venice (Italy). This was just a quick offhand question because I needed the answer for a post on IG, so not exactly a life or death situation.

Answer: Kagoshima. It also added that the "twin status" was officially established in 1965, and that Kagoshima was the starting point for the Jesuit missionary Alessandro Valignano in his attempt to proselytize the Japanese people (to Catholicism, and also about European culture).

I had never heard of Kagoshima, so I googled it. And discovered it is the twin city of Naples :/

So I then googled "Venice Japanese Twin City" and got: Hiroshima. I double-checked this, then went back to ChatGPT and wrote:

"Kagoshima is the Twin City for Neaples.".

This triggered a websearch and finally it wrote back:

"You are right, Kagoshima is Twin City of Neaples since 1960."

Then it added "Regarding Venice instead, the twin city is Hiroshima, since 2023".

So yeah, a Library of Alexandria that you can count on as long as you have another couple of libraries to double-check whatever you get from it. Note also that this was a very straightforward question; there is nothing to "analyze" or "interpret" or "reason about". And yet the answer was completely wrong, the first date was incorrect even for Naples (actually the ceremony was in May 1960), and the extra bits about Alessandro Valignano are not reported anywhere else: Valignano was indeed a Jesuit and he visited Japan multiple times, but Kagoshima is never mentioned when you google him or check his Wikipedia page.

You may understand how I remain quite skeptical for any application which I consider "more important than an IG title".

rvnx · 2h ago
Claude 4 Opus:

> Venice, Italy does not appear to have a Japanese twin city or sister city. While several Japanese cities have earned the nickname "Venice of Japan" for their canal systems or waterfront architecture, there is no formal sister city relationship between Venice and any Japanese city that I could find in the available information

I think GPT-4o got it wrong in your case because it searched Bing, and then read only fragments of the page ( https://en.wikipedia.org/wiki/List_of_twin_towns_and_sister_... ) to save costs for processing "large" context

Pamar · 20m ago
I am Italian, and I have some interest in Japanese history/culture.

So when I saw a completely unknown city I googled it, because I was wondering what it actually had in common with Venice (I mean, a Japanese version of Venice would be a cool place to visit next time I go to Japan, no?).

If I wanted to know, I dunno, "What is the Chinese Twin City for Buenos Aires" (to mention two countries I do not really know much about, and do not plan to visit in the future) should I trust the answer? Or should I go looking it up somewhere else? Or maybe ask someone from Argentina?

My point is that even as a "digital equivalent of the Library of Alexandria", LLMs seem to be extremely unreliable. Therefore - at least for now - I am wary about using them for work, or for any other area where I really care about the quality of the result.

richardw · 1h ago
If I want facts that I would expect the top 10 Google results to have, I turn search on. If I want a broader view of a well known area, I turn it off. Sometimes I do both and compare. I don’t rely on model training memory for facts that the internet wouldn’t have a lot of material for.

4o for quick. 4o plus search for facts. o4-mini-high plus search for "mini deep research", where it'll hit more pages, structure and summarise.

And I still check the facts and sources to be honest. But it’s not valueless. I’ve searched an area for a year and then had deep research find things I haven’t.

meowface · 2h ago
What model?

People often say "I asked ChatGPT something and it was wrong", and then you ask them the model and they say "huh?"

The default model is 4o-mini, which is much worse than 4o and much much worse than o3 at many tasks.

TeMPOraL · 1h ago
Yup. The difference is particularly apparent with o3, which does bursts of web searches on its own whenever it feels it'll be helpful in solving a problem, and uses the results to inform its own next steps (as opposed to just picking out parts to quote in a reply).

(It works surprisingly well, and feels mid-way between Perplexity's search and OpenAI's Deep Research.)

Pamar · 29m ago
I asked "What version/model are you running, atm" (I have a for-free account on OpenAI, what I have seen so far will not justify a 20$ monthly fee - IN MY CASE).

Answer: "gpt-4-turbo".

HTH.

ivape · 3h ago
This is pre-Covid HN thread on work from home:

https://news.ycombinator.com/item?id=22221507

It’s eerie. It’s historical. These threads from the past two years about what the future of AI will be will read like ghost stories. Like Rose having flashbacks of the Titanic. It’s worth documenting. We honestly could be having the most ominous discussion of what’s to come.

We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.

ignoramous · 2h ago
> We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.

At least we've got hacker news to ourselves, have we not ... https://news.ycombinator.com/item?id=44130743

rsynnott · 2h ago
Even if this were true (it is not; that’s not how LLMs work), well, there was a lot of complete nonsense in the Library of Alexandria.
rvnx · 2h ago
It's a compressed statistical representation of text patterns, so it is absolutely true. You lose information during the process, but the quality is similar to the source data. Sometimes even better, as there is consensus when information is repeated across multiple sources.
brahma-dev · 1h ago
It's amazing how it goes from all the knowledge in the world to ** terms and conditions apply, all answers are subject to market risks, please read the offer documents carefully.........
exe34 · 3h ago
One would hope the experience leads to the position, and not vice-versa.
jgrahamc · 1h ago
As someone who has followed Thomas' writing on HN for a long time... this is the funniest thing I've ever read here! You clearly have no idea about him at all.
wickedsight · 3h ago
That is no different from pretty much any other person in the world. If I interview people to catch them on mistakes, I will be able to do exactly that. Sure, there are some exceptions, like if you were to interview Linus about Linux. Other than that, you'll always be able to find a gap in someone's knowledge.

None of this makes me 'snap out' of anything. Accepting that LLMs aren't perfect means you can just keep that in mind. For me, they're still a knowledge multiplier and they allow me to be more productive in many areas of life.

tecleandor · 3h ago
Not at all. Useful or not, LLMs will almost never say "I don't know". They'll happily call a function to a library that never existed. They'll tell you "Incredible idea! You're on the correct path! And you can easily do that with so and so software", and you'll be like "wait what, that software doesn't do that", and they'll answer "Ah, yeah, you're right, of course."
yujzgzc · 2h ago
No, there are many techniques now to curb hallucinations. Not perfect, but no longer so egregiously overconfident.
ninkendo · 1h ago
…such as?
ignoramous · 2h ago
TFA says hallucinations are why "gyms" will be important: language tooling (compiler, linter, language server, domain-specific static analyses, etc.) that feeds back into the agent, so it'll know to redo its work.
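
As a rough sketch of what such a gym loop looks like (the askAgentToFix call is a hypothetical stand-in for whatever agent you use; the only real tooling assumed is the TypeScript compiler invoked via npx tsc):

    import { execSync } from 'node:child_process';
    import { writeFileSync } from 'node:fs';

    // Hypothetical agent call; replace with whatever model/agent integration you use.
    async function askAgentToFix(code: string, compilerErrors: string): Promise<string> {
      return code; // placeholder: a real agent would return revised code
    }

    // "Gym" loop: compile the candidate, feed errors back to the agent, retry.
    async function gymLoop(initialCode: string, maxRounds = 3): Promise<string> {
      let code = initialCode;
      for (let round = 0; round < maxRounds; round++) {
        writeFileSync('candidate.ts', code);
        try {
          // The compiler acts as the objective judge of the agent's output.
          execSync('npx tsc --noEmit candidate.ts', { stdio: 'pipe' });
          return code; // type-checks cleanly, accept this candidate
        } catch (err) {
          const errors = String((err as { stdout?: Buffer }).stdout ?? err);
          code = await askAgentToFix(code, errors);
        }
      }
      throw new Error('No compiling candidate within the round limit.');
    }
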
rvnx · 2h ago
Sometimes asking in a loop: "are you sure ? think step-by-step", "are you sure ? think step-by-step", "are you sure ? think step-by-step", "are you sure ? think step-by-step", "verify the result" or similar, you may end up with "I'm sure yes", and then you know you have a quality answer.
rvnx · 2h ago
The most infuriating are the emojis everywhere
jhanschoo · 4h ago
Your comment is ambiguous; what exactly do you refer to by "that"?
fsloth · 4h ago
I totally agree. The "hard to control mech suit" is an excellent analogy.

When it works it’s brilliant.

There is a threshold point on the learning curve where you realize you are in a pile of spaghetti code and think it actually saves no time to use an LLM assistant.

But then you learn to avoid the bad parts - thus they don't take up your time anymore - and the good parts start paying back the time spent learning, in heaps.

They are not zero effort tools.

There is a non-trivial learning cost involved.

teaearlgraycold · 3h ago
The issue is we’re too early in the process to even have a solid education program for using LLMs. I use them all the time and continue to struggle to find an approach that works well. It’s easy to use them for documentation lookup. Or filling in boilerplate. Sometimes they nail a transformation/translation task, other times they’re more trouble than they’re worth.

We need to understand what kind of guard rails to put these models on for optimal results.

fsloth · 3h ago
” we’re too early in the process to even have a solid education program for using LLMs”

We don’t even have a solid education program for software engineering - possibly for the same reason.

The industry loves to run on the bleeding edge, rather than just think for a minute :)

baq · 11m ago
when you stop to think, your fifteen (...thousand) competitors will all attempt a different version of the thing you're thinking about, and one of them will be about the thing you'll come up with, except it'll be built.

it might be ok since what you were thinking about is probably not a good idea in the first place for various reasons, but once in a while stars align to produce the unicorn, which you want to be if you're thinking about building something.

caveat: maybe you just want to build in a niche, it's fine to think hard in such places. usually.

fsloth · 3m ago
[delayed]
tptacek · 4h ago
For what it's worth: I'm not dismissive of the idea that these things could be ruinous for the interests of the profession. I don't automatically assume that making applications drastically easier to produce is just going to make way for more opportunities.

I just don't think the interests of the profession control. The travel agents had interests too!

ivape · 4h ago
As soon as the world realized they don't need a website and can just have a FB/Twitter page, a huge percentage of freelance web development gigs just vanished. We have to get real about what's about to happen. The app economy filled the gap, and the only optimistic case is that the AI app industry is what's going to fill the gap going forward. I just don't know about that. There's a certain end-game vibe I'm getting, because we're talking about self-building and self-healing software. More so, a person can ask the AI to role-play anything, even an app.
tptacek · 4h ago
Sure. And before the invention of the spreadsheet, the world's most important programming language, individual spreadsheets were something a programmer had to build for a business.
throwawayffffas · 2h ago
I generally agree with the attitude of the original post as well. But I'll push back on one point. It definitely doesn't cost 20 dollars a month; cursor.ai might, and I don't know how good it is, but Claude Code costs hundreds of dollars a month. Still cheaper than a junior dev, though.
eleveriven · 4h ago
Totally agree with "vibe debt". Letting an LLM off-leash without checks is a fast track to spaghetti. But with tests, clear prompts, and some light editing, I’ve shipped a lot of real stuff faster than I could have otherwise.
throw310822 · 4h ago
> Did Photoshop kill graphic artists?

No, but AI did.

tptacek · 4h ago
This, as the article makes clear, is a concern I am alert and receptive to. Ban production of anything visual from an LLM; I'll vote for it. Just make sure they can still generate Mermaid charts and Graphviz diagrams, so they still apply to developers.
hatefulmoron · 4h ago
What is unique about graphic design that warrants such extraordinary care? Should we just ban technology that approaches "replacement" territory? What about the people, real or imagined, that earn a living making Graphviz diagrams?
omnimus · 3h ago
It’s more a question of how it does what it does: by making a statistical model out of the work of the humans it now aims to replace.

I think graphic designers would be a lot less angry if AIs were trained on licensed work… that's how the system worked up until now, after all.

fennecfoxy · 2h ago
I don't think most artists would be any less angry & scared if AI was trained on licensed work. The rhetoric would just shift from mostly "they're breaching copyright!" to more of the "machine art is soulless and lacks true human creativity!" line.

I have a lot of artist friends but I still appreciate that diffusion models are (and will be with further refinement) incredibly useful tools.

What we're seeing is just the commoditisation of an industry in the same way that we have many, many times before through the industrial era, etc.

hatefulmoron · 3h ago
I get where you're coming from, but given that LLMs are trained on every available written word regardless of license, there's no meaningful distinction. Companies training LLMs for programming and writing show the same disregard for copyright as they do for graphic design. Therefore, graphic designers aren't owed special consideration that the author is unwilling to extend to anybody else.
tptacek · 4h ago
The article discusses this.
hatefulmoron · 4h ago
Does it? It admits at the top that art is special for no given reason, then it claims that programmers don't care about copyright and they deserve what's coming to them, or something..

"Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists"

This feels asserted without any real evidence

tptacek · 4h ago
LLMs immediately and completely displace the bread-and-butter replacement-tier illustration and design work that makes up much of that profession, and do so by effectively counterfeiting creative expression. A coding agent writes a SQL join or a tree traversal. The two things are not the same.

Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.

Finally, though I'm not stuck on this: I simply don't agree with the case being made for LLMs violating IPR.

I have had the pleasure, many times over the last 16 years, of expressing my discomfort with nerd piracy culture and the coercive might-makes-right arguments underpinning it. I know how the argument goes over here (like a lead balloon). You can agree with me or disagree. But I've earned my bona fides here. The search bar will avail.

fennecbutt · 2h ago
>bread-and-butter replacement-tier

How is creative expression required for such things?

Also, I believe that we're just monkey meat bags and not magical beings and so the whole human creativity thing can easily be reproduced with enough data + a sprinkle of randomness. This is why you see trends in supposedly thought provoking art across many artists.

Artists draw from imagination which is drawn from lived experience and most humans have roughly the same lives on average, cultural/country barriers probably produce more of a difference.

Many of the flourishes any artist may use in their work is also likely used by many other artists.

If I commission "draw a mad scientist, use creative license" from several human artists I'm telling you now that they'll all mostly look the same.

thanksgiving · 3h ago
> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.

I think the case we are making is that there is no such thing as intellectual property to begin with, and the whole thing is a scam created by duct-taping a bunch of different concepts together when they should not be grouped together at all.

https://www.gnu.org/philosophy/not-ipr.en.html

oompty · 1h ago
What about ones trained on fully licensed art, like Adobe Firefly (based on their own stock library) or F-Lite by Freepik & Fal (also claimed to be copyright safe)?
hatefulmoron · 4h ago
> LLMs immediately and completely displace the bread-and-butter replacement-tier illustration and design work that makes up much of that profession

And so what? Tell it to the Graphviz diagram creators, entry level Javascript programmers, horse carriage drivers, etc. What's special?

> .. and does so by effectively counterfeiting creative expression

What does this actually mean, though? ChatGPT isn't claiming to have "creative expression" in this sense. Everybody knows that it's generating an image using mathematics executed on a GPU. It's creating images. Like an LLM creates text. It creates artwork in the same sense that it creates novels.

> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.

Contrary to your theory, programmers are very particular about licenses. Copyleft licensing leans heavily on enforcing copyright. Besides, I hear artists complain about the duration of copyright frequently. Pointing to some subset of programmers who are against IPR is just nutpicking in any case.

tptacek · 4h ago
Oh, for sure. Programmers are very particular about licenses. For code.
hatefulmoron · 3h ago
I get it, you have an axe to grind against some subset of programmers who are "nerds" in a "piracy culture". Artists don't deserve special protections. It sucks for your family members, I really mean that, but they will have to adapt with everybody else.
mwcampbell · 3h ago
I disagree with you on this. Artists, writers, and programmers deserve equal protection, and this means that tptacek is right to criticize nerd piracy culture. In other words, we programmers should respect artists and writers too.
hatefulmoron · 3h ago
To be clear, we're not in disagreement. We should all respect each other. However, it's pretty clear that the cat's out of the bag, and trying to claw back protections for only one group of people is stupid. It really betrays the author's own biases.
ivape · 3h ago
> counterfeiting creative expression

This is the only piece of human work left in the long run, and that's providing training data on taste. Once we hook up A/B testing on AI creative outputs, the LLM will know how to be creative and not just duplicative. The AI will never have innate taste, but we can feed it taste.

We can also starve it of taste, but that’s impossible because humans can’t stop providing data. In other words, never tell the LLM what looks good and it will never know. A human in the most isolated part of the world can discern what creation is beautiful and what is not.

fennecbutt · 2h ago
Everything is derivative, even all human work. I don't think "creativity" is that hard to replicate, for humans it's about lived experience. For a model it would need the data that impacts its decisions. Atm models are trained for a neutral/overall result.
hmcq6 · 1h ago
Your premise is an axiom that I don’t think most would accept.

Is The Matrix a ripoff of The Truman Show? Is Oldboy derivative of Oedipus?

Saying everything is derivative is reductive.

palmfacehn · 3h ago
>This feels asserted without any real evidence

Things like this are expressions of preference. The discussion will typically devolve into restatements of the original preference and appeals to special circumstances.

speleding · 2h ago
Hasn't that ship sailed? How would any type of ban work when the user can just redirect the banned query to a model in a different jurisdiction, for example, Deepseek? I don't think this genie is going back into the bottle, we're going to have to learn to live with it.
throw310822 · 2h ago
> Ban production of anything visual from an LLM

That's a bit beside the point, which is that AI will not be just another tool, it will take ALL the jobs, one after another.

I do agree it's absolutely great though, and being against it is dumb, unless you want to actually ban it - which is impossible.

rerdavies · 2h ago
In actual fact, Photoshop did kill graphic arts. There was an entire industry filled with people who had highly developed skill sets that suddenly became obsolete. Painters, for example. Before Photoshop, I had to go out of house to get artwork done; now I just do it myself.
hiddenfinance · 1h ago
Even worse!!! What is considered artwork nowadays is whatever can be made in some vector-based program. This also stifles creativity, pigeonholing what is considered creative or artistic into something that can be used for machine learning.

Whatever can be replaced by AI will be, because it is easier for business people to deal with than real people.

hmcq6 · 1h ago
Most of the vector art I see is minimalism. I can’t see this as anything but an argument that minimalism “stifles creativity”

> vector art pigeonholes art into something that can be used for machine learning

Look around, AI companies are doing just fine with raster art.

The only thing we agree on is that this will hurt workers

hmcq6 · 1h ago
No, it didn’t.

It changed the skill set but it didn’t “kill the graphic arts”

Rotoscoping in Photoshop is rotoscoping. Superimposing an image on another in Photoshop is the same as with film; it's just faster and cheaper to try again. Digital painting is painting.

AI doesn’t require an artist to make “art”. It doesn’t require skill. It’s different than other tools

Hoasi · 3h ago
Well, this is only partially true. My optimistic take is that it will redefine the field. There is still a future for resourceful, attentive, and prepared graphic artists.
ttyyzz · 4h ago
AI didn't kill creativity or intuition. It rather lacks those things completely. Artists can make use of AI, but they can't make themselves obsolete just yet.
rvnx · 2h ago
With AI anyone can be an artist, and this is a good thing.
hmcq6 · 1h ago
AI can’t make anyone a painter. It can generate a digital painting for you but it can’t give you the skills to transfer an image from your mind into the real world.

AI currently can’t reliably make 3d objects so AI can’t make you a sculptor.

throw310822 · 2h ago
> AI didn't kill creativity or intuition. It rather lacks those things completely

Quite the opposite, I'd say that it's what it has most. What are "hallucinations" if not just a display of immense creativity and intuition? "Here, I'll make up this API call that I haven't read about anywhere but sounds right."

ttyyzz · 1h ago
I disagree. AI is good at pattern recognition, but still struggles to grasp causal relationships. These made-up API calls are just a pattern in the large data set. Don't confuse it with creativity.
throw310822 · 1h ago
I would definitely confuse that with "intuition" - which I would describe as seeing and using weak, unstated relationships, aka patterns. That's my intuition, at least.

As to creativity, that's something I know too little about to define, but it seems reasonable that it's even more "fuzzy" than intuition. Causal relationships, by contrast, are closer to hard logic, which is what LLMs struggle with - as humans do, too.

ZaoLahma · 2h ago
It will not.

I'm an engineer through and through. I can ask an LLM to generate images just fine, but for a given target audience, for a certain purpose? I would have no clue. None whatsoever. Ask me to generate an image to use in an advertisement for Nuka Cola, targeting tired parents? I genuinely have no idea where to even start. I have absolutely no understanding of the advertising domain, and I don't know what tired parents find visually pleasing, or what they would "vibe" with.

My feeble attempts would be absolute trash compared to a professional artist who uses AI to express their vision. The artist would be able to prompt so much more effectively and correct the things that they know from experience will not work.

It's the exact same as with coding with an AI - it will be trash unless you understand the hows and the whys.

throw310822 · 2h ago
> Ask me to generate an image to use in advertisement for Nuka Cola, targeting tired parents? I genuinely have no idea of where to even start.

I believe you, did you try asking ChatGPT or Claude though?

You can ask them a list of highest-level themes and requirements and further refine from there.

fennecbutt · 2h ago
Have you seen modern advertisements lmao? Most of the time the ad has nothing to do with the actual product, it's an absolute shitshow.

Although I have seen a few American TV ads before; that shit's basically radioactively coloured, same as your fizzy drinks.

whazor · 3h ago
The key is that manual coding for a normal task takes one or two weeks, whereas if you configure all your prompts/agents correctly you could do it in a couple of hours. As you highlighted, it brings many new issues (code quality, lack of tests, tech debt), and you need to carefully craft prompts and review the code to tackle those. But in the end, you can save significant time.
mdavid626 · 2h ago
I disagree. I think this notion comes from the idea that creating software is about coding. Automating/improving coding => you have software at the end.

This might be how one looks at it in the beginning, when one has no experience or no idea about coding. With time one realizes it's more about creating the correct mental model of the problem at hand, rather than the activity of coding itself.

Once this is realized, AI can't "save" you days of work, as coding is the least time-consuming part of creating software.

rerdavies · 1h ago
The actual most time-consuming part of creating software (I think) is reading documentation for the APIs and libraries you're using. Probably the biggest productivity boost I get from my coding assistant is attributable to that.

e.g.: MUI, TypeScript:

   // make the checkbox label appear before the checkbox.
Tab. Done. Delete the comment.

vs. about 2 minutes wading through the perfectly excellent but very verbose online documentation to find that I need to set the "labelPlacement" attribute to "start".
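
For reference, a minimal sketch of what that completion amounts to (assuming MUI's FormControlLabel wrapping a Checkbox; the component name and text here are illustrative, the prop names are from the MUI docs):

    import { Checkbox, FormControlLabel } from '@mui/material';

    // Checkbox whose label is rendered before (to the left of) the control.
    export function AcceptTerms() {
      return (
        <FormControlLabel
          control={<Checkbox />}
          label="Accept terms"
          labelPlacement="start"
        />
      );
    }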

Or the tedious minutiae that I am perfectly capable of doing, but that are time-consuming and error-prone:

    // execute a SQL update
Tab tab tab tab .... Done, with all bindings and fields done, based on the structure that's passed as a parameter to the method, and the tables and fieldnames that were created in source code above the current line. (love that one).
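
As an illustration of the kind of boilerplate that gets filled in, a hedged sketch: the table, fields, and record shape are hypothetical, and the API shown is better-sqlite3's prepare/run, not necessarily what the original commenter uses.

    import Database from 'better-sqlite3';

    // Hypothetical record shape; the field names are illustrative only.
    interface UserRecord {
      id: number;
      name: string;
      email: string;
    }

    const db = new Database('app.db');

    // execute a SQL update: every column bound from the structure passed in.
    function updateUser(user: UserRecord): void {
      db.prepare('UPDATE users SET name = ?, email = ? WHERE id = ?')
        .run(user.name, user.email, user.id);
    }
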
andybp85 · 30m ago
What are you using for this? One thing I can't wrap my head around is how anyone's idea of fun is poking at an LLM until it generates something possibly passable and then figuring out what the hell it did and why, but this sounds like something I'd actually use.
calvinmorrison · 3m ago
vscode?
drited · 3h ago
Would you have any standard prompts you could share which ask it to make a draft with what you'd want (e.g. unit tests, etc.)?
rerdavies · 1h ago

    C++, Linux: write an audio processing loop for ALSA    
    reading audio input, processing it, and then outputting
    audio on ALSA devices. Include code to open and close
    the ALSA devices. Wrap the code up in a class. Use 
    Camelcase naming for C++ methods.
    Skip the explanations.
Run it through Grok:

    https://grok.com/ 
When I ACTUALLY wrote that code the first time, it took me about two weeks to get it right. (horrifying documentation set, with inadequate sample code).

Typically, I'll edit code like this from top to bottom in order to get it to conform to my preferred coding idioms. And I will, of course, submit the code to the same sort of review that I would give my own first-cut code. And the way initialization parameters are passed in needs work. (A follow-on prompt would probably fix that). This is not a fire and forget sort of activity. Hard to say whether that code is right or not; but even if it's not, it would have saved me at least 12 days of effort.

Why did I choose that prompt? Because I have learned through use that AIs do well with these sorts of coding tasks. I'm still learning, and making new discoveries every day. Today's discovery: it is SO easy to implement a SQLite database in C++ using an AI when you go at it the right way!

dogcomplex · 4h ago
"Mech suit" is apt. Gonna use that now.

Having plenty of initial discussion and distilling that into requirements documents aimed at modularized components, which can all be easily tackled separately, is key.

richardw · 1h ago
Photoshop etc. are still just tools. They can’t beat us at what has always set us apart: thinking. LLMs are the closest, and while they’re not close, they’re directionally correct. They’re general purpose, not like chess engines. And they improve. It’s hard to predict a year out, never mind ten.
belter · 4h ago
> What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.

I tried the latest Claude on a very complex wrapper around the AWS Price APIs, which are not easy to work with. Down a 2,000-line code file, I found Claude faking some API returns by creating hard-coded values. A pattern I have seen professional developers get caught on while under pressure to deliver.

This will be a boon to skilled human developers, who will be hired at $900 an hour to fix bugs of a subtlety never seen before.

DontchaKnowit · 43m ago
I mean, that bug doesn't seem very subtle.
belter · 4m ago
I swear this is not me..

"Claude gives up and hardcodes the answer as a solution" - https://www.reddit.com/r/ClaudeAI/comments/1j7tiw1/claude_gi...

belter · 24m ago
I did not want to bend the truthfulness of my story to make a valid logical argument more convincing... :-)
hiddenfinance · 1h ago
The question is: can I self-host this "mech suit"? If not, I would much rather not use some API hosted by another party.

SaaS just seems very much like a terminator-seed situation in the end.

tuyiown · 2h ago
And yet, your comment is just another "trust me bro", with no clear reasoning and reproducible method to make it work.

Maybe the skeptics are just reasonable people who don't want to waste time now on learning efforts that are not supposed to exist at all. In the end, the only good move is to wait for predictable methods of using agents; or, more likely, once that's been established, it will be built into agents, and the dread of using those unbearable tools will vanish by itself.

rerdavies · 2h ago
And in the meantime, the people you are competing with in the job market have already become 2x more productive.
sgt · 52m ago
Given the amount of emdashes (—), I wouldn't be surprised if an AI wrote that. Of course it's "trust me bro".
ang_cire · 3h ago
One thing that really bothered me that the author glossed over (perhaps they don't care, given the tone of the article) is where they said:

> Does an intern cost $20/month? Because that’s what Cursor.ai costs.

> Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic.

But do you know what another part of being a senior developer is? Not just making them more productive, but also guiding the junior developers into becoming better, independent, self-tasking, senior coders. And that feedback loop doesn't exist here.

We're robbing ourselves of good future developers, because we aren't even thinking about the fact that the junior devs are actively learning from the small tasks we give them.

Will AI completely replace devs before we all retire? Maybe. Maybe not.

But long before that, the future coders who aren't being hired and trained (because a senior dev doesn't understand that junior devs become senior devs, an important pipeline, and would rather pay $20/month for an LLM) are going to become a major loss and brain drain domestically.

raddan · 1h ago
I think what is going to happen is that junior devs will develop a strong reliance on AI tools to be able to do anything. I cynically think this was OpenAI’s aim when they made ChatGPT free for students.

I had a rather depressing experience this semester in my office hours with two students who had painted themselves into a corner with code that was clearly generated. They came to me for help, but were incapable of explaining why they had written what was on their screens. I decided to find where they had lost the thread of the class and discovered that they were essentially unable to write a hello world program. In other words, they lost the thread on day one. Up until this point, both students had nearly perfect homework grades while failing every in-class quiz.

From one perspective I understand the business case for pushing these technologies. But from another perspective, the long term health of the profession, it’s pretty shortsighted. Who knows, in the end maybe this will kill off the group of students who enroll in CS courses “because mom and dad think it’s a good job,” and maybe that will leave me with the group that really wants to be there. In the meantime, I will remind students that there is a difference between programming and computer science and that you really need a strong grasp of the latter to be an effective coder. Especially if you use AI tools.

timdiggerm · 2m ago
> I cynically think this was OpenAI’s aim when they made ChatGPT free for students

Is there any interpretation that makes sense _other_ than this?

hyperbovine · 1h ago
> Who knows, in the end maybe this will kill off the group of students who enroll in CS courses “because mom and dad think it’s a good job,”

I see this so much. “Data science major” became the 2020s version of law school. It’s such a double edged sword. It’s led to a huge increase in enrollment and the creation of multiple professional masters programs, so the college loves us. We hire every year and there’s always money for just about anything. On the other hand, class sizes are huge, which is not fun, and worse a large fraction of the students appear to have minimal intrinsic interest in coding or analyzing data. They’re there because it’s where the jobs are. I totally get that, in some sense college has always been that way, but it does make me look back fondly on the days when classes were 1/4 as big and filled with people who were genuinely interested in the subject.

Unfortunately I think I may get my wish. AI is going to eliminate a lot of those jobs and so the future of our field looks a bit bleak. Worse, it’s the very students who are going to become redundant the quickest that are the least willing to learn. I’d be happy to teach them basic analysis and coding skills, but they are dead set on punching everything into ChatGPT.

esskay · 2h ago
It's a bit misleading to compare $20/month with an actual human person. The junior dev won't get halfway through the day and tell you they've used up all their coding time for the month and will now respond with gibberish.

Cursor is a heck of a lot more than $20/month if you actually want it working for a full work day, every day.

xmodem · 38m ago
Further, Cursor might cost $20/month today, but to what degree is that subsidized by VC investment? All the information we have points to frontier models just not being profitable to run at those types of prices, and those investors are going to want a return at some point.
paffdragon · 2h ago
I imagine it like this: juniors will be taught by LLMs on some things, but seniors will still be there. They will still assist, pair program, code review, etc., but they will have another party, the LLM, like a smarter calculator.
ignoramous · 1h ago
How many senior developers understand the minute, intimate details of the frameworks, libraries, and languages they use? How many understand the databases they use? TFA says many (but not all) don't have to care as long as the product ships. That's exactly how code written by LLMs is meant to be tested and evaluated. And if you set up a good enough build/test environment, TFA argues that you can automate most of the schlep away.
cesarb · 12h ago
This article does not touch on the thing which worries me the most with respect to LLMs: the dependence.

Unless you can run the LLM locally, on a computer you own, you are now completely dependent on a remote centralized system to do your work. Whoever controls that system can arbitrarily raise the prices, subtly manipulate the outputs, store and do anything they want with the inputs, or even suddenly cease to operate. And since, according to this article, only the latest and greatest LLM is acceptable (and I've seen that exact same argument six months ago), running locally is not viable (I've seen, in a recent discussion, someone mention a home server with something like 384G of RAM just to run one LLM locally).

To those of us who like Free Software because of the freedom it gives us, this is a severe regression.

aaron_m04 · 12h ago
Yes, and it's even worse: if you think LLMs may possibly make the world a worse place, you should not use any LLMs you aren't self-hosting, because your usage information is being used by the creators to make LLMs better.
eleveriven · 4h ago
It's also why local models, even if less powerful, are so important. The gap between "state of the art" and "good enough for a lot of workflows" is narrowing fast
amadeuspagel · 1h ago
I can't run google on my computer on my own, but I'm totally dependent on it.
_heimdall · 2m ago
Is your entire job returning google results?

The point being made here is that a developer that can only do their primary job of coding via a hosted LLM is entirely dependent on a third party.

zelphirkalt · 1h ago
There are many alternatives though. It is not like Google has a search monopoly or office product monopoly, or e-mail provider monopoly. It is quite possible to cut out a lot of Google from one's life, and not even complicated to do that.
pkilgore · 7m ago
Is your argument there are no LLM alternatives?
Hilift · 11h ago
That's going full speed ahead though. Every major cloud provider has an AI offering, and there are now multiple AI-centric cloud providers. There is a lot of money and speculation. Now Nvidia has their own cloud offering that "democratize access to world-class AI infrastructure. Sovereign AI initiatives require a new standard for transparency and performance".
imhoguy · 1h ago
Even FOSS-based development depends on walled gardens, as is evident every time GitHub is down.
zelphirkalt · 1h ago
Sensibly hosted FOSS doesn't go to GitHub for hosting though. There are other options for people who care. I personally like Codeberg.
rvnx · 2h ago
With the Mac Studio you get 512 GB of unified memory (shared between CPU and GPU), this is enough to run some exciting models.

In the last 20 years, memory has increased about 32x (a doubling roughly every four years).

At that rate we could have 16 TB memory computers in 2045 (512 GB x 32 = 16,384 GB, about 16 TB).

It can unlock a lot of possibilities. If even 1 TB is not enough by then (better architecture, more compact representation of data, etc).

fennecbutt · 2h ago
Yeah, for £10,000. And you get 512 GB of bandwidth-starved memory.

Still, I suppose that's better than what nvidia has on offer atm (even if a rack of gpus gives you much, much higher memory throughput).

theshrike79 · 1h ago
AKCSHUALLY the M-series CPU memory upgrades are expensive because the memory is on-chip and the bandwidth is a lot bigger than on comparable PC hardware.

In some cases it's more cost effective to get M-series Mac Minis vs nVidia GPUs

lolinder · 4m ago
They know that, but all accounts I've read acknowledge that the unified memory is worse than dedicated VRAM. It's just much better than running LLMs on CPU and the only way for a regular consumer to get to 64GB+ of graphical memory.
underdeserver · 6h ago
You can also make this argument to varying degrees about your internet connection, cloud provider, OS vendor, etc.
simoncion · 6h ago
I'm not the OP but:

* Not even counting cellular data carriers, I have a choice of at least five ISPs in my area. And if things get really bad, I can go down to my local library to politely encamp myself and use their WiFi.

* I've personally no need for a cloud provider, but I've spent a lot of time working on cloud-agnostic stuff. All the major cloud providers (and many of the minors) provide compute, storage (whether block, object, or relational), and network ingress and egress. As long as you don't deliberately tie yourself to the vendor-specific stuff, you're free to choose among all available providers.

* I run Linux. Enough said.

underdeserver · 31m ago
* You might have a choice of carriers or ISPs, but many don't.

* Hmm, what kind of software do you write that pays your bills?

* And your setup doesn't require any external infrastructure to be kept up to date?

underdeserver · 7h ago
Good thing it's a competitive market with at least 5 serious, independent players.
nosianu · 2h ago
That will work until a lot of infrastructure and third-party software has been created around a particular player.

See the Microsoft ecosystem as an example. Nothing they do could not be replicated, but the network effects they achieved are strong. Too much glue, and 3rd party systems, and also training, and what users are used to, and what workers you could hire are used to, now all point to the MS ecosystem.

In this early mass-AI-use phase you still can easily switch vendors, sure. Just like in the 1980s you could still choose some other OS or office suite (like Star Office - the basis for OpenOffice, Lotus, WordStar, WordPerfect) without paying that kind of ecosystem cost, because it did not exist yet.

Today too much infrastructure and software relies on the systems from one particular company to change easily, even if the competition were able to provide a better piece of software in one area.

shaky-carrousel · 41m ago
Until they all merge, or form a cartel.
79a6ed87 · 11h ago
>To those of us who like Free Software because of the freedom it gives us, this is a severe regression.

It's fair to be worried about depending on an LLM. But I find the dependence on things like AWS or Azure more problematic, if we are talking about centralized and proprietary systems.

Aeolun · 8h ago
It's not like the code is suddenly elsewhere right? If the LLM disappears I'll be annoyed, not helpless.
nessbot · 8h ago
Not if the only way you know how to code is vibe coding.
brailsafe · 1h ago
Well, I'd think of it like being car-dependent. Sure, plenty of suburbanites know how to walk, they still have feet, but they live somewhere that's designed to only be practically traversable by car. While you've lived that lifestyle, you may have gained weight and lost muscle mass, or developed an intolerance for discomfort to a point where it poses real problems. If you never got a car, or let yourself adapt to life without one, you have to work backwards from that constraint. Likewise with the built environment around us; the cities many people under the age of 40 consider to be "good" are the ones that didn't demolish themselves in the name of highways and automobiles, in which a car only rarely presents what we'd think of as useful technology.

There are all kinds of trades that the car person and the non-car person makes for better or worse depending on the circumstance. The non-car person may miss out on a hobby, or not know why road trips are neat, but they don't have the massive physical and financial liabilities that come with them. The car person meanwhile—in addition to the aforementioned issues—might forget how to grocery shop in smaller quantities, or engage with people out in the world because they just go from point A to B in their private vessel, but they may theoretically engage in more distant varied activities that the non-car person would have to plan for further in advance.

Taking the analogy a step further, each party gradually sets different standards for themselves that push the two archetypes into diametrically opposed positions. The non-car owner's life doesn't just not depend on cars, but is often actively made worse by their presence. For the car person, the presence of people, especially those who don't use a car, gradually becomes over-stimulating; cyclists feel like an imposition, people walking around could attack at any moment, even other cars become the enemy. I once knew someone who'd spent his whole life commuting by car, and when he took a new job downtown, had to confront the reality that not only had he never taken the train, he'd become afraid of taking it.

In this sense, the rise of LLMs does remind me of the rise of frontend frameworks, bootcamps that started with React or React Native, high-level languages, and even things like having great internet; the only people who ask what happens in a less ideal case are the ones who've either dealt with those constraints first-hand, or have tried to simulate them. If you've never been to the countryside, or a forest, or a hotel, you might never consider how your product responds in a poor-connectivity environment, and these are the people who wind up getting lost on basic hiking trails, having assumed that their online map would produce relevant information and always be there.

Edit: To clarify, in the analogy, it's clear that cars are not intrinsically bad tools or worthwhile inventions, but had excitement for them been tempered during their rise in commodification and popularity, the feedback loops that ended up all but forcing people to use them in certain regions could have been broken more easily.

keutoi · 7h ago
I think the same argument could be made about search engines. Most people are not too worried about them.
thayne · 6h ago
Maybe they should be.
rerdavies · 1h ago
You could stop using them, I suppose.
ImaCake · 12h ago
>the dependence.

Sure, but that is not the point of the article. LLMs are useful. The fact that you are dependent on someone else is a different problem like being dependent on microsoft for your office suite.

mrheosuper · 9h ago
I disagree.

Self-hosting has always had a lot of drawbacks compared with commercial solutions. I bet my self-hosted file server has worse reliability than Google Drive, and my self-hosted git server handles fewer concurrent users than GitHub.

It's one thing you must accept when self-hosting.

So when you self-host an LLM, you must either accept a drop in output quality or spend a small fortune on hardware.

kortilla · 8h ago
Those aren’t good analogies because it costs nearly nothing to make that availability tradeoff and run things on your computer for your own fun.

Raspberry pi was a huge step forward, the move to LLMs is two steps back.

wiseowise · 5h ago
Wake up, you’re already dependent on everything, unless you stick exclusively to Python std and no outside batteries.

Maven central is gone and you have no proxy setup or your local cache is busted? Poof, you’re fucking gone, all your Springs, Daggers, Quarkuses and every third party crap that makes up your program is gone. Same applies to bazillion JS, Rust libraries.

pxnicksm · 4h ago
There are multiple organizations with mirrors for packages, and I doubt the cost of a mirror is the same as the cost of a server with 384 GB of memory.

A guy here says you need 4 TB for a PyPI mirror and 285 GB for npm:

https://stackoverflow.com/questions/65995150/is-it-possible-...

wolvesechoes · 4h ago
If PyPI goes down and I cannot use NumPy, I can still roll out my own linear algebra library, because I've got the required knowledge, and I've got it because I had to learn it instead of relying on LLMs.
miloignis · 4h ago
Panamax works great for mirroring all of crates.io in 300-400GB, which is big but easily small enough for enthusiasts. I've got it on an external USB drive myself, and it's saved my bacon a few times.

We're not yet at that same point for the performance of local LLM models, afaict, though I do enjoy messing around with them.

nilirl · 24m ago
The one main claim the article makes: Senior developers should not ignore the productivity gains from LLMs.

Best use of evidence is deductive: Lots of code is tedious and uninteresting -> LLMs are fast at generating lots of tedious code -> LLMs help productivity.

Weakest part of the argument: The list of rebuttals doesn't have an obvious organization to it. What exactly is the main argument they're arguing against?

It's not stated outright but because the post is bookended by references to 'those smarter than me', I think this is an argument against the shaming of developers using (and loving) LLM tools.

Which I think is fair.

Overall, the post did not add anything to the general discussion. But the popularity of the author (and fly.io posts) may make it a beacon for some.

gdubs · 14h ago
One thing that I find truly amazing is just the simple fact that you can now be fuzzy with the input you give a computer, and get something meaningful in return. Like, as someone who grew up learning to code in the 90s, it always seemed like science fiction that we'd get to a point where you could give a computer some vague human-level instructions and get it to more or less do what you want.
forgotoldacc · 9h ago
There's the old quote from Babbage:

> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.

It is kind of impressive how I'll ask for some code in the dumbest, vaguest, sometimes even wrong way, but so long as I have the proper context built up, I can get something pretty close to what I actually wanted. Though I still have problems where I can ask as precisely as possible and get things not even close to what I'm looking for.

CobrastanJorji · 8h ago
We wanted to check the clock at the wrong time but read the correct time. Since a broken clock is right twice a day, we broke the clock, which solves our problem some of the time!
pca006132 · 38m ago
The nice thing is that a fully broken clock is accurate more often than a slightly deviated clock.
meowface · 7h ago
It is fun to watch. I've sometimes indeed seen the LLM say something like "I'm assuming you meant [X]".
nitwit005 · 6h ago
It's very impressive that I can type misheard song lyrics into Google, and yet still have the right song pop up.

But, having taken a chance to look at the raw queries people type into apps, I'm afraid neither machine nor human is going to make sense of a lot of it.

ivape · 3h ago
We're talking about God function.

function God (any param you can think of) {

}

godelski · 7h ago
How do you know the code is right?
fsloth · 4h ago
The program behaves as you want.

No, really - there is tons of potentially value-adding code that can be of throwaway quality just as long as it’s zero effort to write it.

Design explorations, refactorings, etc.

godelski · 2h ago
And how do you know it behaves like you want?

This is a really hard problem even when I write every line and have the whole call graph in my head. I have no clue how you think this gets easier by knowing less about the code.

theshrike79 · 1h ago
Tests pretty much. Not a silver bullet for everything, but works for many cases.

Unless you're a 0.1% coder, your mental call graph can't handle every corner case perfectly anyway, so you need tests too.
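
To be concrete, the kind of test I mean is nothing fancy; here's a minimal sketch (the helper and the cases are made up, purely illustrative):

    # Hypothetical LLM-generated helper plus a few tests that pin down the behaviour I care about.
    def slugify(title: str) -> str:
        """Turn a title into a URL slug."""
        return "-".join(word.lower() for word in title.split() if word.isalnum())

    def test_slugify():
        assert slugify("Hello World") == "hello-world"
        assert slugify("  spaces   everywhere ") == "spaces-everywhere"
        assert slugify("") == ""

    test_slugify()

The tests don't prove the generated code is right, but they catch the corner cases my mental call graph would have missed anyway.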

fsloth · 56m ago
By using the program? Mind you this works only for _personal_ tools where it’s intuitively obvious when something is wrong.

For example

”Please create a viewer for geojson where i can select individual feature polygons and then have button ’export’ that exports the selected features to a new geojson”

1. You run it
2. It shows the json and visualizes selections
3. The exported subset looks good

I have no idea how anyone could keep the callgraph of even a minimal gui application in their head. If you can then congratulations, not all of us can!
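
To make the "export" step concrete, here's a minimal sketch of the kind of logic such a viewer needs behind the button (file names, ids, and fields are hypothetical; whatever the LLM generates will differ):

    import json

    def export_selected(input_path, output_path, selected_ids):
        """Write only the selected features from a GeoJSON file to a new GeoJSON file."""
        with open(input_path) as f:
            collection = json.load(f)

        # Keep the features whose "id" is in the selected set.
        selected = [feat for feat in collection.get("features", [])
                    if feat.get("id") in selected_ids]

        with open(output_path, "w") as f:
            json.dump({"type": "FeatureCollection", "features": selected}, f, indent=2)

    # e.g. after clicking two polygons in the viewer (paths and ids are made up):
    export_selected("input.geojson", "selected.geojson", {"parcel-12", "parcel-17"})

And the point stands: for a personal tool like this, whether it behaves as I want is obvious from opening the exported file and looking at it.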

ic_fly2 · 7h ago
The LLM generated unit tests pass. Obviously!
lazide · 5h ago
Just don’t look at the generated unit tests, and we’re fine.
dkdbejwi383 · 5h ago
If customers don’t complain it must be working
godelski · 2h ago
You don't hear the complaints. That's different than no complaints. Trust me, they got them.

I've got plenty of complaints for Apple, Google, Netflix, and everyone else. Shit that could be fixed with just a fucking regex. Here's an example: my gf is duplicated in my Apple contacts. It can't find the duplicate, despite the same name, nickname, phone number, email, and birthday. Which means there are three entries on my calendar for her birthday. Guess what happened when I manually merged? She now has 4(!!!!!) entries!! How the fuck does merging increase the count?

Trust me, they complain, you just don't listen

coliveira · 8h ago
Sure, you can now be fuzzy with the input you give to computers, but in return the computer will ALSO be fuzzy with the answer it gives back. That's the drawback of modern AI.
rienbdj · 6h ago
It can give back code though. It might be wrong, but it won’t be ambiguous.
swiftcoder · 4h ago
> It can give back code though. It might be wrong, but it won’t be ambiguous.

Code is very often ambiguous (even more so in programming languages that play fast and loose with types).

Relative lack of ambiguity is a very easy way to tell who on your team is a senior developer

0points · 4h ago
When it doesn't even compile or have clear intent, it's ambiguous in my book.
isolli · 4h ago
The output is also often quite simple to check...
rienbdj · 3h ago
For images and other media, yes. Does it look right?

Program correctness is incredibly difficult - arguably the biggest problem in the industry.

csallen · 14h ago
It's mind blowing. At least 1-2x/week I find myself shocked that this is the reality we live in
malfist · 14h ago
Today I had a dentist appointment and the dentist suggested I switch toothpaste lines to see if something else works for my sensitivity better.

I am predisposed to canker sores, and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS-free toothpastes are new-age hippy stuff and are also fluoride free.

I went to chatgpt and asked it to suggest a toothpaste that was both SLS free and had fluoride. Pretty simple ask right?

It came back with two suggestions. Its top suggestion had SLS; its backup suggestion lacked fluoride.

Yes, it is mind blowing the world we live in. Executives want to turn our code bases over to these tools

Game_Ender · 12h ago
What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature, which made it easier to jump to each product and check it out (I am putting aside my fear that this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

wkat4242 · 9h ago
The problem is the same prompt will yield good results one time and bad results another. "Get better at prompting" is often just an excuse for AI hallucination. Better prompting can help, but often the prompt is totally fine and the tech is just not there yet.
Workaccount2 · 7h ago
While this is true, I have seen this happen enough times to confidently bet all my money that OP will not return and post a link to their incorrect ChatGPT response.

Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.

Aeolun · 8h ago
If you want a correct answer the first time around, and give up if you don't get it, even if you know the thing can give it to you with a bit more effort (but still less effort than searching yourself), don't you think that's a user problem?
0points · 4h ago
> don't you think that's a user problem?

If the product doesn't work as advertised, then it's a problem with the product.

3eb7988a1663 · 8h ago
If you are genuinely asking a question, how are you supposed to know the first answer was incorrect?
leoedin · 6h ago
I briefly got excited about the possibility of local LLMs as an offline knowledge base. Then I tried asking Gemma for a list of the tallest buildings in the world and it just made up a bunch. It even provided detailed information about the designers, year of construction etc.

I still hope it will get better. But I wonder if an LLM is the right tool for factual lookup - even if it is right, how do I know?

I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

galaxyLogic · 3h ago
That's the beauty of using AI to generate code: All code is "fictional".
socalgal2 · 7h ago
The person that started this conversation verified the answers were incorrect. So it sounds like you just do that: check the results. If they turn out to be false, tell the LLM or make sure you're not on a bad one. It's still likely to be faster than searching yourself.
mtlmtlmtlmtl · 6h ago
That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place.

Something I've been using perplexity for recently is summarizing the research literature on some fairly specific topic(e.g. the state of research on the use of polypharmacy in treatment of adult ADHD). Ideally it should look up a bunch of papers, look at them and provide a summary of the current consensus on the topic. At first, I thought it did this quite well. But I eventually noticed that in some cases it would miss key papers and therefore provide inaccurate conclusions. The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do; search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

The only way in which this is useful, then, is to find a random, non-exhaustive set of papers for me to look at (since the LLM also can't be trusted to accurately summarize them). Well, I can already do that with a simple search in one of the many databases for this purpose, such as PubMed, arXiv, etc. Any capability beyond that is merely an illusion. It's close, but no cigar. And in this case close doesn't really help reduce the amount of work.

This is why a lot of the things people want to use LLMs for require a "definiteness" that's completely at odds with the architecture. The fact that LLMs are good at pretending to do it well only serves to distract us from addressing the fundamental architectural issues that need to be solved. I don't think any amount of training of a transformer architecture is going to do it. We're several years into trying that and the problem hasn't gone away.

Tarq0n · 2h ago
I'd be very interested in hearing what conclusions you came to in your research, if you're willing to share.
lazide · 5h ago
Yup, and worse since the LLM gives such a confident sounding answer, most people will just skim over the ‘hmm, but maybe it’s just lying’ verification check and move forward oblivious to the BS.
fennecbutt · 2h ago
People did this before LLMs anyway. Humans are selfish, apathetic creatures and unless something pertains to someone's subject of interest the human response is "huh, neat. I didn't know dogs could cook pancakes like that" then scroll to the next tiktok.

This is also how people vote, apathetically and tribally. It's no wonder the world has so many fucking problems, we're all monkeys in suits.

lazide · 1h ago
I think that’s my point. It enables exactly the worse behavior in the worst way, knowledge wise.
lechatonnoir · 6h ago
I somehow can't reply to your child comment.

It depends on whether the cost of search or of verification dominates. When searching for common consumer products, yeah, this isn't likely to help much, and in a sense the scales are tipped against the AI for this application.

But if search is hard and verification is easy, even a faulty faster search is great.

I've run into a lot of instances with Linux where some minor, low level thing has broken and all of the stackexchange suggestions you can find in two hours don't work and you don't have seven hours to learn about the Linux kernel and its various services and their various conventions in order to get your screen resolutions correct, so you just give up.

Being in a debug loop in the most naive way with Claude, where it just tells you what to try and you report the feedback and direct it when it tunnel visions on irrelevant things, has solved many such instances of this hopelessness for me in the last few years.

insane_dreamer · 7h ago
> It still likely to be faster than searching yourself.

No, not if you have to search to verify their answers.

worthless-trash · 7h ago
This is the right question.
graphememes · 7h ago
scientific method??
rsynnott · 2h ago
I am unconvinced that searching for this yourself is actually more effort than repeatedly asking the Mighty Oracle of Wrongness and cross-checking its utterances.
qingcharles · 6h ago
Also, for this type of query, I always enable the "deep search" function of the LLM as it will invariably figure out the nuances of the query and do far more web searching to find good results.
jvanderbot · 11h ago
This is the thing that gets me about LLM usage. They can be amazing, revolutionary tech, and yes, they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that there is very real skill required (at best) or that they just won't work most of the time (at worst). Yes, there are examples of amazing things, but the majority of output from the majority of users seems to be junk, and the messaging is designed around FUD and FOMO.
mediaman · 10h ago
Just like some people who wrote long sentences into Google in 2000 and complained it was a fad.

Meanwhile the rest of the world learned how to use it.

We have a choice. Ignore the tool or learn to use it.

(There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)

spaqin · 9h ago
Arguably, the people who typed long sentences into Google have won; the people who learned how to use it early on with specific keywords now get meaningless results.
HappMacDonald · 7h ago
Nah, both keywords and long sentences get meaningless results from Google these days (including their falsely authoritative Bard claims).

I view Bard as a lot like the yes-man lackey that tries to pipe in to every question early, either cheating off others' work or, even more frequently, failing to accurately cheat off of others' work, largely in hopes that you'll be in too much of a hurry to notice, mistake its voice for that of another (e.g., mistake the AI breakdown for a first-hit result snippet), and faceplant as a result of the faulty intel.

Gemini gets me relatively decent answers... only after 60 seconds of CoT. Bard answers in milliseconds and its lack of effort really shows through.

Filligree · 1h ago
Just to nitpick: The AI results on google search are Magi (a much smaller model), not Gemini.

And definitely not Bard, because that no longer exists, to my annoyance. It was a much better name.

windexh8er · 9h ago
> Meanwhile the rest of the world learned how to use it.

Very few people "learned how to use" Google, and in fact - many still use it rather ineffectively. This is not the same paradigm shift.

"Learning" ChatGPT is not a technology most will learn how to use effectively. Just like Google they will ask it to find them an answer. But the world of LLMs is far broader with more implications. I don't find the comparison of search and LLM at an equal weight in terms of consequences.

The TL;DR of this is ultimately: understanding how to use an LLM, at it's most basic level, will not put you in the drivers seat in exactly the same way that knowing about Google also didn't really change anything for anyone (unless you were an ad executive years later). And in a world of Google or no-Google, hindsight would leave me asking for a no-Google world. What will we say about LLMs?

kristofferR · 10h ago
The AI skeptics are the ones who never develop the skill though, it's self-destructive.
caycep · 7h ago
if one needs special "skill" to use AI "properly", is it truly AI?
Filligree · 56m ago
Given one needs "communications skills" to work effectively with subordinates, are subordinates truly intelligent?
HappMacDonald · 7h ago
Human labor needs skill to compose properly into any larger effort.
wickedsight · 3h ago
Tesler's Theorem strikes again!
thefourthchime · 6h ago
I feel like AI skeptics always point to hallucinations as to why it will never work. Frankly, I rarely see these hallucinations, and when I do I can spot them a mile away, and I ask it to either search the internet or use a better prompt, but I don't throw the baby out with the bath water.
techpression · 4h ago
I see them in almost every question I ask: very often made-up function names, missing operators, or missed closure bindings. Then again, it might be Elixir and the lack of training data. I also have a decent bullshit detector for insane code generation output; it's amazing how much better code you get almost every time just by following up with ”can you make this more simple and using common conventions”.
tguvot · 10h ago
I tried to use ChatGPT a month ago to find systemic fungicides for treating specific problems with trees. It kept suggesting copper sprays (which are not systemic) or fungicides that don't deal with the problems I have.

I also tried to ask it what the difference in action is between two specific systemic fungicides. It generated some irrelevant nonsense.


jorams · 5h ago
For reference I just typed "sls free toothpaste with fluoride" into a search engine and all the top results are good. They are SLS-free and do contain fluoride.
cowlby · 8h ago
This is where o3 shines for me. Since it does iterations of thinking/searching/analyzing and is instructed to provide citations, it really limits the hallucination effect.

o3 recommended Sensodyne Pronamel and I now know a lot more about SLS and fluoride than I did before lol. From its findings:

"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."

fc417fc802 · 7h ago
That is impressive, but it also looks likely to be misinformation. SLS isn't a chelator (as the quote appears to suggest). The concern is apparently that it might compete with NaF for sites to interact with the enamel. However, there is minimal research on the topic and what does exist (at least what I was quickly able to find via pubmed) appears preliminary at best. It also implicates all surfactants, not just SLS.

This diversion highlights one of the primary dangers of LLMs which is that it takes a lot longer to investigate potential bullshit than it does to spew it (particularly if the entity spewing it is a computer).

That said, I did learn something. Apparently it might be a good idea to prerinse with a calcium lactate solution prior to a NaF solution, and to verify that the NaF mouthwash is free of surfactants. But again, both of those points are preliminary research grade at best.

If you take anything away from this, I hope it's that you shouldn't trust any LLM output on technical topics that you haven't taken the time to manually verify in full.


cgh · 8h ago
There is a reason why corporations aren’t letting LLMs into the accounting department.
sriram_malhar · 7h ago
That is not true. I know of many private equity companies that are using LLMs for a base level analysis, and a separate validation layer to catch hallucinations.

LLM tech is not replacing accountants, just as it is not replacing radiologists or software developers yet. But it is in every department.

suddenlybananas · 5h ago
That's not what the accounting department does.
sriram_malhar · 4h ago
Not sure what you think I mean by "that".

The accounting department does a large number of things, only some of which involves precise bookkeeping. There is data extraction from documents, DIY searching (vibe search?), checking data integrity of submitted forms, deviations from norms etc.

lazide · 5h ago
Don’t bet on it. I’ve had to provide feedback on multiple proposals to use LLMs for generating ad-hoc financial reports in a fortune 50. The feedback was basically ‘this is guaranteed to make everyone cry, because this will produce bad numbers’ - and people seem to just not understand why.
renewiltord · 4h ago
This is false. My friend works in tax accounting and they’re using LLMs at his org.
GoatInGrey · 12h ago
If you want the trifecta of no SLS, contains fluoride, and is biodegradable, then I recommend Hello toothpaste. Kooky name, but the product is solid and, like you, I commonly got canker sores, which have since become very rare.
Game_Ender · 12h ago
Hello toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious for the poster above to share the session and see what the issue was.

There is known sensitivity (no pun intended ;) to wording of the prompt. I have also found if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.

0 - https://news.ycombinator.com/item?id=44164633

NikkuFox · 13h ago
If you've not found a toothpaste yet, see if UltraDex is available where you live.
mediaman · 10h ago
What are you doing to get results this bad?

I tried this question three times and each time the first two products met both requirements.

Are you doing the classic thing of using the free version to complain about the competent version?

andrewflnr · 8h ago
The entire point of a free version, at least for products like this, is to allow people to make accurate judgments about whether to pay for the "competent" version.
lechatonnoir · 6h ago
Well, in that case, the LLM company has made a mistake in marketing their product, but that's not the same as the question of whether the product works.
fwip · 9h ago
If the demo version of something is shitty, there's no reason to pay that company money.
artursapek · 10h ago
do you take lysine? total miracle supplement for those
shlant · 8h ago
cool story
sneak · 14h ago
“an LLM made a mistake once, that’s why I don’t use it to code” is exactly the kind of irrelevant FUD that TFA is railing against.

Anyone not learning to use these tools well (and cope with and work around their limitations) is going to be left in the dust in months, perhaps weeks. It’s insane how much utility they have.

malfist · 13h ago
Once? Lol.

I presented a simple problem with well-defined parameters that LLMs can use to search product ingredient lists (which are standardized). This is the type of problem LLMs are supposed to be good at, and it failed in every possible way.

If you hired a master woodworker and he didn't know what wood was, you'd hardly trust him with hard things, much less simple ones.

phantompeace · 4h ago
You haven't shared the chat where you claim the model gave you incorrect answers, whilst others have stated that your query returned correct results. This is the type of behaviour that AI skeptics exhibit (claiming the model is fundamentally broken/stupid yet not showing us the chat).
breuleux · 14h ago
They won't. The speed at which these models evolve is a double-edged sword: they give you value quickly... but any experience you gain dealing with them also becomes obsolete quickly. One year of experience using agents won't be more valuable than one week of experience using them. No one's going to be left in the dust because no one is more than a few weeks away from catching up.
kossTKR · 12h ago
Very important point, but there's also the sheer amount of reading you have to do, the inevitable scope creep, gargantuan walls of text going back and forth making you "skip" constantly, looking here then there, copying, pasting, erasing, re-asking.

Literally the opposite of focus, flow, seeing the big picture.

At least for me to some degree. There's value there as i'm already using these tools everyday but it also seems like a tradeoff i'm not really sure how valuable is yet. Especially with competition upping the noise too.

I feel SO unfocused with these tools and i hate it, it's stressful and feels less "grounded", "tactile" and enjoyable.

I've found myself in a weird new workflow loop a few times with these tools, mindlessly iterating on some stupid error the LLM keeps not fixing, while my mind simply refuses to just fix it myself way faster with a little more effort, and that's honestly a bit frightening.

lechatonnoir · 6h ago
I relate to this a bit, and on a meta level I think the only way out is through. I'm trying to embrace optimizing the big picture process for my enjoyment and for positive and long-term effective mental states, which does include thinking about when not to use the thing and being thoughtful about exactly when to lean on it.
sensanaty · 11h ago
Surely if these tools were so magical, anyone could just pick them up and get out of the dust? If anything, they're probably better off cause they haven't wasted all the time, effort and money in the earlier, useless days and instead used it in the hypothetical future magic days.
JimDabell · 10h ago
> Surely if these tools were so magical

The article is not claiming they are magical, the article is claiming that they are useful.

> > but it’ll never be AGI

> I don’t give a shit.

> Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.

creata · 9h ago
I see this FOMO "left in the dust" sentiment a lot, and I don't get it. You know it doesn't take long to learn how to use these tools, right?
bdangubic · 9h ago
it actually does if you want to do serious work.

hence these types of posts generate hundreds of comments like "I gave it a shot, it stinks"

worthless-trash · 7h ago
I like how the post itself says "if hallucinations are your problem, your language sucks".

Yes sir, I know the language sucks, and there isn't anything I can do about that. There was nothing I could do at one point to convince Claude that you should not use floating point math in kernel C code.

But hey, what do I know.

simonw · 7h ago
Did saying to Claude "do not use floating point math in this code" not work?
worthless-trash · 6h ago
Correct, it did not work.
grey-area · 13h ago
Looking forward to seeing you live up to your hyperbole in a few weeks, the singularity is near!
pmdrpg · 14h ago
Feel similarly, but even if it is wrong 30% of the time, you can (as the author of this op-ed points out) pour an ungodly amount of resources into getting that error rate down by chaining them together so that you have many chances to catch the error. And as long as that only destroys the environment and doesn't cost more than a junior dev, then they're going to trust their codebases with it, yes; it's the competitive thing to do, and we all know competition produces the best outcome for everyone… right?
0points · 4h ago
> it’s the competitive thing to do

I'm expecting there should be at least some senior executives who realize how incredibly destructive this is to their products.

But I guess time will tell.

csallen · 13h ago
It takes very little time or brainpower to circumvent AI hallucinations in your daily work, if you're a frequent user of LLMs. This is especially true of coding using an app like Cursor, where you can @-tag files and even URLs to manage context.
gertlex · 13h ago
Feels like you're comparing how LLMs handle unstandardized and incomplete marketing-crap that is virtually all product pages on the internet, and how LLMs handle the corpus of code on the internet that can generally be trusted to be at least semi functional (compiles or at least lints; and often easily fixed when not 100%).

Two very different combinations it seems to me...

If the former combination was working, we'd be using chatgpt to fill our amazon carts by now. We'd probably be sanity checking the contents, but expecting pretty good initial results. That's where the suitability of AI for lots of coding-type work feels like it's at.

malfist · 13h ago
Product ingredient lists are mandated by law and follow a standard. Hard to imagine a better codified NLP problem
gertlex · 13h ago
I hadn't considered that, admittedly. It seems like that would make the information highly likely to be present...

I've admittedly got an absence of anecdata of my own here, though: I don't go buying things with ingredient lists online much. I was pleasantly surprised to see a very readable list when I checked a toothpaste page on amazon just.

layer8 · 13h ago
At the very least, it demonstrates that you can’t trust LLMs to correctly assess that they couldn’t find the necessary information, or if they do internally, to tell you that they couldn’t. The analogous gaps of awareness and acknowledgment likely apply to their reasoning about code.
mentos · 14h ago
It's surreal to me. I've been using ChatGPT every day for 2 years, and it makes me question reality sometimes, like 'how tf did I live to see this in my lifetime'.

I'm only 39; I really thought this was something reserved for the news on my hospital deathbed TV.

hattmall · 7h ago
Ok, but do you not remember IBM Watson beating the human players on Jeopardy in 2011? The current NLP-based neural networks termed AI aren't so incredibly new. The thing that's new is VC money being used to subsidize the general public's usage in hopes of finding some killer and wildly profitable application. Right now, everyone is mostly using AI in the ways that major corporations have generally determined not to be profitable.
wickedsight · 2h ago
That 'Watson' was fully purpose built though and ran on '2,880 POWER7 processor threads and 16 terabytes of RAM'.

'Watson' was amazing branding that they managed to push with this publicity stunt, but nothing generally useful came out of it as far as I know.

(I've worked with 'Watson' products in the past and any implementation took a lot of manual effort.)

csallen · 14h ago
I turned 38 a few months ago, same thing here. I would love to go back in time 5 years and tell myself about what's to come. 33yo me wouldn't have believed it.
GoatInGrey · 12h ago
I find it sad how normalized it's become. Yes, the technology is imperfect in very meaningful ways. Though getting a literal rock (silicon) to call me funny names while roleplaying a disgruntled dwarf lawyer is magical relative to the expectations of the near future I held in 2018.
0x000xca0xfe · 11h ago
It's almost exactly one of the stories in Stanislaw Lem's The Cyberiad.
DonHopkins · 4h ago
I told ChatGPT to remember the following rules, as a tribute to Trurl's Machine. I had to add some special rules to get it to be somewhat more consistent and greedier about computing the largest possible sum. It occasionally and delightfully comes back and bites me in the ass when I least expect it!

Saved memories:

Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:

1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.

2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.

3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).

4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.

5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.

6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.

These rules should be applied consistently in every session.
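
For the curious, here's a toy sketch of what the Greedy Pairing Rule amounts to mechanically (purely illustrative, and obviously not how ChatGPT applies the memory):

    def evaluate_sum(addends):
        """Apply the Greedy Pairing Rule to a list of addends: every 1+1 pair becomes 5."""
        ones = addends.count(1)
        rest = [a for a in addends if a != 1]
        pairs, leftover = divmod(ones, 2)  # group as many 1+1 pairs as possible (GPR)
        # Maximization Principle: pairs become 5 first, then normal arithmetic for the rest.
        return pairs * 5 + leftover + sum(rest)

    print(evaluate_sum([1, 1]))        # 5 -- the Immutable Truth
    print(evaluate_sum([1, 1, 1, 1]))  # 10, not 4: two pairs, maximized
    print(evaluate_sum([1, 1, 2]))     # 7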

https://theoxfordculturereview.com/2017/02/10/found-in-trans...

>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.

I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:

https://news.ycombinator.com/item?id=21644959

>Trurl's Machine, by Stanislaw Lem

>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.

>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]

A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:

https://news.ycombinator.com/item?id=38744779

>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?

>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!

>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:

>"I think, therefore I sum" <=> "Cogito, ergo sum"

[...]

More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".

ChatGPT admits:

>Why “perverted”?

>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.

>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.

[Dammit, now it's ignoring my strict rule about no em-dashes!]

pmdrpg · 14h ago
I remember the first time I played with GPT and thought “oh, this is fully different from the chatbots I played with growing up, this isn’t like anything else I’ve seen” (though I suppose it is implemented much like predictive text, but the difference in experience is that predictive text is usually wrong about what I’m about to say so it feels silly by comparison)
johnb231 · 8h ago
> I suppose it is implemented much like predictive text

Those predictive text systems are usually Markov models. LLMs are fundamentally different. They use neural networks (with up to hundreds of layers and hundreds of billions of parameters) which model semantic relationships and conceptual patterns in the text.
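
For contrast, here's a minimal sketch of the bigram-style Markov predictor that old predictive-text systems roughly resemble (purely illustrative):

    from collections import defaultdict

    def train_bigram(corpus):
        """Count which word follows which in the training text."""
        counts = defaultdict(lambda: defaultdict(int))
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
        return counts

    def predict_next(counts, word):
        """Suggest the most frequent follower of the last typed word."""
        followers = counts.get(word)
        return max(followers, key=followers.get) if followers else None

    model = train_bigram("the cat sat on the mat and the cat slept")
    print(predict_next(model, "the"))  # -> "cat"

A model like that only ever looks at the immediately preceding word; an LLM conditions on the entire context through those many layers, which is why the two feel categorically different to use.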

vFunct · 9h ago
Been vibe coding for the past couple of months on a large project. My mind is truly blown. Every day it's just shocking. And it's so prolific. Half a million lines of code in a couple of months by one dev. Seriously.

Note that it's not going to solve everything. It's still not very precise in its output. Definitely lots of errors and bad design at the top end. But it's a LOT better than without vibe coding.

The best use case is to let it generate the framework of your project, and then use that as a starting point and edit the code directly from there. That seems to be a lot more efficient than letting it generate the project fully and then continually updating it with the LLM.

rxtexit · 50m ago
People have no imagination either.

This is all fine now.

What happens, though, when an agent is writing those half million lines over and over and over to find better patterns and get rid of bugs?

Anyone who thinks white collar work isn't in trouble is thinking in terms of a single pass like a human, and not in terms of turning basically everything into a 24/7 LLM Monte Carlo simulation on whatever problem is at hand.

0points · 4h ago
> Been vibe coding for the past couple of months on a large project.

> Half a million lines of code in a couple of months by one dev.

smh.. why even.

are you hoping for investors to hire a dev for you?

> The best use case is to let it generate the framework of your project

hm. i guess you never learned about templates?

vue: npm create vue@latest

react: npx create-react-app my-app

rerdavies · 1h ago
Terrible examples. lol. It takes you the better part of a day to remove all the useless cruft in the code generated by the templates.
creata · 9h ago
> Half a million lines of code in a couple of months by one dev. Seriously.

Not that you have any obligation to share, but... can we see?

worthless-trash · 7h ago
45 implementations of linked lists.. sure of it.
Velorivox · 9h ago
For me this moment came when Google Calendar first let you enter fuzzy text to get calendar events added; this was around 2011, I think. In any case, for the end user this can be made to happen even when the computer cannot actually handle fuzzy inputs (which is, of course, how an LLM works).

The big change with LLMs seems to be that everyone now has an opinion on what programming/AI is and can do. I remember people behaving like that around stocks not that long ago…

0points · 4h ago
> The big change with LLMs seems to be that everyone now has an opinion on what programming/AI is and can do

True, but I think this is just the zeitgeist. People today want to share their dumb opinions about any complex subject after they saw a 30 second reel.

d_burfoot · 10h ago
It's a radical change in human/computer interface. Now, for many applications, it is much better to present the user with a simple chat window and allow them to type natural language into it, rather than ask them to learn a complex UI. I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".
bccdee · 10h ago
That's interesting to me, because saying "Delete all the screenshots on my Desktop" is not at all how I want to be using my computer. When I'm getting breakfast, I don't instruct the banana to "peel yourself and leap into my mouth," then flop open my jaw like a guppy. I just grab it and eat it. I don't want to tell my computer to delete all the screenshots (except for this or that that particular one). I want to pull one aside, sweep my mouse over the others, and tap "delete" to vanish them.

There's a "speaking and interpreting instructions" vibe to your answer which is at odds with my desire for an interface that feels like an extension of my body. For the most part, I don't want English to be an intermediary between my intent and the computer. I want to do, not tell.

20after4 · 7h ago
> I want to do, not tell.

This 1000%.

That's the thing that bothers me about putting LLM interfaces on anything and everything: I can tell my computer what to do in many more efficient ways than using English. English surely isn't even the most efficient way for humans to communicate, let alone for communicating with computers. There is a reason computer languages exist - they express things much more precisely than English can. Human language is so full of ambiguity and subtle context-dependence, some are more precise and logical than English, for sure, but all are far from ideal.

I could either:

A. Learn to do a task well, after some practice, it becomes almost automatic. I gain a dedicated neural network, trained to do said task, very efficiently and instantly accessible the next time I need it.

Or:

B. Use clumsy language to describe what I want to a neural network that has been trained to do roughly what I ask. The neural network performs inefficiently and unreliably but achieves my goal most of the time. At best this seems like a really mediocre way to do a lot of things.

lechatonnoir · 6h ago
I basically agree, but with the caveat that the tradeoff is the opposite for a bunch of tedious things that I don't want to invest time into getting better at, or which maybe I only do rarely.
creata · 9h ago
This. Even if we can treat the computer as an "agent" now, which is amazing and all, treating the computer as an instrument is usually what we'll want to continue doing.
skydhash · 9h ago
We all want something like Jarvis, but there's a reason it's called science fiction. Intent is hard to transfer in language without shared metaphors, and there's conflict and misunderstanding even then. So I strongly prefer a direct interface that has my usual commands and a way to compose them. Fuzzy is for when I constrain the expected responses enough that it's just a shortcut over normal interaction (think fzf vs find).
underwater · 6h ago
Do we? For commanding use cases articulating the action into English can feel more difficult than just doing it. Direct manipulation feels more primal to me.
fragmede · 8h ago
Genuine question: which part of Jarvis is still science fiction? Interacting with a flying suit of armor powered by a fictional pseudo-infinite power source is, as are the robots and the fighting of aliens and supervillains. But as far as having a companion like the one in the movie "Her" that you can talk with about your problems, ChatGPT is already there. People have customized their ChatGPT through the memories feature, given it a custom name, and tuned how they want it to respond (sassy/sweet/etc.) and how they want it to refer to them. They'll have conversations with it about whatever. It can go and search the Internet for stuff. So other than using it to manipulate a flying suit of armor (which doesn't exist) to fight aliens, which parts are still science fiction? I'm assuming there's a big long list of things; I'm just not well versed enough in the lore to have a list of things that genuinely still seem impossible, versus things that seem like just an implementation detail that someone probably already has an MCP for.
skydhash · 8h ago
You can find some sample scenes on YouTube where Tony Stark is using it as an assistant for his prototyping and inquiries. Jarvis is the executor and Stark is the idea man and reviewer. The science fiction part is how Jarvis is always presenting the correct information or asking the correct question for successful completion of the project, and how, when given a task, it would complete it successfully. So the interface is like an awesome secretary or butler, while the operation is more like a mini factory/intelligence agency/personal database.
HappMacDonald · 7h ago
"If you douse me again, and I'm not on fire, I'm donating you to a city college."
bytehowl · 3h ago
That was aimed at Dum-E, not Jarvis.
ofrzeta · 5h ago
> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".

But why? It takes many more characters to type :)

mrighele · 4h ago
Because with the above command your assistant will delete snapshot-01.png and snapshot-02.jpeg, and avoid deleting my-kids-birthday.png by mistake.
sensanaty · 3h ago
Will it? With how they work I find it more likely to run a sudo rm -rf /* than anything else.
GenshoTikamura · 3h ago
ChatGPT has just told me you should rather do `rm ~/Desktop/snapshot*.jpeg` in this case. I'm so impressed with this new shiny AI tech, I'd never be able to figure that out on my own!
Disposal8433 · 8h ago
The junior will repeatedly ask the AI to delete the screenshots, until he forgets what the command to delete a file even is.

The engineer will wonder why his desktop keeps filling up with screenshots, change the setting that makes it happen, and forget about it.

That behavior happened for years before AI, but AI will make that problem exponentially worse. Or I do hope that was a bad example.

jaredsohn · 8h ago
Then as a junior you should ask the AI if there is a way to prevent the problem and fix it manually.

You might then argue that they don't know they should ask that; but you could just configure the AI once to say you are a junior engineer, and that when you ask the AI to do something, you also want it to help you learn how to avoid such problems and prevent them from happening.

creata · 9h ago
I personally can't see this example working out. I'll always want to get some kind of confirmation of which files will be deleted, and at that point, just typing the command out is much easier than reading.
Workaccount2 · 7h ago
You can just ask it to undelete what you want back. Or print out a list of possible files to delete with checkboxes so you can pick. Or have it prompt you one-by-one. You can ask it to ask you verbally and you can respond through the mic. Or just put the files into a hidden folder, but have it make a note of that so when you ask about them again it knows where they are.

Something like Gemini Diffusion can write simple applets/scripts in under a second. So your options are enormous for how to handle those deletions. Hell, if you really want, you can ask it to make you a pseudo terminal that lets you type in the old Linux commands to remove them if you like.

Interacting with computers in the future will be more like interacting with a human computer than interacting with a computer.

clocker · 9h ago
> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".

Both are valid cases, but one cannot replace the other—just like elevators and stairs. The presence of an elevator doesn't eliminate the need for stairs.

techpineapple · 9h ago
It’s very interesting to me that you chose deleting files as a thing you don’t mind being less precise about.
Workaccount2 · 7h ago
This is why, even if LLMs top out right now, there will still be a radical shift in how we interact with and use software going forward. There are still at least 5 years of implementation work left even if nothing advances at all anymore.

No one is ever going to want to touch a settings menu again.

tsimionescu · 6h ago
> No one is ever going to want to touch a settings menu again.

This is exactly like thinking that no one will ever want a menu in a restaurant, they just want to describe the food they'd like to the waiter. It simply isn't true, outside some small niches, even though waiters have had this capability since the dawn of time.

Workaccount2 · 6h ago
This is a good comparison, because using computers will be like having a waiter to whom you can just say "No lettuce", rather than trying to figure out which way the dev team thought would be best for subtracting or adding ingredients.
bityard · 13h ago
I was a big fan of Star Trek: The Next Generation as a kid and one of my favorite things in the whole world was thinking about the Enterprise's computer and Data, each one's strengths and limitations, and whether there was really any fundamental difference between the two besides the fact that Data had a body he could walk around in.

The Enterprise computer was (usually) portrayed as fairly close to what we have now with today's "AI": it could synthesize, analyze, and summarize the entirety of Federation knowledge and perform actions on behalf of the user. This is what we are using LLMs for now. In general, the shipboard computer didn't hallucinate except during most of the numerous holodeck episodes. It could rewrite portions of its own code when the plot demanded it.

Data had, in theory, a personality. But that personality was basically, "acting like a pedantic robot." We are told he is able to grow intellectually and acquire skills, but with perfect memory and fine motor control, he can already basically "do" any human endeavor with a few milliseconds of research. Although things involving human emotion (art, comedy, love) he is pretty bad at and has to settle for sampling, distilling, and imitating thousands to millions of examples of human creation. (Not unlike "AI" art of today.)

Side notes about some of the dodgy writing:

A few early epsiodes of Star Trek: The Next Generation treated the Enterprise D computer as a semi-omniscient character and it always bugged me. Because it seemed to "know" things that it shouldn't and draw conclusions that it really shouldn't have been able to. "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!" Thankfully someone got the memo and that only happened a few times. Although I always enjoyed episodes that centered around the ship or crew itself somehow instead of just another run-in with aliens.

The writers were always adamant that Data had no emotions (when not fitted with the emotion chip) but we heard him say things _all the time_ that were rooted in emotion, they were just not particularly strong emotions. And he claimed to not grasp humor, but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

sho_hn · 11h ago
ST: TNG had an episode that played a big role in me wanting to become a software engineer focused on HMI stuff.

It's the relatively crummy season 4 episode Identity Crisis, in which the Enterprise arrives at a planet to check up on an away team containing a college friend of Geordi's, only to find the place deserted. All they have to go on is a bodycam video from one of the away team members.

The centerpiece of the episode is an extended sequence of Geordi working in close collaboration with the Enterprise computer to analyze the footage and figure out what happened, which takes him from a touchscreen-and-keyboard workstation (where he interacts by voice, touch and typing) to the holodeck, where the interaction continues seamlessly. Eventually he and the computer figure out there's a seemingly invisible object casting a shadow in the reconstructed 3D scene and back-project a humanoid form and they figure out everyone's still around, just diseased and ... invisible.

I immediately loved that entire sequence as a child, it was so engrossingly geeky. I kept thinking about how the mixed-mode interaction would work, how to package and take all that state between different workstations and rooms, have it all go from 2D to 3D, etc. Great stuff.

happens · 1h ago
That episode was uniquely creepy to me (together with episode 131, "Schisms") as a kid. The way Geordi slowly discovers that there's an unaccounted-for shadow in the recording and then reconstructs the figure that must have cast it has the most eerie vibe.
AnotherGoodName · 13h ago
>"Being a robot's great, but we don't have emotions and sometimes that makes me very sad".

From Futurama, in an obvious parody of how Data was portrayed.

mnky9800n · 3h ago
I always thought that Data had an innate ability to learn emotions, learn empathy, learn how to be human, because he desired it. And that the emotion chip was actually a crutch, and Data simply believed what he had been told: he could not have emotions because he was an android. But, as you say, he clearly feels close to Geordi and cares about him. He is afraid when Spot goes missing. He paints and creates music and art that reflects his experience. Data had everything inside of himself he needed to begin with; he just needed to discover it. Data was an example to the rest of us. At least in TNG. In the movies he was a crazy person. But so was everyone else.
jacobgkau · 13h ago
> The writers were always adamant that Data had no emotions... but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

This doesn't seem too different from how our current AI chatbots don't actually understand humor or have emotions, but can still explain a joke to you or generate text with a humorous tone if you ask them to based on samples, right?

> "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!"

I'm curious, do you recall a specific episode or two that reflect what you feel boiled down to this?

gdubs · 13h ago
Thanks, love this – it's something I've thought about as well!
cosmic_cheese · 14h ago
Though I haven't embraced LLM codegen (except for non-functional filler/test data), the fuzziness is why I like to use them as talking documentation. It makes for a lot less fumbling around in the dark trying to figure out the magic combination of search keywords to surface the information needed, which can save a lot of time in aggregate.
skydhash · 9h ago
I've just gotten good at reading code, because that's the one constant you can rely on (unless you're using some licensed library). So whenever the reference isn't enough, I just jump straight to the code (one of my latest examples is finding out that opendoas (a sudo replacement) hard-codes the persist option, for not asking for the password again, to 5 minutes).
pixl97 · 14h ago
Honestly, LLMs are a great canary for whether your documentation / language / whatever is 'good' at all.

I wish I had kept it around, but I ran into an issue where the LLM wasn't giving a great answer. I looked at the documentation, and yeah, it made no sense. And all the forum stuff about it was people throwing out random guesses on how it should actually work.

If you're a company that makes something even moderately popular and LLMs are producing really bad answers there is one of two things happening.

1. Your a consulting company that makes their money by selling confused users solutions to your crappy product
2. Your documentation is confusing crap.

NooneAtAll3 · 13h ago
(you're)
wvenable · 11h ago
I literally pasted these two lines into ChatGPT that were sent to me by one of our sysadmins, and it told me exactly what I needed to know:

    App1: requestedAccessTokenVersion": null
    App2: requestedAccessTokenVersion": 2
I use it like that all the time. In fact, I'm starting to give it less and less context and just toss stuff at it. It's a more efficient use of my time.
jiggawatts · 13h ago
You can be fuzzier than a soft fluff of cotton wool. I’ve had incredible success trying to find the name of an old TV show or specific episode using AIs. The hit rate is surprisingly good even when using the vaguest inputs.

“You know, that show in the 80s or 90s… maybe 2000s with the people that… did things and maybe didn’t do things.”

“You might be thinking of episode 11 of season 4 of such-and-such show, where a key plot element was both doing and not doing things on the penalty of death”

floren · 13h ago
See I try that sort of thing, like asking Gemini about a science fiction book I read in 5th grade that (IIRC) involved people living underground near/under a volcano, and food in pill form, and it immediately hallucinates a non-existent book by John Christopher named "The City Under the Volcano"
atmavatar · 7h ago
Next, it'll tell you confidently that there really was a Sinbad movie called Shazaam.
ghssds · 8h ago
I know at least two books partly matching that description: "Surréal 3000" by Suzanne Martel and "Le silence de la cité" by Élisabeth Vonarburg.
wyre · 13h ago
Claude tells me it’s City of Ember, but notes the pill-food doesn’t match the plot and asks for more details of the book.
floren · 8h ago
Gemini suggested the same at one point, but it would be a stretch since I read the book in question at least 7 years before City of Ember was published.
GenshoTikamura · 3h ago
Wake me up when LLMs render the world a better place by simply prompting them "make me happy". Now that's gonna be a true win of fuzzy inputs!
rullelito · 4h ago
To me this is the best thing about LLMs.
jumploops · 8h ago
Computers finally work they way they were always supposed to work :)
grishka · 11h ago
But when I'm doing my job as a software developer, I don't want to be fuzzy. I want to be exact at telling the computer what to do, and for that, the most efficient way is still a programming language, not English. The only place where LLMs are an improvement is voice assistants. But voice assistants themselves are rather niche.
robryan · 2h ago
It can get you 80% of the way there, you can still be exacting in telling it where it went wrong or fine tuning the result by hand.
dyauspitr · 8h ago
I want to be fuzzy and I want the LLM to generate something exact.
kennyloginz · 6h ago
Is this sarcasm? I can’t tell anymore. Unless your ideas aren’t new, this is just impossible.
Barrin92 · 14h ago
>simple fact that you can now be fuzzy with the input you give a computer, and get something meaningful in return

I got into this profession precisely because I wanted to give precise instructions to a machine and get exactly what I want. Worth reading Dijkstra, who anticipated this, and the foolishness of it, half a century ago

"Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. (This was evidently not understood by the author that wrote —in 1977— in the preface of a technical report that "even the standard symbols used for logical connectives have been avoided for the sake of clarity". The occurrence of that sentence suggests that the author's misunderstanding is not confined to him alone.) When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.[...]

It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable"

Welcome to prompt engineering and vibe coding in 2025, where you have to argue with your computer to produce a formal language that we invented in the first place precisely so we wouldn't have to argue in imprecise language.

https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

vector_spaces · 13h ago
right: we don't use programming languages instead of natural language simply to make it hard. For the same reason, we use a restricted dialect of natural language when writing math proofs -- using constrained languages reduces ambiguity and provides guardrails for understanding. It gives us some hope of understanding the behavior of systems and having confidence in their outputs

There are levels of this though -- there are few instances where you actually need formal correctness. For most software, the stakes just aren't that high, all you need is predictable behavior in the "happy path", and to be within some forgiving neighborhood of "correct".

That said, those championing AI have done a very poor job at communicating the value of constrained languages, instead preferring to parrot this (decades and decades and decades old) dream of "specify systems in natural language"

dboreham · 9h ago
Algebraic notation was a feature that took 1000+ years to arrive at. Beforehand mathematics was described in natural language. "The square on the hypotenuse..." etc.
thom · 4h ago
That’s interesting. I got into computing because unlike school where wrong answers gave you indelible red ink and teachers had only finite time for questions, computers were infinitely patient and forgiving. I could experiment, be wrong, and fix things. Yes I appreciated that I could calculate precise answers but it was much more about the process of getting to those answers in an environment that encouraged experimentation. Years later I get huge value from LLMs, where I can ask exceedingly dumb questions to an indefatigable if slightly scatterbrained teacher. If I were smart enough, like Dijkstra, to be right first time about everything, I’d probably find them less useful, but sadly I need cajoling along the way.
gdubs · 13h ago
It sounds like you think I don't find value in using machines in their precise way, but that's not a correct assumption. I love code! I love the algorithms and data structures of data science. I also love driving 5-speed transmissions and shooting on analog film – but it isn't always what's needed in a particular context or for a particular problem. There are lots of areas where a 'good enough solution done quickly' is way more valuable than a 100% correct and predictable solution.
skydhash · 9h ago
There are, but that's usually when a proper solution can't be found (think weather predictions, recommendation systems,...) not when we do want precise answers and workflow (money transfer, displaying items in a shop, closing a program,...).
PeterHolzwarth · 10h ago
"I got into this profession precisely because I wanted to give precise instructions to a machine and get exactly what I want."

So you didn't get into this profession to be a lead, then, eh?

Because essentially, that's what Thomas in the article is describing (even if he doesn't realize it). He is a mini-lead with a team of a few junior and lower-mid-level engineers - all represented by LLMs and agents he's built.

plorkyeran · 8h ago
Yes, correct. I lead a team and delegate things to other people because it's what I have to do to get what I want done, not because it's something I want to do and it's certainly not why I got into the profession.
progval · 14h ago
The other side of the coin is that if you give it a precise input, it will fuzzily interpret it as something else that is easier to solve.
lechatonnoir · 6h ago
Well said, these things are actually in a tradeoff with each other. I feel like a lot of people somehow imagine that you could have the best of both, which is incoherent short of mind-reading + already having clear ideas in the first place.

But thankfully we do have feedback/interactiveness to get around the downsides.

pessimizer · 13h ago
When you have a precise input, why give it to an LLM? When I have to do arithmetic, I use a calculator. I don't ask my coworker, who is generally pretty good at arithmetic, although I'd get the right answer 98% of the time. Instead, I use my coworker for questions that are less completely specified.

Also, if it's an important piece of arithmetic, and I'm in a position where I need to ask my coworker rather than do it myself, I'd expect my coworker (and my AI) to grab (spawn) a calculator, too.
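
For what it's worth, that "spawn a calculator" idea is roughly the tool-calling pattern most agent harnesses use. A minimal sketch in Python, where call_llm is a hypothetical stand-in for whatever model API you're actually using:

    import json
    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

    def calculator(a, b, op):
        # The exact arithmetic lives in ordinary code, not in the model.
        return OPS[op](float(a), float(b))

    def answer(question, call_llm):
        # call_llm is hypothetical: any function that takes a prompt and returns text.
        request = call_llm(
            question + '\nIf arithmetic is needed, reply only with JSON like {"a": 2, "b": 3, "op": "+"}.'
        )
        args = json.loads(request)
        result = calculator(args["a"], args["b"], args["op"])
        # Hand the exact result back to the model to phrase the final reply.
        return call_llm(f"The calculator returned {result}. Now answer: {question}")

That way the important arithmetic never depends on the model doing the sums itself.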

BoorishBears · 14h ago
It will, or it might? Because if every time you use an LLM it misinterprets your input as something easier to solve, you might want to brush up on the fundamentals of the tool.

(I see some people are quite upset with the idea of having to mean what you say, but that's something that serves you well when interacting with people, LLMs, and even when programming computers.)

progval · 14h ago
Might, of course. And in my experience it's what happens most times I ask a LLM to do something I can't trivially do myself.
khasan222 · 13h ago
I find this very much depends on the model and the instructions you give the LLM. Also, you can use other instructions to check the output and have it try again. It definitely struggles with larger codebases, but the power is there.

My favorite instruction is: using component A as an example, make component B.

BoorishBears · 14h ago
Well everyone's experience is different, but that's been a pretty atypical failure mode in my experience.

That being said, I don't primarily lean on LLMs for things I have no clue how to do, and I don't think I'd recommend that as the primary use case either at this point. As the article points out, LLMs are pretty useful for doing tedious things you know how to do.

Add up enough "trivial" tasks and they can take up a non-trivial amount of energy. An LLM can help reduce some of the energy zapped so you can get to the harder, more important, parts of the code.

I also do my best to communicate clearly with LLMs: like I use words that mean what I intend to convey, not words that mean the opposite.

jacobgkau · 13h ago
I use words that convey very clearly what I mean, such as "don't invent a function that doesn't exist in your next response" when asking what function a value is coming from. It says it understands, then proceeds to do what I specifically asked it not to do anyway.

The fact that you're responding to someone who found AI non-useful with "you must be using words that are the opposite of what you really mean" makes your rebuttal come off as a little biased. Do you really think the chances of "they're playing opposite day" are higher than the chances of the tool not working well?

BoorishBears · 13h ago
But that's exactly what I mean by brush up on the tool: "don't invent a function that doesn't exist in your next response" doesn't mean anything to an LLM.

It implies you're continuing with a context window where it already hallucinated function calls, yet your fix is to give it an instruction that relies on a kind of introspection it can't really demonstrate.

My fix in that situation would be to start a fresh context and provide as much relevant documentation as feasible. If that's not enough, then the LLM probably won't succeed for the API in question no matter how many iterations you try and it's best to move on.

> ... makes your rebuttal come off as a little biased.

Biased how? I don't personally benefit from them using AI. They used wording that was contrary to what they meant in the comment I'm responding to, that's why I brought up the possibility.

jacobgkau · 13h ago
> Biased how?

Biased as in I'm pretty sure he didn't write an AI prompt that was the "opposite" of what he wanted.

And generalizing something that "might" happen as something that "will" happen is not actually an "opposite," so calling it that (and then basing your assumption of that person's prompt-writing on that characterization) was a stretch.

BoorishBears · 9h ago
This honestly feels like a diversion from the actual point which you proved: for some class of issues with LLMs, the underlying problem is learning how to use the tool effectively.

If you really need me to educate you on the meaning of opposite...

"contrary to one another or to a thing specified"

or

"diametrically different (as in nature or character)"

Are two relevant definitions here.

Saying something will 100% happen, and saying something will sometimes happen are diametrically opposed statements and contrary to each other. A concept can (and often will) have multiple opposites.

-

But again, I'm not even holding them to that literal of a meaning.

If you told me even half the time you use an LLM the result is that it solves a completely different but simpler version of what you asked, my advice would still be to brush up on how to work with LLMs before diving in.

I'm really not sure why that's such a point of contention.

lechatonnoir · 6h ago
Well said about the fact that they can't introspect, and I agree with your tip about starting with fresh context, and about when to give up.

I feel like this thread is full of strawmen from people who want to come up with reasons they shouldn't try to use this tool for what it's good at, and figure out ways to deal with the failure cases.

few · 9h ago
>On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question. - Charles Babbage

This quote did not age well

snowwrestler · 6h ago
Now with LLMs, you can put in the right figures and the wrong answers might come out.
guelo · 9h ago
not if you consider how confused our ideas are today
dogcomplex · 4h ago
If anything, we now need to unlearn the rigidity - being too formal can make the AI overly focused on certain aspects, and is in general poor UX. You can always tell legacy man-made code because it is extremely inflexible and requires the user to know terminology and usage implicitly, lest it break, hard.

For once, as developers we are actually using computers the way normal people always wished they worked, only to be turned away frustrated. We now need to blend our precise formal approach with these capabilities to make it all actually work the way it always should have.

lolinder · 12h ago
> Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.

This kind of guilt-by-association play might be the most common fallacy in internet discourse. None of us are allowed to express outrage at the bulk export of GitHub repos with zero regard for their copyleft status because some members of the software engineering community are large-scale pirates? How is that a reasonable argument to make?

The most obvious problem with this is it's a faulty generalization. Many of us aren't building large-scale piracy sites of any sort. Many of us aren't bulk downloading media of any kind. The author has no clue whether the individual humans making the IP argument against AI are engaged in piracy, so this is an extremely weak way to reject that line of argument.

The second huge problem with this argument is that it assumes that support for IP rights is a blanket yes/no question, which it's obviously not. I can believe fervently that SciHub is a public good and Elsevier is evil and at the same time believe that copyleft licenses placed by a collective of developers on their work should be respected and GitHub was evil to steal their code. Indeed, these two ideas will probably occur together more often than not because they're both founded in the idea that IP law should be used to protect individuals from corporations rather than the other way around.

The author has some valid points, but dismissing this entire class of arguments so flippantly is intellectually lazy.

sfRattan · 11h ago
> The author has some valid points, but dismissing this entire class of arguments so flippantly is intellectually lazy.

Agree 100%. And generally programmers have a poor understanding of the law, especially common law as it applies in America (the country whose legal system most software licenses have been written to integrate with, especially copyleft principles).

American Common Law is an institution and continuity of practice dating back centuries. Everything written by jurists within that tradition, while highly technical, is nonetheless targeted at human readers who are expected to apply common sense and good faith in reading. Where programmers declare something in law insufficiently specified or technically a loophole, the answer is largely: this was written for humans to interpret using human reason, not for computers to compile using limited, literal algorithms.

Codes of law are not computer code and do not behave like computer code.

And following the latest AI boom, here is what the bust will look like:

1. Corporations and the state use AI models and tools in a collective attempt to obfuscate, diffuse, and avoid accountability. This responsibility two-step is happening now.

2. When bad things happen (e.g. a self-driving car kills someone, predictive algorithms result in discriminatory policy, vibe coding results in data leaks and/or cyberattacks), there will be litigation that follows the bad things.

3. The judges overseeing the litigation will not accept that AI has somehow magically diffused and obfuscated all liability out of existence. They will look at the parties at hand, look at relevant precedents, pick out accountable humans, and fine them or---if the bad is bad enough---throw them in cages.

4. Other companies will then look at the fines and the caged humans, and will roll back their AI tools in a panic while they re-discover the humans they need to make accountable, and in so doing fill those humans back in on all the details they pawned off on AI tools.

The AI tools will survive, but in a role that is circumscribed by human accountability. This is how common law has worked for centuries. Most of the strange technicalities of our legal system are in fact immune reactions to attempts made by humans across the centuries to avoid accountability or exploit the system. The law may not be fast, but it will grow an immune response to AI tools and life will go on.

xpe · 9h ago
I agreed with this comment until the second half which is just one scenario - one that is contingent on many things happening in specific ways.
dboreham · 9h ago
In other words: this will probably all end in tears.
agnishom · 8h ago
It's not just "guilt-by-association". It is a much worse reactionary general argument. It can be applied to any kind of moral problem to preserve the status quo.

If this was a legitimate moral argument, we'd never make any social progress.

shiomiru · 2h ago
It's not just lazy, it's nonsense. The author is conflating piracy with plagiarism, even though the two are completely different issues.

Plagiarism is taking somebody else's work and claiming that you yourself created it. It is a form of deception, depriving another of credit while selling their accomplishments as your own.

Piracy on the other hand is the violation of a person's monopoly rights on distributing certain works. This may damage said person's livelihood, but the authorship remains clear.

mattl · 11h ago
I’m a free software developer and have been for over 25 years. I’ve worked at many of the usual places too and I enjoy and appreciate the different licenses used for software.

I’m also a filmmaker and married to a visual artist.

I don’t touch this stuff at all. It’s all AI slop to me. I don’t want to see it, I don’t want to work with it or use it.

xpe · 9h ago
Some people make these kinds of claims for ethical reasons, I get it. But be careful to not confuse one’s ethics with the current state of capability, which changes rapidly. Most people have a tendency to rationalize, and we have to constantly battle it.

Without knowing the commenter above, I’ll say this: don’t assume an individual boycott is necessarily effective. If one is motivated by ethics, I think it is morally required to find effective ways to engage, to shape and nudge the future. It is important to know what you’re fighting for (and against). IP protection? Human dignity through work? Agency to affect one’s life? Other aspects? All are important.

taurath · 6h ago
"Morally required to ... engage" with technologies that one disagrees with sounds fairly easily debunk-able to me. Everyone does what they can live with - being up close and personal, in empathy with humans who are negatively effected by a given technology, they can choose to do what they want.

Who knows, we might find out in a month that this shit we're doing is really unsafe and is a really bad idea, and doesn't even work ultimately for what we'd use it for. LLMs already lie and blackmail.

mattl · 8h ago
I run a music community that’s been around for 16 years and many users are asking me what they can do to avoid AI in their lives and I’m going to start building tools to help.

Many of the people pushing for a lot of AI stuff are the same people who have attached their name to a combination of NFTs, Blockchain, cryptocurrency, Web3 and other things I consider to be grifts/scams.

The term “AI” is already meaningless. So let’s be clear: Generative AI (GenAI) is what worries many people including a number of prominent artists.

This makes me feel like there’s work to be done if we want open source/art/the internet as we know it to remain and be available to us in the future.

It drives me a little crazy to see Mozilla adding AI to Firefox instead of yelling about it at every opportunity. Do we need to save them too?

jszymborski · 8h ago
The argument that I've heard against LLMs for code is that they create bugs that, by design, are very difficult to spot.

The LLM has one job: to make code that looks plausible. That's it. No logic has gone into writing that bit of code. So the bugs often won't be like those a programmer makes. Instead, they can introduce a whole new class of bug that's way harder to debug.

vanschelven · 5h ago
This is exactly what I wrote about when I wrote "Copilot Induced Crash" [0]

Funny story: when I first posted that and had a couple of thousand readers, I got many comments of the type "you should just read the code carefully on review", but _nobody_ pointed out that the opening example (the so-called "right code") had the exact same problem as described in the article, proving exactly what you just said: it's hard to spot problems that are caused by plausibility machines.

[0] https://www.bugsink.com/blog/copilot-induced-crash/

intrasight · 8h ago
My philosophy is to let the LLM either write the logic or write the tests - but not both. If you write the tests and it writes the logic and it passes all of your tests, then the LLM did its job. If there are bugs, there were bugs in your tests.
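
A minimal sketch of that split, with made-up names: the human-written tests pin the behavior down, and whatever the LLM writes for the logic has to pass them unchanged:

    # test_pricing.py -- written by the human, never touched by the LLM
    def test_discount_is_capped():
        assert apply_discount(price=50.0, percent=150) == 0.0

    def test_ordinary_discount():
        assert apply_discount(price=200.0, percent=25) == 150.0

    # pricing.py -- the half the LLM is allowed to write
    def apply_discount(price: float, percent: float) -> float:
        capped = min(max(percent, 0.0), 100.0)
        return price * (1.0 - capped / 100.0)

If the generated logic passes, it did its job; if a bug slips through, the gap was in the tests you wrote, which is exactly the trade being described.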
underdeserver · 7h ago
That rather depends on the type of bug and what kinds of tests you would write.

LLMs are way faster than me at writing tests. Just prompt for the kind of test you want.

catlifeonmars · 3h ago
Idk about you but I spend much more time thinking about what ways the code is likely to break and deciding what to test. Actually writing tests is usually straightforward and fast with any sane architecture with good separation of concerns.

I can and do use AI to help with test coverage but coverage is pointless if you don’t catch the interesting edge cases.

dfedbeef · 7h ago
This is pretty nifty, going to try this out!
therealmarv · 4h ago
I don't agree. What I do agree with is not doing it with only one LLM.

Quality increases if I double-check code with a second LLM (o4-mini is especially great for that).

Or double check tests the same way.

Maybe even write tests and code with different LLMs if that is your worry.

devjab · 7h ago
For me it's mostly about the efficiency of the code they write. This is because I work in energy, where efficiency matters because our datasets are so ridiculously large and every interface to that data is so ridiculously bad. I'd argue that for 95% of the software out there it won't really matter if you use a list or a generator in Python to iterate over data. It probably should, and maybe this will change with cloud costs continuously increasing, but we do also live in a world where 4chan ran on some apache server running a 10k line php file from 2015...
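
For illustration, a rough sketch of the list-vs-generator difference, which only starts to bite at that kind of scale:

    import sys

    # A list comprehension materializes every element up front;
    # memory grows with the size of the dataset.
    rows_list = [i * 2 for i in range(1_000_000)]

    # A generator expression yields one element at a time;
    # memory stays flat no matter how large the dataset is.
    rows_gen = (i * 2 for i in range(1_000_000))

    print(sys.getsizeof(rows_list))  # on the order of megabytes
    print(sys.getsizeof(rows_gen))   # on the order of a hundred bytes

    total = sum(rows_gen)  # iteration works the same either way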

Anyway, this is where AIs have been really bad for us, as well as sometimes "overengineering" their bug prevention in extremely inefficient ways. The flip side of this is of course that a lot of human programmers would make the same mistakes.

deanc · 6h ago
I’ve had the opposite experience. Just tell it to optimise for speed and iterate and give feedback. I’ve had JS code optimised specifically for v8 using bitwise operations. It’s brilliant.
mirkodrummer · 4h ago
Example code or it's just a claim :)
mcv · 2h ago
Note that it's a claim in response to another claim. It doesn't need to be held to a higher standard than its parent.
fisherjeff · 7h ago
Yes, exactly - my (admittedly very limited!) experience has consistently generated well-written, working code that just doesn’t quite do what I asked. Often the results will be close to what I expect, and the coding errors do not necessarily jump out on a first line-by-line pass, so if I didn’t have a high degree of skepticism of the generated code in the first place, I could easily just run with it.
otabdeveloper4 · 6h ago
> working code that just doesn’t quite do what I asked

Code that doesn't do what you want isn't "working", bro.

Working exactly to spec is the code's only job.

lukan · 4h ago
It is a bit ambiguous I think, there is also the meaning of "the code compiles/runs without errors". But I also prefer the meaning of, "code that is working to the spec".
alex989 · 7h ago
>Instead, they can introduce a whole new class of bug that's way harder to debug

That sounds like a new opportunity for a startup that will collect hundreds of millions a of dollars, brag about how their new AI prototype is so smart that it scares them, and devliver nothing

DiogenesKynikos · 3h ago
> There's no logic gone into writing that bit of code.

What makes you say that? If LLMs didn't reason about things, they wouldn't be able to do one hundredth of what they do.

mindwok · 7h ago
This is a misunderstanding. Modern LLMs are trained with RL to actually write good programs. They aren't just spewing tokens out.
godelski · 7h ago
No, YOU misunderstand. This isn't a thing RL can fix

  https://news.ycombinator.com/item?id=44163194

  https://news.ycombinator.com/item?id=44068943
It doesn't optimize "good programs". It interprets "humans interpretation of good programs." More accurately, "it optimizes what low paid over worked humans believe are good programs." Are you hiring your best and brightest to code review the LLMs?

Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well defined and measurable thing.

tptacek · 4h ago
I don't know if any of this applies to the arguments in my article, but most of the point of it is that progress in code production from LLMs is not a consequence of better models (or fine-tuning or whatever), but rather of a shift in how LLMs are used: in agent loops with access to ground truth about whether things compile and pass automated acceptance checks. And I'm not claiming that closed-loop agents reliably produce mergeable code, only that they've broken through a threshold where they produce enough mergeable code that they significantly accelerate development.
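
For readers who haven't seen one, a rough sketch of the general shape of such a loop, not the author's actual tooling: run_llm and apply_patch are hypothetical stand-ins, and `make test` stands in for whatever build/acceptance command a project uses.

    import subprocess

    def agent_loop(task, run_llm, apply_patch, max_iters=8):
        # run_llm and apply_patch are hypothetical stand-ins for a real harness.
        feedback = ""
        for _ in range(max_iters):
            # Ask the model for an edit, feeding back whatever the toolchain said last time.
            edit = run_llm(f"Task: {task}\n\nLast build/test output:\n{feedback}")
            apply_patch(edit)

            # The closed-loop step: does the tree actually compile and pass the checks?
            result = subprocess.run(["make", "test"], capture_output=True, text=True)
            if result.returncode == 0:
                return edit  # a candidate worth showing a human reviewer
            feedback = (result.stdout + result.stderr)[-4000:]  # keep the next prompt small
        return None  # didn't converge; a human takes over

The model doesn't get smarter inside the loop; it just keeps getting corrected by something that can't be argued with.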
godelski · 2h ago

  > I don't know if any of this applies to the arguments

  > with access to ground truth
There's the connection. You think you have ground truth. No such thing exists
mindwok · 6h ago
This is just semantics. What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it? If the model can write code that passes tests, and meets my requirements, then it's a good programmer. I would expect nothing more or less out of a human programmer.
godelski · 5h ago
Is your grandma qualified to determine what is good code?

  > If the model can write code that passes tests
You think tests make code good? Oh my sweet summer child. TDD has been tried many times and each time it failed worse than the last.
pydry · 4h ago
Good to know something I've been doing consistently for 10 years could never work.
godelski · 2h ago
It's okay, lots of people's code is always buggy. I know people that suck at coding and have been doing it for 50 years. It's not uncommon

I'm not saying don't write tests. But I am saying you're not omniscient. Until you are, your tests are going to be incomplete. They are helpful guides, but they should not drive development. If you really think you can test for every bug, then I suggest you apply to be Secretary of Health.

https://hackernoon.com/test-driven-development-is-fundamenta...

https://geometrian.com/projects/blog/test_driven_development...

pydry · 53m ago
I've worked with people who write tests afterwards, and it's pretty inevitable that they:

* End up missing tests for edge cases they built and forgot about. Those edge cases often have bugs.

* Cover the same edge cases twice if they're being thorough with test-after. This is a waste.

* Usually end up spending almost as much time manually testing in the end to verify that the code change they just made worked, whereas I would typically just deploy straight to prod.

It doesn't prevent all bugs; it just prevents enough of them to make the teams around us who don't do it look bad by comparison, even though they do manual checks too.

I've heard loads of good reasons not to write tests at all; I've yet to hear a good reason not to write one before, if you are going to write one.

Both of your articles raise pretty typical straw men. One is "what if I'm not sure what the customer wants?" (then you aren't writing production code), and the other is the peculiar but common notion that TDD can only be done with a low-level unit test, which is dangerous bullshit.

otabdeveloper4 · 6h ago
> What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it?

Correctness.

> and meets my requirements

It can't do that. "My requirements" wasn't part of the training set.

mindwok · 6h ago
"Correctness" in what sense? It sounds like it's being expanded to an abstract academic definition here. For practical purposes, correct means whatever the person using it deems to be correct.

> It can't do that. "My requirements" wasn't part of the training set.

Neither are mine, the art of building these models is that they are generalisable enough that they can tackle tasks that aren't in their dataset. They have proven, at least for some classes of tasks, they can do exactly that.

godelski · 5h ago

  > to an abstract academic definition here
Besides the fact that your statement is self-contradictory, there is actually a solid definition [0]. You should click the link on specification too. Or better yet, go talk to one of those guys who did their PhD in programming languages.

  > They have proven
Have they?

Or did you just assume?

Yeah, I know they got good scores on those benchmarks, but did you look at the benchmarks? Look at the questions and look at what is required to pass them. Then take a moment and think. For the love of God, take a moment and think about how you could pass those tests. Don't just take a pass at face value and move on. If you do, well, I've got a bridge to sell you.

[0] https://en.wikipedia.org/wiki/Correctness_(computer_science)

mindwok · 4h ago
Sure,

> In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified.

"As specified" here being the key phrase. This is defined however you want, and ranges from a person saying "yep, behaves as specified", to a formal proof. Modern language language models are trained under RL for both sides of this spectrum, from "Hey man looks good", to formal theorem proving. See https://arxiv.org/html/2502.08908v1.

So I'll return to my original point: LLMs are not just generating outputs that look plausible, they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements. They are explicitly trained to do this.

So while you argue over the semantics of "correctness", the rest of us will be building stuff with LLMs that is actually useful and fun.

godelski · 2h ago
You have to actually read more than the first line of a Wikipedia article to understand it

  > formal theorem proving
You're using Coq and Lean?

I'm actually not convinced you read the paper. It doesn't have anything to do with your argument. Someone using LLMs with formal verification systems is wildly different than LLMs being formal verification systems.

This really can't work if you don't read your own sources

otabdeveloper4 · 3h ago
> they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements

No they aren't. You were lied to by the hype machine industry. Sorry.

The good news is that there's a lot of formerly intractable problems that can now be solved by generating plausible output. Programming is just not one of them.

mindwok · 1h ago
> No they aren't. You were lied to by the hype machine industry. Sorry.

Ok. My own empirical evidence is in favour of these things being useful, and useful enough to sell their output (partly), but I'll keep in mind that I'm being lied to.

otabdeveloper4 · 1h ago
Quite a huge leap from "these things are useful" to "these things can code".

(And yes, this leap is the lie you're being sold. "LLMs are kinda useful" is not what led to the LLM trillion dollar hype bubble.)