Generative AI coding tools and agents do not work for me

246 points by nomdep | 285 comments | 6/17/2025, 12:33:45 AM | blog.miguelgrinberg.com

Comments (285)

socalgal2 · 4h ago
> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.

I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on stack overflow. Sometimes as I'm typing the question their search would finally kick in and find the answer (similar questions). Other times I'd post the question, if it didn't get closed, maybe I'd get an answer a few hours or days later.

Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those questions is learning, regardless of where the answer comes from.

plasticeagle · 3h ago
ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.

Also, he's specifically talking about having a coding tool write the code for you, he's not talking about using an AI tool to answer a question, so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.

socalgal2 · 2h ago
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

I know this isn't true because I work on an API that has no answers on stackoverflow (too new), nor does it have answers anywhere else. Yet, the AI seems to be able to accurately answer many questions about it. To be honest I've been somewhat shocked at this.

bbarnett · 2h ago
It is absolutely true, and AI cannot think, reason, or comprehend anything it has not seen before. If you're getting answers, it has seen them elsewhere, or it is literally dumb statistical luck.

That doesn't mean it knows the answer. That means it guessed or hallucinated correctly. Guessing isn't knowing.

edit: people seem to be missing my point, so let me rephrase. Of course AIs don't think, but that wasn't what I was getting at. There is a vast difference between knowing something, and guessing.

Guessing, even in humans, is just the human mind statistically and automatically weighing probabilities and suggesting what may be the answer.

This is akin to what a model might do, without any real information. Yet in both cases, there's zero validation that anything is even remotely correct. It's 100% conjecture.

It therefore doesn't know the answer, it guessed it.

When it comes to being correct about a language or API that there's zero info on, it's just pure happenstance that it got it correct. It's important to know the differences, and not say it "knows" the answer. It doesn't. It guessed.

One of the biggest issues with LLMs is that we don't get a probability back with the response. You ask a human "Do you know how this works?", and an honest and helpful human might say "No" or "No, but you should try this. It might work".

That's helpful.

Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.

LLMs need more of this type of response, one that indicates certainty or the lack of it. They're useless without this. But of course, an LLM indicating a lack of certainty means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!

jumploops · 1h ago
> It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before.

The amazing thing about LLMs is that we still don’t know how (or why) they work!

Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.

But as it turns out, most human knowledge is already regurgitation (see: the patent system).

Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.

Do they hallucinate? Absolutely.

Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?

Having dealt with humans, the confidence problem isn’t unique to LLMs…

lechatonnoir · 2h ago
This is such a pointless, tired take.

You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.

You want to say that it is reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.

rsanheim · 2h ago
It's just pattern matching. Most APIs, and hell, most code, is not unique or special. It's all been done a thousand times before. That's why an LLM can be helpful on some tool you've written just for yourself and never released anywhere.

As to 'knows the answer', I don't even know what that means with these tools. All I know is whether it is helpful or not.

PeterStuer · 2h ago
What would convince you otherwise? The reason I ask is because you sound like you have made up your mind philosophically, not based on practical experience.
hombre_fatal · 40m ago
This doesn't seem like a useful nor accurate way of describing LLMs.

When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.

"It just saw that before on Stack Overflow" is a bad trivialization of that.

It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.

olmo23 · 2h ago
Where does the knowledge come from? People can only post to SO if they've read the code or the documentation. I don't see why LLMs couldn't do that.
nobunaga · 1h ago
ITT: People who think LLMs are AGI and can produce output that the LLM has come up with out of thin air or by doing research. Go speak with someone who is actually an expert in this field about how LLMs work and why the training data is so important. I'm amazed that people in the CS industry seem to talk like they know everything about a tech after using it but never even writing a line of code for an LLM. Our industry is doomed with people like this.
usef- · 1h ago
This isn't about being AGI or not, and it's not "out of thin air".

Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.

My guess is they were either using the LLM from a code editor, or one of the many LLMs that do web searches automatically (i.e. all of the popular ones).

They are answering non-stackoverflow questions every day, already.
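
Roughly, that loop is just: search, stuff the results into the prompt, then ask. A minimal sketch in Python, purely illustrative: `web_search` and `ask_model` are hypothetical placeholders standing in for a real search tool and a real model call, not any particular vendor's API.

    def web_search(query: str) -> list[str]:
        # Placeholder: a real implementation would call a search engine or the
        # editor's codebase index and return text snippets.
        return [f"<snippet about {query!r} #1>", f"<snippet about {query!r} #2>"]

    def ask_model(prompt: str) -> str:
        # Placeholder: a real implementation would send `prompt` to an LLM API.
        return "<model answer grounded in the snippets above>"

    def answer_with_research(question: str) -> str:
        # 1. Retrieve fresh material (web results, docs, indexed project files).
        snippets = web_search(question)
        # 2. Paste it into the prompt -- this is what "fed into the context" means.
        context = "\n\n".join(snippets[:5])
        prompt = (
            "Use the following search results / documentation to answer.\n\n"
            + context
            + "\n\nQuestion: " + question + "\nAnswer:"
        )
        # 3. The model answers from the retrieved text, not only its training data.
        return ask_model(prompt)

    print(answer_with_research("How do I configure this brand-new API?"))

The code editors mentioned above do the same thing with an index of your project standing in for the web search.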

kypro · 1h ago
The idea that LLMs can only spew out text they've been trained on is a fundamental misunderstanding of how modern backprop training algorithms work. A lot of work goes into refining training algorithms to prevent overfitting of the training data.

Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs, the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able to provide answers to inputs unseen in the training data is quite an extreme position.

socalgal2 · 4h ago
To add, another experience I had. I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months experience with this API it would be obvious but it certainly wasn't to me. For fun I just copy and pasted the stack trace into Gemini. ~60 frames worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2 line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
nottorp · 2h ago
You remember when Google used to do the same thing for you way before "AI"?

Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as a LLM.

It's not that LLMs are better, it's search that got enshittified.

averageRoyalty · 55m ago
A horse used to get you places just like a car could. A whisk worked as well as a blender.

We have a habit of finding efficiencies in our processes, even if the original process did work.

socalgal2 · 2h ago
I remember when I could paste an error message into Google and get an answer. I do not remember pasting a 60 line stack trace into Google and getting an answer, though I'm pretty sure I honestly never tried that. Did it work?
jasode · 2h ago
>You remember when Google used to do the same thing for you way before "AI"? [...] stack trace [...], but a plain search used to be capable of giving the same answer as a LLM.

The "plain" Google Search before LLM never had the capability to copy&paste an entire lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceeds Google's UI. Various answers say limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...

Before LLM, the human had to manually visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into Google the search box. Of course, that's do-able but that's a different workflow than an LLM doing it for you.

To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.

Philpax · 2h ago
Google has never identified the logical error in a block of code for me. I could find what an error code was, yes, but it's of very little help when you don't have a keyword to search.
nsonha · 1h ago
> when Google used to do the same thing for you way before "AI"?

Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.

nottorp · 38m ago
> Which is never?

In my experience it was "a lot". Because my stack traces were mostly hardware related problems on arm linux in that period.

But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.

> Do you often just lie to win arguments?

I do not enjoy being accused of lying by someone stuck in their own bubble.

When you said "Which is never" did you lie consciously or subconsciously btw?

FranzFerdiNaN · 2h ago
It was just as likely that Google would point you towards a stackoverflow question that was closed because it was considered a duplicate of a completely different question.
BlackFly · 1h ago
One of the many ways that search got worse over time was the promotion of blog spam over actual documentation. Generally, I would rather have good API documentation or a user guide that leads me through the problem so that next time I know how to help myself. Reading through good API documentation often also educates you about the overall design and associated functionality that you may need to use later. Reading the manual for technology that you will be regularly using is generally quite profitable.

Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.

It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.

blueflow · 1h ago
Your parent searches for answers, you search for documentation. That's why AI works for him and not for you.
ryanackley · 13m ago
You're completely missing his point. If nobody figures things out for themselves, there's a risk that at some point AI won't have anything to learn from, since people will stop writing blog posts on how they figured something out and answering Stack Overflow questions.

Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.

turtlebits · 4h ago
It's perfect for small boilerplate utilities. If I need a browser extension/tampermonkey script, I can get up and running quickly without having to read docs/write manifests. These are small projects where without AI, I wouldn't have bothered to even start.

At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.

PeterStuer · 2h ago
I love learning new things. With AI I am learning more and faster.

I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .

Got tired of MS's licensing BS and I made the switch.

This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...

Not all of these were completely new, but I had never dove in seriously.

Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.

It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.

AI is just an undeniably superior teacher than Google or Stack Overflow ever was. You still do the learning, but the AI is great in getting you to learn.

greybox · 2h ago
I trust ChatGPT and Gemini a lot less than Stack Overflow. On Stack Overflow I can see the context that the answer to the original question was given in. AI does not do this. I've asked ChatGPT questions about CMake, for instance, that it got subtly wrong; if I had not noticed this it would have cost me a lot of time.
thedelanyo · 3h ago
So AI is basically best as a search engine.
groestl · 3h ago
That's right.
cess11 · 3h ago
I mean, it's just a compressed database with a weird query engine.
nikanj · 3h ago
And ChatGPT never closes your question without an answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago
nottorp · 2h ago
But it does give you a ready to copy paste answer instead of a 'teach the man how to fish' answer.
addandsubtract · 2h ago
Not if you prompt it to explain the answer it gives you.
nottorp · 2h ago
Not the same thing. Copying code, even with comprehensive explanations, teaches less than writing/adjusting your own code based on advice.
nikanj · 2h ago
I'd rather have a copy paste answer than a "go fish" answer
yard2010 · 1h ago
I think the main issue here is trust. When you google something you develop a sense for bullshit so you can "feel" the sources and weigh them accordingly. Using a chat bot, this bias doesn't hold, so you don't know what is just SEO bullshit reiterated in sweet words and what's not.
rwmj · 1h ago
> What I think happens is that these people save time because they only spot review the AI generated code, or skip the review phase altogether, which as I said above would be a deal breaker for me.

In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.

waprin · 8h ago
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at coding than AI coding.

But it's important to realize that AI coding is itself a skill that you can develop. It's not just: pick the best tool and let it go. Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding, and you might prefer it if you improved at it.

With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".

It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

dspillett · 2h ago
> To some degree, traditional coding and AI coding are not the same thing

LLM-based¹ coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.

This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.

--------

[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.

rwmj · 1h ago
Another difference is that your junior will, over time, learn, and you'll also get a sense of whether you can trust them. If after a while they aren't learning and you can't trust them, you get rid of them. GenAI doesn't gain knowledge in the same way, and you're always going to have the same level of trust in it (which in my experience is limited).

Also, if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...

averageRoyalty · 53m ago
Cline, Roocode etc have the concept of rules that can be added to over time. There are heaps of memory bank and orchestration methods for AI.

LLMs absolutely can improve over time.

mitthrowaway2 · 6h ago
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
freehorse · 2h ago
Moreover, the "ceiling" may still be below the "code works" level, and you have no idea when you start if it is or not.
dr_dshiv · 3h ago
It’s mostly attitude that you are learning. Playfulness, persistence and a willingness to start from scratch again and again.
suddenlybananas · 3h ago
>persistence and a willingness to start from scratch again and again.

i.e. continually gambling and praying the model spits something out that works instead of thinking.

tsurba · 2h ago
Gambling is where I end up if I’m tired and try to get an LLM to build my hobby project for me from scratch in one go, not really bothering to read the code properly. It’s stupid and a waste of time. Sometimes it’s easier to get started this way though.

But more seriously, in the ideal case, refining a prompt after the LLM misunderstands it because of ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases, and converting into language what it is that you need for a task. Iterating on that is not gambling.

But of course, if you are not doing that, and are instead just trying to coax a "smarter" LLM with (hopefully now-deprecated) "prompt engineering" tricks, then that is about building yourself a skill that can become useless tomorrow.

chii · 59m ago
Why is the process important? If they can continuously trial-and-error their way into a good output/result, then it's a fine outcome.
suddenlybananas · 21m ago
Why is thinking important? Think about it a bit.
HPsquared · 3h ago
Most things in life are like that.
notnullorvoid · 6h ago
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?

From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.

vidarh · 3h ago
Given that I see people insisting these tools don't work for them at all, while some of my recent results include spitting out a 1k-line API client from about 5 brief paragraphs of prompts, and designing a website (the lot, including CSS, HTML, copy, database access) and populating the directory on it with entries, I'd say the output quality improves a very great deal.

From what I see of the tools, I think the skills developed largely consist of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.

serpix · 6h ago
Using AI tools for programming is not an all-or-nothing choice. You can pick a grunt-work task such as "Tag every such and such terraform resource with a uuid" and let it do just that. Nothing to do with quality, but everything to do with a simple task and not having to bother with the tedium.
autobodie · 5h ago
Why use AI to do something so simple? You're only increasing the possibility that it gets done wrong. Multi-cursor editing will be faster anyway.
barsonme · 4h ago
Why not? I regularly have a couple Claude instances running in the background chewing through simple yet time consuming tasks. It’s saved me many hours of work and given me more time to focus on the important parts.
dotancohen · 3h ago

  > a couple Claude instances running in the background chewing through simple yet time consuming tasks.

If you don't mind, I'd love to hear more about this. How exactly are they running in the background? What are they doing? How do you interact with them? Do they have access to your file system?

Thank you!

Philpax · 2h ago
I would guess that they're running multiple instances of Claude Code [0] in the background. You can give it arbitrary tasks up to a complexity ceiling that you have to figure out for yourself. It's a CLI agent, so you can just give it directives in the relevant terminal. Yes, they have access to the filesystem, but only what you give them.

[0]: https://www.anthropic.com/claude-code

dotancohen · 1h ago
Those tasks can take hours, or at least long enough that multiple tasks are running in the background? The page says $17 per month. Is that unlimited usage?

If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.

stitched2gethr · 4h ago
It will very soon be the only way.
skydhash · 8h ago
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just: pick the best tool and let it go. Managing prompts and managing context has a much higher skill ceiling than many people realize

No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.

> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.

furyofantares · 6h ago
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.

The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.

I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.

I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.

Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.

oxidant · 8h ago
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.

If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?

I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.

AI isn't a panacea, but it can be the right tool for the job.

15123123 · 6h ago
I am also interested in how much of these skills are at the mercy of OpenAI? Like IIRC, 1 or 2 years ago there was an uproar from AI "artists" saying that their art was ruined because of model changes (or maybe the system prompt changed).

>I do not agree it is something you can pick up in an hour.

But it's also interesting that the industry is selling the opposite ( with AI anyone can code / write / draw / make music ).

>You have to learn what AI is good at.

More often than not I find you need to learn what the AI is bad at, and this is not a fun experience.

oxidant · 5h ago
Of course that's what the industry is selling, because they want to make money. Yes, it's easy to create a proof of concept, but once you get out of greenfield and need 50-100k tokens in the context (reading multiple 500-line files, thinking, etc.), the quality drops and you need to know how to focus the models to maintain it.

"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?

I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.

That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.

solumunus · 6h ago
OpenAI? They are far from the forefront here. No one is using their models for this.
15123123 · 4h ago
You can substitute for whatever saas company of your choice.
viraptor · 6h ago
> It's something you can pick in a few minutes

You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?

Correct prompting is what makes a difference these days in tasks like SWE-bench Verified.

sothatsit · 6h ago
I feel like there is also a very high ceiling to how much scaffolding you can produce for the agents to get them to work better. This includes custom prompts, custom CLAUDE.md files, other documentation files for Claude to read, and especially how well and quickly your linting and tests can run, and how much functionality they cover. That's not to mention MCP and getting Claude to talk to your database or open your website using Playwright, which I have not even tried yet.

For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.

This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.

And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.

freehorse · 2h ago
And where will all this skill go when, a year from now, newer models use different tools and require different scaffolding?

The problem with overinvesting in a brand new, developing field is that you get skills that are soon to be redundant. You can hope that the skills are gonna transfer to what will be needed after, but I am not sure if that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant, and I really don't think I have learnt something that is useful enough for the new models, nor have I actually understood anything. These are all tricks-and-tips level, shallow stuff.

I think these skills are just like learning how to use some tools in an IDE. They increase productivity, which is great, but if you have to switch IDEs they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools with actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaced, easily made redundant and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.

sothatsit · 10m ago
A lot of the changes to get agents to work well are just good practice anyway. That's what is nice about getting these agents to work well - often, it just involves improving your dev tooling and documentation, which can help real human developers as well. I don't think this is going to become irrelevant any time soon.

The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.

> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.

Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.

> I think these skills are just like learning how to use some tools in an IDE.
>
> if you have to switch IDEs they may not actually help you

A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.

Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.

Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.

viraptor · 6h ago
> then it will produce a specification document from that

I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.

bcrosby95 · 6h ago
When you say specification, what, specifically, does that mean? Do you have an example?

I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).

I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.

sothatsit · 5h ago
A "specification" as in a text document outlining all the changes to make.

For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.

It sounds like a lot when I type it out, but it is pretty quick to read through and edit.

The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.

I'm not sure if this is the best process, but it seems to work pretty well.
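
To make the shape of that flow concrete, here is a minimal sketch in Python. It is only an illustration: `run_agent` is a hypothetical placeholder for whatever agent or CLI you actually drive, and the prompt text and file path are made up rather than a prescribed setup.

    from pathlib import Path

    def run_agent(prompt: str) -> str:
        # Placeholder: a real setup would invoke the coding agent with this
        # prompt and return its output.
        return "<agent output>"

    PLANNING_PROMPT = (
        "You are writing a specification, not code. Read the feature description, "
        "research the repository, then produce a markdown spec covering: overview, "
        "database design, backend changes, frontend changes, implementation order, "
        "testing, security and performance considerations. List open questions."
    )

    def plan_then_implement(feature_description: str) -> None:
        # Stage 1: produce the cheap-to-review artifact -- the specification.
        spec = run_agent(PLANNING_PROMPT + "\n\nFeature:\n" + feature_description)
        spec_path = Path("specs/feature-spec.md")  # made-up location
        spec_path.parent.mkdir(exist_ok=True)
        spec_path.write_text(spec)

        # Stage 2: a human reviews and edits the spec before any code is written.
        input(f"Review and edit {spec_path}, then press Enter to implement... ")

        # Stage 3: the (possibly edited) spec drives the implementation run.
        run_agent("Implement the following specification:\n\n" + spec_path.read_text())

    plan_then_implement("Add CSV export to the reports page.")

The point is that the human review happens at the spec stage, where it is cheap, before any code exists.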

viraptor · 5h ago
Like a spec you'd hand to a contractor. List of requirements, some business context, etc. Not a formal algorithm spec.

My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."

sagarpatil · 6h ago
Yeah, you can’t do sh*t in an hour. I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools, it’s a constant process.

Here’s what today’s task list looks like:

1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.

__MatrixMan__ · 7h ago
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
JimDabell · 5h ago
> It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).

This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.

dingnuts · 8h ago
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.

qinsig · 8h ago
Avoid using agents that can just blow through money (Cline, Roo Code, Claude Code with an API key, etc.).

Instead you can get comfortable prompting and managing context with aider.

Or you can use claude code with a pro subscription for a fair amount of usage.

I agree that seeing the tools waste several dollars just to make a mess you need to discard is frustrating.

goalieca · 8h ago
And how often do your prompting skills change as the models evolve?
badsectoracula · 6h ago
It won't be the hippest of solutions, but you can use something like Devstral Small with a full open source setup to get experimenting with local LLMs and a bunch of tools - or just chat with it through a chat interface. I ping-ponged between Devstral running as a chat interface and my regular text editor some time ago to make a toy project of a raytracer [0] (output) [1] (code).

While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then do something else myself in the background, then fix and merge the changes it made - even though I often had to fix stuff[2], sometimes it was less of a hassle than if I had to start from scratch[3]).

It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining it with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally I'm not a fan of the "dedicated editor" that seems to be used and I think something more LSP-like (especially if it can also work with existing LSPs) would be neat.

IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue I can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code at an LLM, which I don't think even the largest of cloud-hosted LLMs can handle), so there needs to be some external way to store stuff dynamically, but also for the LLM to know that external stuff is available, look it up and store things if needed. I'm not sure how exactly that'd work, though, to the extent where you could - say - open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).

[0] https://i.imgur.com/FevOm0o.png

[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...

[2] e.g. when I asked it to implement a BVH to speed things up, it made something that wasn't hierarchical and actually slowed things down

[3] the code it produced for [2] was fixable to do a simple BVH

[4] I tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions, and each function had a single-line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these one-liner comments were written by the LLM, actually.

grogenaut · 8h ago
how much time did you spend learning your last language to become comfortable with it?
stray · 8h ago
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
viraptor · 6h ago
Not even close. I'm still under $100, creating full apps. Stick to reasonable models and you can achieve and learn a lot. You don't need the latest and greatest in max mode (or whatever the new one calls it) for the majority of tasks. You don't have to throw the whole project at the service every time either.
dingnuts · 8h ago
do I get a refund if I spend a grand and I'm still not convinced? at some point I'm going to start lying to myself to justify the cost and I don't know how much y'all earn but $1k is getting close
theoreticalmal · 7h ago
Would you ask for a refund from a university class if you didn’t get a job or skill from it? Investing in a potential skill is a risk and carries an opportunity cost, that’s part of what makes it a risk
HDThoreaun · 6h ago
No one is forcing you to improve. If you don’t want to invest in yourself that is fine, you’ll just be left behind.
asciimov · 7h ago
How are those without that kind of scratch supposed to keep up with those that do?
theoreticalmal · 6h ago
This kind of seems like asking “how are poor people supposed to keep up with rich people” which we seem to not have a long term viable answer for right now
wiseowise · 6h ago
What makes you think those without that kind of scratch are supposed to keep up?
asciimov · 5h ago
For the past 10 years we have been telling everyone to learn to code; now it's learn to build AI prompts.

Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.

wiseowise · 4h ago
For the past 10 years, scammers and profiteers have been telling everyone to learn to code, not we.
sagarpatil · 6h ago
Use free tiers?
throwawaysleep · 6h ago
If you lack "that kind of scratch", you are at the learning stage for software development, not the keeping up stage. Either that or horribly underpaid.
bevr1337 · 6h ago
I recently had a coworker tell me he liked his last workplace because "we all spoke the same language." It was incredible how much he revealed about himself with what he thought was a simple fact about engineer culture. Your comment reminds me of that exchange.

- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.

- the workplace is not a meritocracy. People are not defined by their wealth.

- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.

- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?

- Folks can have different priorities and it's as simple as that

We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.

15123123 · 6h ago
$100 per month for a SaaS is quite a lot outside of Western countries. People are not even spending that much on VPN or Password Manager.

lexandstuff · 7h ago
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.

I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

nerevarthelame · 6h ago
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.

But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.

[1] https://storage.googleapis.com/gweb-research2023-media/pubto...

wiseowise · 6h ago
You’re twisting the results. Just because they took more time doesn’t mean their productivity went down. On the contrary, if you can perform an expert task with much less mental effort (which 99% of orgs should prioritize), then it is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy – that’s an amazing win.
didibus · 5h ago
If I follow what you are saying, employers won't see any benefits, but employees, while they will take the same time and create the same output in the same amount of time, will be able to do so at a reduced mental strain?

Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.

Vinnl · 1h ago
Steelmanning their argument, employers will see benefits because while the employee might be more productive than with an LLM in the first two hours of the day, the cognitive load reduces their productivity as the day goes on. If employees are able to function at a higher level for longer during their day with an LLM, that should benefit the employer.
tsurba · 2h ago
And how long have you been doing this? Because that sounds naive.

After doing programming for a decade or two, the actual act of programming is not enough to be "creative problem solving"; it's the domain and set of problems you get to apply it to that need to be interesting.

>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).

In fact, solving problems creatively to keep yourself interested when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the clearest, most boring implementation possible. Which is not what "I love coding" people usually do (I'm definitely guilty).

To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.

jumploops · 9h ago
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

As someone who uses Claude Code heavily, this is spot on.

LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.

I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).

adriand · 8h ago
Do you have to review the code? I’ll be honest that, like the OP theorizes, I often just spot review it. But I also get it to write specs (often very good, in terms of the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely, the myriad open source libraries I’ve installed.
jumploops · 8h ago
Yes, I’m actually working on an another project with the goal of never looking at the code.

For context, it’s just a reimplementation of a tool I built.

Let’s just say it’s going a lot slower than the first time I built it by hand :)

hatefulmoron · 7h ago
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.

If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.

The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.

vidarh · 2h ago
If you let it paint you into a corner, why are you doing so?

I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.

Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.

The code is increasingly becoming throwaway.

hatefulmoron · 1h ago
> If you let it paint you into a corner, why are you doing so?

What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.

> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.

Well yeah. You need to give it lots of guidance, like someone who works for you.

> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.

It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.

vidarh · 1h ago
> What do you mean? If it were as simple as not letting it do so, I would do as you suggest.

I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.

hatefulmoron · 1h ago
My concern isn't that it's messing up my architecture as I scream in protest from the other room, powerless to stop it. I agree with you and I think I'm being quite clear. Without relatively close guidance, it will paint you into a corner in terms of architecture. Guide it, direct it, whatever you want to call it.
cbsmith · 8h ago
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.

There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".

Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).

It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.

Terr_ · 6h ago
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
cbsmith · 6h ago
Fair enough. ;-)

I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.

Terr_ · 5h ago
I expect that comes from the contrast and synthesis between how the author is anticipating things will develop or be explained, versus what the other person actually provided and trying to understand their thought process.

What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?

I think the average experience would drift away from "I thought X was the obvious way but now I see by doing Y you were avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."

cbsmith · 4h ago
It seems counter intuitive that the reader would no longer have that authorial instinct due to lack of writing. Like, maybe they never had it, in which case, yes. But being exposed to a lot of different "writing opinions" tends to hone your own.

Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.

layer8 · 1h ago
I don’t find this convincing. People generally don’t learn how to write a good novel just by reading a lot of them.
sagarpatil · 6h ago
I always use Claude Code to debug issues; there's no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first). o3 with the new search can do things in 5 minutes that would take me at least 30 minutes if I'm very efficient. Say what you want, but the time savings are real.
layer8 · 1h ago
Tests can never verify the correctness of code, they only spot-check for incorrectness.
susshshshah · 6h ago
How do you know what tests to write if you don’t understand the code?
9rx · 6h ago
Same way you normally would? Tests are concerned with behaviour. The code that implements the behaviour is immaterial.
wiseowise · 5h ago
How do you do TDD without having code in the first place? How does QA verify without reading the source?
adastra22 · 5h ago
I’m not sure I understand this statement. You give your program parameters X and expect result Y, but instead get Z. There is your test, embedded in the problem statement.
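
For illustration, that problem statement maps directly onto a behavioural test. This is a minimal sketch, not code from the thread; the module, function, and numbers are invented:

    // Hypothetical example: "given X, I expect Y, but I get Z" written down as a test.
    import { test } from "node:test";
    import assert from "node:assert/strict";
    import { computeDiscount } from "./pricing"; // assumed module under test

    test("cart of 120 with code SAVE10 totals 108", () => {
      // X = (120, "SAVE10"), expected Y = 108; the bug report says we currently get Z = 120.
      assert.equal(computeDiscount(120, "SAVE10"), 108);
    });

None of this requires reading the implementation first; the behaviour alone defines the check.
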
mleonhard · 8h ago
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops · 8h ago
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.

Years of PT have enabled me to work quite effectively and minimize the flare ups :)

hooverd · 8h ago
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops · 8h ago
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.

Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.

I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
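
As a purely illustrative sketch (the feature and sections are invented, not taken from this thread), such a spec file might look like:

    # Spec: per-user rate limiting
    ## Goal
    Limit each API token to 100 requests/minute; return HTTP 429 beyond that.
    ## Constraints
    - Reuse the existing Redis client; no new dependencies.
    - Do not change public handler signatures.
    ## Acceptance criteria
    - Requests over the limit receive 429 with a Retry-After header.
    - Tests cover the limit boundary and token expiry.
    ## Out of scope
    - Per-IP limiting and admin overrides.

Pointing the agent back at the spec at the start of each session is what keeps long-running work anchored to the original intent.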

adastra22 · 5h ago
I do similar and find that this is the best compromise that I have tried. But I still find myself nodding along with OP. I am more and more finding that this is not actually faster, even though it certainly seems so.
bdamm · 8h ago
Isn't that what Windsurf or Cursor are?
marssaxman · 6h ago
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
Jonovono · 6h ago
Are you a plumber perhaps?
kevinventullo · 3h ago
I'm not sure I follow the question. I think of plumbing as being the exact kind of verbose boilerplate that LLMs are quite good at automating.

In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.

On the latter type of work, I find LLMs to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren't good at, but of course that itself is rapidly changing!

worik · 6h ago
I am.

That is the mental model I have for the work (computer programming) I like to do and am good at.

Plumbing

tptacek · 6h ago
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.

The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.

But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.

Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.

That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.

If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy for me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".

sensanaty · 2h ago
But how is this a more efficient way of working? What if you have to have it open 30 PRs before 1 of them is acceptable enough not to be outright ignored? It sounds absolutely miserable; I'd rather review my human colleague's work because in 95% of cases I can trust that it's not garbage.

The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.

kasey_junk · 17m ago
If you get to 2 or 3 and it hasn’t done what you want you fall back to writing it yourself.

But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.

The best case scenario is that your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes everybody's life easier: you, your colleagues, and the bot. And your company is now incentivized to invest in that.

The worst case is you took the time to write 2 prompts that didn't work.

smaudet · 6h ago
I guess my challenge is: if it was "a rote recitation of an idiomatic Go function", was it worth writing?

There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.

The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.

And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...

Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.

sesm · 2h ago
I would put it differently: when you already have a mental model of what the code is supposed to do and how, then reviewing is easy: just check that the code conforms to that model.

With an arbitrary PR from a colleague or a security audit, you have to come up with the mental model first, which is the hardest part.

tptacek · 5h ago
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.

Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.

smaudet · 5h ago
> We get in trouble trying to be clever (or DRY)

Certainly, however:

> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work

The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).

Agentic AI is just yet another way, as you put it, to "get in trouble trying to be clever".

My previous point stands - if it was that cut and dried, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.

vidarh · 3h ago
> The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).

You're missing the point.

tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.

He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.

> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.

There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.

As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.

But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.

smaudet · 2h ago
> But no template based generator could write that code, even though it's fairly trivial

Not true at all; in fact this sort of thing used to happen all the time 10 years ago: tools reading APIs and generating clients...

> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.

I think you are missing the point as well, that's still review, that's still being on the hook.

Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.

But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?

Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...

vidarh · 1h ago
> Not true at all, in fact this sort of thing used to happen all the time 10 years ago, code reading APIs and generating clients...

Where is this template-based code generator that can read my code, understand it, and generate a full client including a CLI, one that knows how to format the data and implements the required protocols?

In 30 years of development, I've seen nothing like it.

> I think you are missing the point as well, that's still review, that's still being on the hook.

I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.

The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.

> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.

No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.

> Well, you just have to sort of hope it works and when it doesn't

No, I don't "hope it works" - I have tests.

kenjackson · 6h ago
I can read code much faster than I can write it.

This might be the defining line for Gen AI - people who can read code faster than they can write it will find it useful, and those who write faster than they can read won't use it.

globnomulous · 4h ago
> I can read code much faster than I can write it.

I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.

I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.

autobodie · 5h ago
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.

I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out, because smaller PRs are easier to weed through but they are not less likely to be trash.

kenjackson · 4h ago
I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.

It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

omnicognate · 2h ago
The problem is that at this stage we mostly just have people's estimates of their own success to go on, and nobody thinks they're incompetent. Nobody's going to say "AI works really well for me, but I just pump out dross my colleagues have to fix" or "AI doesn't work for me, but I'm an unproductive, burnt-out hack pretending I'm some sort of craftsman as the world leaves me behind".

This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.

So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.

dagw · 3h ago
> It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists a small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.

greybox · 1h ago
For simple, tedious, or rote tasks, I have templates bound to hotkeys in my IDE. They even come with configurable variable sections that you can fill in afterwards, or base on some highlighted code before hitting the hotkey. Also, it's free.
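
For example, a VS Code-style snippet with tab-stop placeholders does the same job (an illustrative sketch, not the commenter's actual setup):

    {
      "Guard clause": {
        "prefix": "guard",
        "body": [
          "if (${1:value} == null) {",
          "  throw new Error('${2:message}');",
          "}"
        ],
        "description": "Null-check guard with fill-in placeholders"
      }
    }

Typing the prefix and hitting Tab expands the template, and the ${n:placeholder} stops are the "configurable variable sections" mentioned above.
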
stitched2gethr · 4h ago
Why would you review agent generated code any differently than human generated code?
tptacek · 4h ago
Because you don't care about the effort the agent took and can just ask for a do-over.
112233 · 6h ago
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.

I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.

vidarh · 3h ago
Auto-complete means having to babysit it.

The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and occasionally glancing over.

My shortest runs now usually take minutes, with the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.

tptacek · 6h ago
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
bluefirebrand · 5h ago
> Obviously right — accept.

I don't think code is ever "obviously right" unless it is trivially simple

monero-xmr · 6h ago
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.

Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.

"Ship it!" - me

theK · 5h ago
I think this points out the crux of the difference between collaborating with other devs and collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific projects/company etc. because it effectively is amnesic. You cannot trust the AI the same way you trust other known collaborators, because you don't have a real relationship with it.
loandbehold · 3h ago
Most AI coding tools are working on this problem. E.g. with Claude Code you can add your preferences to a claude.md file. When I notice I'm repeatedly correcting the same AI mistake, I add an instruction to claude.md to avoid it in the future. claude.md is exactly that: a memory of your preferences, idiosyncrasies and other project-related info.
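
A sketch of what such a file can contain (the specific rules are invented for illustration):

    # claude.md -- project conventions
    - Use the shared logger in src/lib/log.ts; never call console.log directly.
    - Every new endpoint needs an integration test under tests/api/.
    - Prefer small pure functions; do not add new dependencies without asking.
    - Database access goes through the repository layer, never raw SQL in handlers.

Each time the agent repeats a mistake, the fix is one more line here rather than another round of review comments.
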
vidarh · 3h ago
I do something to the effect of "Update LLM.md with what you've learned" at the end of every session, coupled with telling it what is wrong when I reject a change. It works. It could work better, but it works.
autobodie · 5h ago
Haha, doing this with AI will bury you in a very deep hole.
roxolotl · 8h ago
> But interns learn and get better over time. The time that you spend reviewing code or providing feedback to an intern is not wasted, it is an investment in the future. The intern absorbs the knowledge you share and uses it for new tasks you assign to them later on.

This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.

viraptor · 6h ago
You don't have to reexplain the business context. Save it to the mdc file if it's important. The added benefit is that the next real person looking at the code can also use that to learn - it's actually cool that having good, up-to-date documentation is now an asset.
adastra22 · 5h ago
Do you find your agent actually respecting the mdc file? I don’t.
viraptor · 4h ago
There should be no difference between the mdc and the text in the prompt. Try something drastic like "All responses should be in Chinese". If it doesn't happen, they're not included correctly. Otherwise, yeah, they work, modulo the usual issues of prompt adherence.
adastra22 · 3h ago
I suspect that Cursor is summarizing the context window, and the .mdc directives are the first thing on the cutting room floor.
xarope · 8h ago
I think this is how certain LLMs end up with 14k worth of system prompts
Terr_ · 6h ago
"Be fast", "Be Cheap", "Be Good".

*dusts off hands* Problem solved! Man, am I great at management or what?

freeone3000 · 8h ago
Put the business context in the system prompt.
ukprogrammer · 38m ago
> “It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.”

There's your issue: the skill of programming has changed.

Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.

With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.

pSYoniK · 3h ago
I've been reading these posts for the past few months and the comments too. I've tried Junie a bit and I've used ChatGPT in the past for some bash scripts (which, for the most part, did what they were supposed to do), but I can't seem to find the use case.

Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.

I guess what I'm trying to ask here is what exactly do you use agents for? I've seen youtube videos but a good chunk of those are people getting a bunch of typescript generated and have some front-end or generate some cobbled together front end that has Stripe added in and everyone is celebrating as if this is some massive breakthrough.

So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?

Also not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code), and if it's later down the road at 2-5 years then I know that I might focus on patterns or look to ensure that the code meets the standards, and look for more subtle or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written, well-optimized search function, or it could be a nightmarish SQL query that's impossible to untangle.

Thoughts?

I do have to say I found it good, when working on my own, to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones", etc.

zacksiri · 56m ago
LLMs are relatively new technology. I think it's important to recognize the tool for what it is and how it works for you. Everyone is going to get different usage from these tools.

What I personally find is. It's great for helping me solve mundane things. For example I'm recently working on an agentic system and I'm using LLMs to help me generate elasticsearch mappings.

There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm also not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
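
For context, an Elasticsearch mapping is just a JSON document like the simplified, invented example below, which is exactly the kind of mechanical output an LLM can produce from a sample record:

    {
      "mappings": {
        "properties": {
          "title":     { "type": "text" },
          "tags":      { "type": "keyword" },
          "published": { "type": "date" },
          "embedding": { "type": "dense_vector", "dims": 384 }
        }
      }
    }

There is no design insight in writing this by hand; it is pure transcription from the shape of the data.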

However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.

It all really depends on the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and help get them out of the way.

If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately what I find is that LLMs will help lower the barrier of entry to programming, but they do not remove the need to learn to read, understand, and reason about the code. Beginners will be able to go much further on their own before seeking out help.

If you are more experienced you will probably also get some benefits, but ultimately you'd probably want to do it your own way, since there is no way LLMs will replace an experienced programmer (not yet anyway).

I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.

danieltanfh95 · 8h ago
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.

Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.

Even when explicitly prompted to use first-principles analysis, AI models can struggle because:

- They lack the intuitive understanding of when to discard prior assumptions

- They don't naturally distinguish between surface-level similarity and deep structural similarity

- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics

This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.

Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily, while current SotA models struggle with it.

adastra22 · 5h ago
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here..
danieltanfh95 · 13m ago
Of course there is. Humans can pattern match as a means to save time. LLMs pattern match as their only mode of communication and "thought".

Humans are also not as susceptible to context poisoning, unlike llms.

dvt · 7h ago
I'm actually quite bearish on AI in the generative space, but even I have to admit that writing boilerplate is "N" times faster using AI (use your favorite N). I hate when people claim this without any proof, so literally today this is what I asked ChatGPT:

    write a stub for a react context based on this section (which will function as a modal):
    ```
        <section>
         // a bunch of stuff
        </section>
    ```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).

Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
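
For reference, the kind of boilerplate being generated is roughly the following. This is a hand-written sketch of the pattern, not ChatGPT's actual output, and the names are assumptions:

    // ModalContext.tsx -- minimal provider + hook sketch
    import { createContext, useContext, useState, type ReactNode } from "react";

    type ModalContextValue = {
      isOpen: boolean;
      open: (content: ReactNode) => void;
      close: () => void;
    };

    const ModalContext = createContext<ModalContextValue | null>(null);

    export function ModalProvider({ children }: { children: ReactNode }) {
      const [content, setContent] = useState<ReactNode>(null);
      const value: ModalContextValue = {
        isOpen: content != null,
        open: setContent,
        close: () => setContent(null),
      };
      return (
        <ModalContext.Provider value={value}>
          {children}
          {content != null && <section>{content}</section>}
        </ModalContext.Provider>
      );
    }

    export function useModal() {
      const ctx = useContext(ModalContext);
      if (!ctx) throw new Error("useModal must be used inside <ModalProvider>");
      return ctx;
    }

It is entirely rote: no decisions beyond naming, which is exactly why delegating it is cheap.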

Winsaucerer · 6h ago
This kind of thing is my main use: boilerplate stuff, and scripts that I don't care about -- e.g., if I need a quick bash script to do a one-off task.

For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.

redhale · 50m ago
This line by the author, in response to one of the comments, betrays the core of the article imo:

> The quality of the code these tools produce is not the problem.

So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.

To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.

ritz_labringue · 1h ago
AI is really useful when you already know what code needs to be written. If you can explain it properly, the AI will write it faster than you can and you'll save time because it is quick to check that this is actually the code you wanted to write. So "programming with AI" means programming in your mind and then using the AI to materialize it in the codebase.
kachapopopow · 7h ago
AI is a tool like any other, you have to learn to use it.

I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. Something nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.

I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.

Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.

The only thing AI got wrong is that the virtual functions were symlinks and not directories.

The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.

No comments yet

jpcrs · 3h ago
I use AI daily, currently paying for Claude Code, Gemini and Cursor. It really helps me on my personal toy projects, it’s amazing at getting a POC running and validate my ideas.

My company just had internal models that were mediocre at best, but at the beginning this year they finally enabled Copilot for everyone.

At the beginning I was really excited for it, but it's absolutely useless for work. It just doesn't work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it'll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it's pretty good at that, when you have the knowledge to detect when it has subtle problems).

HPsquared · 3h ago
I think a first step for these big enterprise codebases (also applicable to documentation) is to collect it into a big ball and finetune on it.
frankc · 8h ago
I just don't agree with this. I am generally telling the model how to do the work according to an architecture I specify, using technology I understand. The hardest part for me in reviewing someone else's code is understanding their overall solution and how everything fits together, as it's not likely to be exactly the way I would have structured the code or solved the problem. However, with an LLM it generally isn't, since we have pre-agreed upon a solution path. If that is not what is happening, then likely you are letting the model get too far ahead.

There are other times when I am building a stand-alone tool and am fine with whatever it wants to do, because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is more actual vibe code. This isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.

ken47 · 8h ago
You can pre-agree on a solution path with human engineers too, with a similar effect.
bigbuppo · 6h ago
Don't try to argue with those using AI coding tools. They don't interact well with actual humans, which is why they've been relegated to talking to the computer. We'll eventually have them all working on some busy projects to help with "marketing" to keep them distracted while the decent programmers that can actually work in a team environment can get back to useful work free of the terrible programmers and marketing departments.
wiseowise · 5h ago
> that can actually work in a team environment can get back to useful work free of the terrible programmers

Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?

didibus · 6h ago
You could argue that AI-generated code is a black box, but let's adjust our perspective here. When was the last time you thoroughly reviewed the source code of a library you imported? We already work with black boxes daily as we evaluate libraries by their interfaces and behaviors, not by reading every line.

The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.

The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.

I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?

materielle · 5h ago
I’m not sure that the library comparison really works.

Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.

Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.

Who is accountable for the code that AI writes?

layer8 · 1h ago
Would you use a library that was written by AI without anyone having supervised it and thoroughly reviewed the code? We use libraries without checking their source code because of the human thought process and quality control that has gone into them, and their existing reputation. Nobody would use a library that no one else has ever seen and whose source code no human has ever laid their eyes on. (Excluding code generated by deterministic vetted tools here, like transpilers or parser generators.)
bluefirebrand · 5h ago
> When was the last time you thoroughly reviewed the source code of a library you imported?

Doesn't matter, I'm not responsible for maintaining that particular code

The code in my PRs has my name attached, and I'm not trusting any LLM with my name

didibus · 4h ago
Exactly, that's what I'm saying. Commit AI code under its own name. Then the code under your name can use the AI code as a black box. If your code that uses AI code works as expected, it is similar to when using libraries.

If you consider that AI code is not code any human needs to read or later modify by hand, then AI code is modified by AI. All you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.

benediktwerner · 3h ago
I don't see what that does. The AI hardly cares about its reputation, and I also can't really blame the AI when my boss or a customer asks me why something failed, so what does committing under its name do?

I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputation on the line.

adastra22 · 5h ago
These days, I review external dependencies pretty thoroughly. I did not use to. This is because of AI slop though.
zmmmmm · 8h ago
I think there's a key difference in context at play here, which is that AI tools aren't better than an expert on the language and code base being written. But the problem is that most software isn't written by such experts. It's written by people with very hazy knowledge of the domain and only partial knowledge of the languages and frameworks they are using. Getting it to be stylistically consistent or 100% optimal is far from the main problem. In these contexts AI is a huge help, I find.
aryehof · 4h ago
These days, many programmers and projects are happy to leave testing and defect discovery to end users, under the guise of “but we have unit tests and CI”. That’s exacerbated when using LLM driven code with abandon.

The author is one who appears unwilling to do so.

zengyue · 50m ago
I think it is more suitable for creation rather than modification, so when repeated attempts still don't work, I will delete it and let it rewrite, which often solves the problem.
Kiro · 3h ago
> I believe people who claim that it makes them faster or more productive are making a conscious decision to relax their quality standards to achieve those gains.

Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.

ed_mercer · 8h ago
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.

Winsaucerer · 6h ago
There's a difference between reviewing code by developers you trust, and reviewing code by developers you don't trust or AI you don't trust.

Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, that those reviews will never have the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing, for me.

bluefirebrand · 5h ago
> also less thorough/robust than writing for me.

This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR

sensanaty · 1h ago
I'm fast at reviewing PRs because I know the person on the other end and can trust that they got things right. I'll focus on the meaty, tricky parts of their PR, but I can rest assured that they matched the design, for example, and not have to verify every line of CSS they wrote.

This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.

__loam · 8h ago
You are probably not being thorough enough.
animex · 5h ago
I write mostly boilerplate and I'd rather have the AI do it. The AI is also slow, which is great, because it allows me to run 2 or 3 AI workspaces working on different tickets/problems at the same time.

Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.

I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.

nottorp · 2h ago
> The problem is that I'm going to be responsible for that code, so I cannot blindly add it to my project and hope for the best.

Responsibility and "AI" marketing are two non-intersecting sets.

lvl155 · 49m ago
Analogous to assembly, we need standardized AI language/styles.
royal__ · 8h ago
I get confused when I see stances like this, because it gives me the sense that maybe people just aren't using coding tools efficiently.

90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.

kibibu · 8h ago
My experience is that the "fancy autocomplete" is a focus destroyer.

I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want, but probably not quite.

Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.

Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.

Perhaps I need to bind to a hot key instead of using the default always-on setting.

---

I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.

atq2119 · 6h ago
It's fascinating how differently people's brains work.

Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but end up accepting a suggestion instead), but without destroying my focus.

I believe your reported experience, but mine (and presumably many others') is different.

skydhash · 8h ago
That usage is the most disruptive for me. With normal intellisense and a library you're familiar with, you can predict the completion and just type normally with minimal interruption. With no completion, I can just touch type and fix the errors after the short burst. But having whole lines pop up breaks that flow state.

With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.

Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.

Editing code is where I focus on it, but I'm also reading docs, running tests,...

noiv · 3h ago
I've started to finish some abandoned, half-ready side projects with Claude Pro on Desktop with the filesystem MCP. Used to high-quality code, it took me some time to teach Claude to follow conventions. Now it works like a charm: we work on a requirements.md until all questions are answered, and then I let Claude go. The only thing left is convincing clients to embrace code assistants.
edg5000 · 5h ago
It's a bit like going from assembly to C++, except we don't have good rigid rules for high-level program specification. If we had a rigid "high-level language" to express programs, orders of magnitude more high-level than C++ and others, then we could maybe evaluate it for correctness and get 100% output reliability, perhaps. All the languages I picked up, I picked them up when they were at least 10 years old. I'm trying to use AI a bit these days for programming, but it feels like what it must have felt like using C++ when it first became available: promising, but not usable (yet?) for most programming situations.
karl11 · 7h ago
There is an important concept alluded to here around skin in the game: "the AI is not going to assume any liability if this code ever malfunctions" -- it is one of the issues I see w/ self-driving cars, planes, etc. If it malfunctions, there is no consequence for the 'AI' (no skin in the game) but there are definitely consequences for any humans involved.
Zaylan · 3h ago
I've had a similar experience. These tools are pretty helpful for small scripts or quick utility code, but once you're working on something with a more complex structure and lots of dependencies, they tend to slow down. Sometimes it takes more effort to fix what they generate than to just write it myself.

I still use them, but more as a support tool than a real assistant.

No comments yet

handfuloflight · 8h ago
Will we be having these conversations for the next decade?
wiseowise · 5h ago
It’s the new “I use Vim/Emacs/Ed over IDE”.
ken47 · 8h ago
Longer.
adventured · 8h ago
The conversations will climb the ladder and narrow.

Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.

And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.

The skeptics will forever move the goal posts.

jdbernard · 8h ago
If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.

However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will have to continue to wait and see it actually do it.

ukprogrammer · 30m ago
> If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompetes those that don't.

This. The devs outcompeting by using AI today are too busy shipping, rather than wasting time writing blog posts about what is, ultimately, a skill issue.

wiseowise · 5h ago
> If the AI actually outperforms humans in the full context of the work, then no, we won't.

IDEs outperform any "dumb" editor in the full context of work. You don't see any fewer posts about "I use Vim, btw" (and I say this as a Vim user).

afarviral · 6h ago
This has been my experience as well, but there are plenty of assertions here that are not always true, e.g. "AI coding tools are sophisticated enough (they are not) to fix issues in my projects" … but how do you know this if you are not constantly checking whether the tooling has improved? I think AI can tackle and improve a certain level of issue, but only a subset of the available models and workflows will work well, and unfortunately we are drowning in many that are mediocre at best, and many people like me give up before finding the winning combination.
layer8 · 55m ago
You omitted “with little or no supervision”, which I think is crucial to that quote. It’s pretty undisputed that having an AI fix issues in your code requires some amount of supervision that isn’t negligible. I.e. you have to review the fixes, and possibly make some adjustments.
Aeolun · 3h ago
> The part that I enjoy the most about working as a software engineer is learning new things, so not knowing something has never been a barrier for me.

To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.

s_ting765 · 2h ago
Author makes very good points. Someone has to be responsible for the AI generated code, and if it's not going to be you then no one should feel obligated to pull the auto-generated PR.
fshafique · 8h ago
"do not work for me", I believe, is the key message here. I think a lot of AI companies have crafted their tools such that adoption has increased as the tools and the output got better. But there will always be a few stragglers, non-normative types, or situations where the AI agent is just not suitable.
lexandstuff · 7h ago
Maybe, but there's also some evidence that AI coding tools aren't making anyone more productive. One study from last year found that there was no increase in developer velocity but a dramatic increase in bugs.[1] Granted, the technology has advanced since this study, but many of the fundamental issues of LLM unreliability remain. Additionally, a recent study has highlighted the significant cognitive costs associated with offloading problem-solving onto LLMs, revealing that individuals who do so develop significantly weaker neural connectivity than those who don't [2].

It's very possible that AI is literally making us less productive and dumber. Yet they are being pushed by subscription-peddling companies as if it is impossible to operate without them. I'm glad some people are calling it out.

[1] https://devops.com/study-finds-no-devops-productivity-gains-...

[2] https://arxiv.org/abs/2506.08872

fshafique · 5h ago
One year ago I probably would've said the same. But I started dabbling with it recently, and I'm awed by it.
b0a04gl · 7h ago
clarity is exactly why ai tools could work well for anyone. they're not confused users, they know what they want and that makes them ideal operators of these systems. if anything, the friction they're seeing isn't proof the tools are broken, it's proof the interface is still too blunt. you can't hand off intent without structure. but when someone like that uses ai with clean prompts, tight scope, and review discipline, the tools usually align. it's not either-or. the tools aren't failing them, they're underutilised.
edg5000 · 5h ago
A huge bottleneck seems the lack of memory between sessions, at least with Claude Code. Sure, I can write things into a text file, but it's not the same as having an AI actually remember the work done earlier.

Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?

block_dagger · 8h ago
> For every new task this "AI intern" resets back to square one without having learned a thing!

I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.

adastra22 · 4h ago
Do they? I have found that with Cursor at least, the model very quickly starts ignoring rules.
stray · 8h ago
You can lead a horse to the documentation, but you can't make him think.
wiseowise · 5h ago
Thinking is a means to an end, not the end goal.

Or are you talking about OP not knowing AI tools enough?

joelthelion · 4h ago
I think it's getting clear that, at the current stage, AI coding agents are mostly useful for people working either on small projects or on isolated new features. People who maintain a large framework find them less useful.
sagarpatil · 7h ago
What really baffles me are the claims: Anthropic says 80% of its code is generated by AI, OpenAI 70-80%, Google/Microsoft 30%.
layer8 · 46m ago
Microsoft and Google have the much larger and older code bases.
root_axis · 5h ago
The use of various AI coding tools is so diffuse that there isn't even a practical way to measure this. You can be assured those numbers are more or less napkin math based on some arbitrary AI performance factor applied to the total code writing population of the company.
nojs · 6h ago
This does not contradict the article - it may be true, and yet not significantly more productive, because of the increased review burden.
nilirl · 2h ago
The main claim made: When there's money or reputation to be lost, code requires the same amount of cognition; irrespective of who wrote the code, AI or not.

Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.

The real problem: disinformation. Needless extrapolation, poor analogies, overvaluing anecdotes.

But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.

freehorse · 2h ago
> Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.

Counter-counter claim for these use cases: when I do a proof of concept, I actually want to increase my understanding of said concept at the same time, learn the challenges involved, and in general get a better idea of how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration, etc. (it may have something interesting to add or not), but if we are still in "this matters" territory, I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similar for hobby projects: do I need that thing to just work, or do I actually care to learn how it is done? If the learning/understanding is not important in a context, then I would say using AI to generate the code is a great time-saver. Otherwise, I may still use AI, but not in the same way.

nilirl · 1h ago
Fair. I rescind those examples and revise my counter: When you gain much more from speed than you lose with errors, AI makes sense.

Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.

euleriancon · 7h ago
> The truth that may be shocking to some is that open source contributions submitted by users do not really save me time either, because I also feel I have to do a rigorous review of them.

This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?

adastra22 · 5h ago
That’s not what he said. He said he reviews every line of every pull request he receives to his own projects. Wouldn’t you?
abenga · 6h ago
You do not need to review every line of every package you use, just the subset of the interface you import/link and use. You have to review every line of code you commit into your project. I think attempting to equate the two is dishonest dissembling.
euleriancon · 6h ago
To me, the point the friend is making is, just like you said, that you don't need to review every line of code in a package, just the interface. The author misses the point that there truly is code that you trust without seeing it. At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state. As AI code becomes more reliable, it will likely become the case that you only need to read the subset of the interface you import/link and use.
bluefirebrand · 5h ago
This absolutely is intrinsic to the workflow

Using a package that hundreds of thousands of other people use is low risk, it is battle tested

It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments

And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it

Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding

root_axis · 5h ago
> At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state

This remains to be seen. It's still early days, but self-attention scales quadratically with context length. This is a major red flag for the future potential of these systems.
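To make the scaling point concrete, here is a toy sketch (sizes made up, not from any particular model) of why the attention score matrix alone grows with the square of the context length:

    import numpy as np

    def attention_scores(q: np.ndarray, k: np.ndarray) -> np.ndarray:
        # q and k have shape (n_tokens, d). The score matrix is
        # n_tokens x n_tokens, so memory and compute grow quadratically
        # with context length. (Softmax and values omitted for brevity.)
        return (q @ k.T) / np.sqrt(q.shape[-1])

    n, d = 4096, 64
    q = np.random.randn(n, d)
    k = np.random.randn(n, d)
    scores = attention_scores(q, k)  # shape (4096, 4096), ~16.8M entries

Double the context to 8192 tokens and that matrix quadruples to ~67M entries, per head, per layer.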

cinbun8 · 3h ago
As someone who heavily utilizes AI for writing code, I disagree with all the points listed. AI is faster, a multiplier, and in many instances, the equivalent of an intern. Perhaps the code it writes is not like the code written by humans, but it serves as a force multiplier. Cursor makes $500 million for a reason.
p1dda · 7h ago
It would be interesting to see which is faster/better in competitive coding, the human coder or the human using AI to assist in coding.
Snuggly73 · 4h ago
New benchmark for competitive coding dropped yesterday - https://livecodebenchpro.com/

Apparently models are not doing great on out-of-distribution problems.

p1dda · 2h ago
It goes to show that LLMs aren't intelligent in the way humans are. They are a really great replacement for googling, though.
wiseowise · 4h ago
It already happened. Last year AI submissions completely destroyed AoC, as far as I remember.
asciimov · 7h ago
It would only be interesting if the problem was truly novel. If the AI has already been trained on the problem it’ll just push out a solution.
skydhash · 8h ago
I do agree with these points in my situation. I don't actually care for speed or having generated snippets for unfamiliar domains. Coding for me has always been about learning. Whether I'm building out a new feature or solving a bug, programming is always a learning experience. The goal is to bring forth a solution that a computer can then perform, but in the process you learn about how, and more importantly why, you should solve a problem.

The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn about what irks someone else (or yourself).

Thinking about the problem and its solution can be augmented with computers (I don't keep the Go standard library memorized). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances of using them to depict the same reality.

Same as TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written down gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.

dpcan · 7h ago
This article is simply not true for most people who have figured out how to use AI properly when coding. Since switching to Cursor, my coding speed and efficiency have probably increased 10x, conservatively. When I'm using it to code in languages I've used for 25+ years, it's a breeze to look over the function it just wrote, saving me the time of thinking it through and typing it out myself. Could I have done it myself? Yeah, but it would have taken longer if I even had to go look up one tiny thing in the documentation, like the order of parameters for a function, or that little syntax thing I never use...

Also, the auto-complete in tools like Cursor is mind-blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them... that's a huge time saver when you add it all up.
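To give a flavor of what I mean (hypothetical table and column names, Python/sqlite3 chosen just for illustration), this is the kind of rote block the editor tab-completes almost in one go:

    import sqlite3

    def add_user(conn: sqlite3.Connection, name: str, email: str) -> None:
        # The boring part: column list, placeholders, parameter tuple, commit.
        conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            (name, email),
        )
        conn.commit()

None of it is hard, it's just typing, and that's exactly where the autocomplete pays off.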

My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.

amlib · 6h ago
What do you even mean by a 10x increase in efficiency? Does that mean you commit 10x more code every day? Or that "you" essentially "type" code 10x faster? In the latter case, all the other tasks surrounding the code would still take about the same time, netting you much less than a 10x increase in overall productivity, probably less than 2x?
asciimov · 7h ago
Out of curiosity, how much are you spending on AI?

How much do you believe a programmer needs to lay out to “get good”?

epiccoleman · 6h ago
I am currently subscribed to Claude Pro, which is $20/mo and gives you plenty to experiment with: access to Projects and MCP in Claude Desktop, plus Claude Code, all for a flat monthly fee. (I think there are usage limits, but I haven't hit them.)

I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.

I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.

At one point I had a ChatGPT Pro sub, but I have found Claude more valuable lately. Same goes for Gemini: I think it's pretty good, but I haven't felt compelled to pay for it.

I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.

bilalq · 6h ago
I've found "agents" to be an utter disappointment in their current state. You can never trust what they've done and need to spend so much time verifying their solution that you may as well have just done it yourself in the first place.

However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.

The Copilot-style inline suggestions are also decent. You can't rely on them for things you don't know about, but they're great at predicting what you were going to type anyway.

hooverd · 8h ago
The moat is that juniors, never having worked without these tools, provide revenue to AI middlemen. Ideally they're blasting their focus to hell on short form video and stimulants, and are mentally unable to do the job without them.
Terr_ · 6h ago
Given the creeping appeal of LLMs as cheating tools in education, some of them may be arriving in the labor market with their brains already cooked.
nreece · 6h ago
Heard someone say the other day that "AI coding is just advanced scaffolding right now." Made me wonder if we're expecting too much out of it, at least for now.
nurettin · 5h ago
I simply don't like the code it writes. Whenever I try using LLMs, it is like wrestling for conciseness: terrible code, with maybe 1 in 10 lines being an error or an "extra" I don't need. At this point I am simply using them to motivate me to move forward.

Writing a bunch of ORM code feels boring? I make it generate the code and then edit. Importing data? I just make it generate the inserts. New models are good at reformatting data.
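As a made-up example of the "generate inserts" chore (table name and data invented for illustration), this is the kind of thing I let it produce and then skim:

    import sqlite3

    # Rows the model reformatted from some export, bulk-loaded with
    # parameterized statements.
    rows = [
        ("AAPL", "2024-01-02", 185.64),
        ("AAPL", "2024-01-03", 184.25),
    ]

    conn = sqlite3.connect("prices.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (ticker TEXT, day TEXT, close REAL)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()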

Using a third-party library? I force it to look up every function's docs online and it still makes errors.

Adding transforms and pivots to SQL while keeping to my style? It's a mess. Forget it. I do that by hand.

bdamm · 8h ago
No offense intended, but this is written by a guy who has the spare time to write the blog. I can only assume his problem space is pretty narrow. I'm not sure what his workflow is like, but personally I am interacting with so many different tools, in so many different environments, with so many unique problem sets, that being able to use AIs for error evaluation, and yes, for writing code, has indeed been a game changer. In my experience it doesn't replace people at all, but they sure are powerful tools. Can they write unsupervised code? No. Do you need to read the code they write? Yes, absolutely. Can the AIs produce bugs that take time to find? Yes.

But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.

This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.

ken47 · 8h ago
> rise of IDEs, or Google search, or AWS.

None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.

stray · 7h ago
A human deftly wielding an LLM can surpass that hurdle. I laugh at the idea of telling Claude Code to do the needful and then blindly pushing to prod.
bdamm · 8h ago
This is not the right way to look at it. You don't have to have the LLMs directly coding your work unsupervised to see the enormous power that is there.

And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.

wiseowise · 4h ago
Literally everything in this list, except AWS, introduces the risk of breaking your code base without close oversight. The same people who copy-paste LLM code into IDEs are yesterday's copy-pasters from SO and random Google searches.
satisfice · 8h ago
You think he's not using the tools correctly. I think you aren't doing your job responsibly. You must think he isn't trying very hard. I think you are not trying very hard...

That is the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.

But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.

worik · 6h ago
There are tasks I find AI (I use DeepSeek) useful for.

I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.

globnomulous · 4h ago
Decided to post my comment here rather than on the author's blog. Dang and tonhow, if the tone is too personal or polemical, I apologize. I don't think I'm breaking any HN rules.

Commenter Doug asks:

> > what AI coding tools have you utilized

Miguel replies:

> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?

Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.

I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.

And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:

> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.

Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.

> my personal experience with these tools, from a strictly technical point of view

> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.

Again, that word: "clear." No, the post not only doesn't make clear the technical issues; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.

I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.

I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.

Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.

satisfice · 8h ago
Thank you for writing what I feel and experience, so that I don't have to.

Which is kind of like if AI wrote it: except someone is standing behind those words.

sneak · 7h ago
It’s harder to read code than it is to write it, that’s true.

But it’s also faster to read code than to write it. And it’s faster to loop a prompt back to fixed code to re-review than to write it.

AlotOfReading · 6h ago
I've written plenty of code that's much faster to write than to read. Most dense, concise code will require a lot more time building a mental model to read than it took to turn that mental model into code in the first place.
andrewstuart · 6h ago
He’s saying it’s not faster because he needs to impose his human analysis on it, which is slow.

That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.

Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.

You can’t say compilers are slow because you add on the time it takes you to analyse the machine code. That’s you being slow.

bluefirebrand · 5h ago
> no-one reviews the machine code that a compiler outputs

That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work

If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly

purerandomness · 1h ago
A compiler produces the same deterministic output every single time.

A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.

I totally get that not all code ever written needs to be correct.

Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be at different points on the expectation spectrum for correctness, and so their experience with LLMs varies.

It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.

blueboo · 2h ago
Skeptics find that talking themselves out of trying them is marvellously effective for convincing themselves they’re right.
strangescript · 8h ago
Everyone is still thinking about this problem the wrong way. If you are still running one agent, on one project at a time, yes, it's not going to be all that helpful if you are already a fast, solid coder.

Run three, run five. Prompt with voice annotation. Run them when you would normally need a cognitive break. Run them while you watch Netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.

I feel like another problem is that, deep down, most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.

I would give you grave warnings, telling you the time is nigh, adapt or die, etc., but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one-task-at-a-time mode.

kibibu · 8h ago
I have never seen people work harder to dismantle their own industry than software engineers are right now.
marssaxman · 7h ago
We've been automating ourselves out of our jobs as long as we've had them; somehow, despite it all, we never run out of work to do.
kibibu · 1h ago
We've automated bullshit tedium work, like building and deploying, but this is the first time in my memory that people are actively trying to automate all the fun parts away.

Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.

strangescript · 8h ago
What exactly is the alternative? Wish it away? Developers have been automating away jobs for decades; it seems hypocritical to complain about it now.
hooverd · 7h ago
Who gets the spoils?
hooverd · 8h ago
Sounds like a way to blast your focus into a thousand pieces.
sponnath · 7h ago
Can you actually demonstrate this workflow producing good software?