Generative AI coding tools and agents do not work for me

178 points by nomdep · 180 comments · 6/17/2025, 12:33:45 AM · blog.miguelgrinberg.com

Comments (180)

socalgal2 · 45m ago
> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.

I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on stack overflow. Sometimes as I'm typing the question their search would finally kick in and find the answer (similar questions). Other times I'd post the question, if it didn't get closed, maybe I'd get an answer a few hours or days later.

Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (no agent modes, no AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those questions is learning, regardless of where the answer comes from.

socalgal2 · 33m ago
To add, another experience I had. I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months experience with this API it would be obvious but it certainly wasn't to me. For fun I just copy and pasted the stack trace into Gemini. ~60 frames worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2 line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
turtlebits · 22m ago
It's perfect for small boilerplate utilities. If I need a browser extension/tampermonkey script, I can get up and running quickly without having to read docs/write manifests. These are small projects where without AI, I wouldn't have bothered to even start.

At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.

waprin · 4h ago
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at coding than AI coding.

But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go." Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".

It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

mitthrowaway2 · 2h ago
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
notnullorvoid · 2h ago
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?

From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.

serpix · 1h ago
Using AI tools for programming is not an all-or-nothing choice. You can pick a grunt-work task such as "Tag every such and such terraform resource with a uuid" and let it do just that. Nothing to do with quality but everything to do with a simple task and not having to bother with the tedium.
autobodie · 1h ago
Why use AI to do something so simple? You're only increasing the possibility that it gets done wrong. Multi-cursor editing will be faster anyway.
barsonme · 37m ago
Why not? I regularly have a couple Claude instances running in the background chewing through simple yet time consuming tasks. It’s saved me many hours of work and given me more time to focus on the important parts.
stitched2gethr · 50m ago
It will very soon be the only way.
skydhash · 4h ago
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go." Managing prompts and managing context has a much higher skill ceiling than many people realize

No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.

> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.

furyofantares · 2h ago
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.

The skill floor is something you can pick up in a few minutes and find useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.

I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.

I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.

Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.

viraptor · 2h ago
> It's something you can pick up in a few minutes

You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?

Correct prompting is what makes the difference these days in tasks like SWE-verified.

sothatsit · 2h ago
I feel like there is also a very high ceiling to how much scaffolding you can produce for the agents to get them to work better. This includes custom prompts, custom CLAUDE.md files, other documentation files for Claude to read, and especially how well and quickly your linting and tests can run, and how much functionality they cover. That's not to mention MCP and getting Claude to talk to your database or open your website using Playwright, which I have not even tried yet.

For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.

This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.

And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.

viraptor · 2h ago
> then it will produce a specification document from that

I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.

bcrosby95 · 2h ago
When you say specification, what, specifically, does that mean? Do you have an example?

I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).

I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.

sothatsit · 1h ago
A "specification" as in a text document outlining all the changes to make.

For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.

It sounds like a lot when I type it out, but it is pretty quick to read through and edit.

The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.

I'm not sure if this is the best process, but it seems to work pretty well.

viraptor · 1h ago
Like a spec you'd hand to a contractor. List of requirements, some business context, etc. Not a formal algorithm spec.

My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."

oxidant · 4h ago
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.

If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?

I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.

AI isn't a panacea, but it can be the right tool for the job.

15123123 · 2h ago
I am also interested in how much of this skill is at the mercy of OpenAI? Like IIRC 1 or 2 years ago there was an uproar from AI "artists" saying that their art was ruined because of model changes (or maybe the system prompt changed).

> I do not agree it is something you can pick up in an hour.

But it's also interesting that the industry is selling the opposite (with AI anyone can code / write / draw / make music).

> You have to learn what AI is good at.

More often than not I find you need to learn what the AI is bad at, and this is not a fun experience.

oxidant · 1h ago
Of course that's what the industry is selling, because they want to make money. Yes, it's easy to create a proof of concept, but once you get out of greenfield and into needing 50-100k tokens of context (reading multiple 500-line files, thinking, etc.), the quality drops and you need to know how to focus the models to maintain the quality.

"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?

I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.

That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.

solumunus · 2h ago
OpenAI? They are far from the forefront here. No one is using their models for this.
15123123 · 43m ago
You can substitute whatever SaaS company of your choice.
sagarpatil · 2h ago
Yeah, you can’t do sh*t in an hour. I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools, it’s a constant process.

Here’s what today’s task list looks like:

1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git worktrees to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.

JimDabell · 1h ago
> It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).

This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.

__MatrixMan__ · 3h ago
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
dingnuts · 4h ago
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.

qinsig · 4h ago
Avoid using agents that can just blow through money (cline, roocode, claudecode with API key, etc).

Instead you can get comfortable prompting and managing context with aider.

Or you can use claude code with a pro subscription for a fair amount of usage.

I agree that seeing the tools waste several dollars just to make a mess you need to discard is frustrating.

goalieca · 4h ago
And how often do your prompting skills have to change as the models evolve?
stray · 4h ago
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
viraptor · 2h ago
Not even close. I'm still under $100, creating full apps. Stick to reasonable models and you can achieve and learn a lot. You don't need the latest and greatest in max mode (or whatever the new one calls it) for the majority of tasks. You don't have to throw the whole project at the service every time either.
dingnuts · 4h ago
do I get a refund if I spend a grand and I'm still not convinced? at some point I'm going to start lying to myself to justify the cost and I don't know how much y'all earn but $1k is getting close
theoreticalmal · 2h ago
Would you ask for a refund from a university class if you didn’t get a job or skill from it? Investing in a potential skill is a risk and carries an opportunity cost, that’s part of what makes it a risk
HDThoreaun · 2h ago
No one is forcing you to improve. If you don’t want to invest in yourself that is fine, you’ll just be left behind.
asciimov · 3h ago
How are those without that kind of scratch supposed to keep up with those that do?
theoreticalmal · 2h ago
This kind of seems like asking “how are poor people supposed to keep up with rich people” which we seem to not have a long term viable answer for right now
wiseowise · 2h ago
What makes you think those without that kind of scratch are supposed to keep up?
asciimov · 1h ago
For the past 10 years we have been telling everyone to learn to code; now it's learn to build AI prompts.

Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.

wiseowise · 46m ago
For the past 10 years scammers and profiteers have been telling everyone to learn to code, not "we".
sagarpatil · 2h ago
Use free tiers?
throwawaysleep · 2h ago
If you lack "that kind of scratch", you are at the learning stage for software development, not the keeping up stage. Either that or horribly underpaid.
bevr1337 · 2h ago
I recently had a coworker tell me he liked his last workplace because "we all spoke the same language." It was incredible how much he revealed about himself with what he thought was a simple fact about engineer culture. Your comment reminds me of that exchange.

- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.

- The workplace is not a meritocracy. People are not defined by their wealth.

- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.

- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?

- Folks can have different priorities and it's as simple as that

We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.

15123123 · 2h ago
$100 per month for a SaaS is quite a lot outside of Western countries. People are not even spending that much on a VPN or a password manager.

badsectoracula · 2h ago
It won't be the hippest of solutions, but you can use something like Devstral Small with a full open source setup to start experimenting with local LLMs and a bunch of tools - or just chat with it through a chat interface. I ping-ponged between Devstral running as a chat interface and my regular text editor some time ago to make a toy project of a raytracer [0] (output) [1] (code).

While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then do something else myself in the background, then fix and merge the changes it made - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).

It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.

IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code at an LLM, which i don't think even the largest of cloud-hosted LLMs can handle), so there needs to be some external way to store stuff dynamically, but also a way for the LLM to know that external stuff is available, look it up and store things if needed. Not sure how exactly that'd work though, to the extent where you could, say, open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).

[0] https://i.imgur.com/FevOm0o.png

[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...

[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed things down

[3] the code it produced for [2] was fixable to do a simple BVH

[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.

grogenaut · 4h ago
how much time did you spend learning your last language to become comfortable with it?
lexandstuff · 3h ago
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.

I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

nerevarthelame · 2h ago
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.

But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.

[1] https://storage.googleapis.com/gweb-research2023-media/pubto...

wiseowise · 1h ago
You’re twisting the results. Just because they took more time doesn’t mean their productivity went down. On the contrary, if you can perform an expert task with far fewer mental resources (which 99% of orgs should prioritize), then it is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy – that’s an amazing win.
didibus · 1h ago
If I follow what you are saying, employers won't see any benefits, but employees, while they will create the same output in the same amount of time, will be able to do so with reduced mental strain?

Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.

marssaxman · 2h ago
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
Jonovono · 2h ago
Are you a plumber perhaps?
worik · 2h ago
I am.

That is the mental model I have for the work (computer programming) I like to do and am good at.

Plumbing

tptacek · 2h ago
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.

The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.

But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.

Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.

That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.

If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy to me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".

kenjackson · 2h ago
I can read code much faster than I can write it.

This might be the defining line for Gen AI - people who can read code faster than they write will find it useful, and those who write faster than they can read won't use it.

autobodie · 1h ago
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.

I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out because smaller PRs are easier to weed through, but they are not less likely to be trash.

kenjackson · 54m ago
I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.

It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

smaudet · 2h ago
I guess my challenge is that "if it was a rote recitation of an idiomatic go function", was it worth writing?

There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.

The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.

And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...

Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.

tptacek · 1h ago
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.

Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.

smaudet · 1h ago
> We get in trouble trying to be clever (or DRY)

Certainly, however:

> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work

The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).

Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".

My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.

stitched2gethr · 47m ago
Why would you review agent generated code any differently than human generated code?
tptacek · 16m ago
Because you don't care about the effort the agent took and can just ask for a do-over.
112233 · 1h ago
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.

I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.

tptacek · 1h ago
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
bluefirebrand · 1h ago
> Obviously right — accept.

I don't think code is ever "obviously right" unless it is trivially simple

monero-xmr · 2h ago
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.

Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.

"Ship it!" - me

theK · 1h ago
I think this points out the crux of the difference between collaborating with other devs vs collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific projects/company etc because it is effectively amnesiac. You cannot trust the AI the same way you trust other known collaborators because you don't have a real relationship with it.
autobodie · 1h ago
Haha, doing this with AI will bury you in a very deep hole.
jumploops · 5h ago
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

As someone who uses Claude Code heavily, this is spot on.

LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.

I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).

adriand · 4h ago
Do you have to review the code? I’ll be honest that, like the OP theorizes, I often just spot review it. But I also get it to write specs (often very good, in terms of the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely, the myriad open source libraries I’ve installed.
jumploops · 4h ago
Yes, I’m actually working on an another project with the goal of never looking at the code.

For context, it’s just a reimplementation of a tool I built.

Let’s just say it’s going a lot slower than the first time I built it by hand :)

hatefulmoron · 3h ago
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.

If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.

The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.

sagarpatil · 2h ago
I always use Claude Code to debug issues; there’s no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first). o3 with the new search can do things in 5 mins that would take me at least 30 mins if I’m very efficient. Say what you want, but the time savings are real.
susshshshah · 2h ago
How do you know what tests to write if you don’t understand the code?
adastra22 · 1h ago
I’m not sure I understand this statement. You give your program parameters X and expect result Y, but instead get Z. There is your test, embedded in the problem statement.
9rx · 2h ago
Same way you normally would? Tests are concerned with behaviour. The code that implements the behaviour is immaterial.
wiseowise · 1h ago
How do you do TDD without having code in the first place? How does QA verify without reading the source?
cbsmith · 4h ago
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.

There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".

Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
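
To make "start with acceptance criteria" concrete, here is a minimal sketch of that workflow, with a hypothetical `slugify` function and Vitest as the test runner (both are my stand-ins, not something from this thread): the tests are agreed on before the LLM is asked to implement anything, and they, rather than a close reading of the diff, carry most of the burden of proving correctness.

    // Hypothetical acceptance criteria, written before the agent implements slugify().
    // The function and the cases are illustrative; the tests define "done".
    import { describe, it, expect } from "vitest";
    import { slugify } from "./slugify"; // the file the agent will generate

    describe("slugify", () => {
      it("lowercases and hyphenates words", () => {
        expect(slugify("Hello World")).toBe("hello-world");
      });

      it("strips characters that are not URL-safe", () => {
        expect(slugify("Rock & Roll!")).toBe("rock-roll");
      });

      it("collapses repeated separators", () => {
        expect(slugify("a  --  b")).toBe("a-b");
      });

      it("returns an empty string for whitespace-only input", () => {
        expect(slugify("   ")).toBe("");
      });
    });

An implementation that passes these is acceptable; one that doesn't goes back to the agent rather than being debugged by hand.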

It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.

Terr_ · 2h ago
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
cbsmith · 2h ago
Fair enough. ;-)

I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.

Terr_ · 1h ago
I expect that comes from the contrast and synthesis between how the author is anticipating things will develop or be explained, versus what the other person actually provided and trying to understand their thought process.

What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?

I think the average experience would drift away from "I thought X was the obvious way but now I see by doing Y you were avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."

cbsmith · 42m ago
It seems counter intuitive that the reader would no longer have that authorial instinct due to lack of writing. Like, maybe they never had it, in which case, yes. But being exposed to a lot of different "writing opinions" tends to hone your own.

Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.

mleonhard · 4h ago
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops · 4h ago
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.

Years of PT have enabled me to work quite effectively and minimize the flare ups :)

hooverd · 4h ago
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops · 4h ago
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.

Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.

I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.

adastra22 · 1h ago
I do similar and find that this is the best compromise that I have tried. But I still find myself nodding along with OP. I am more and more finding that this is not actually faster, even though it certainly seems so.
bdamm · 4h ago
Isn't that what Windsurf or Cursor are?
roxolotl · 4h ago
> But interns learn and get better over time. The time that you spend reviewing code or providing feedback to an intern is not wasted, it is an investment in the future. The intern absorbs the knowledge you share and uses it for new tasks you assign to them later on.

This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.

viraptor · 2h ago
You don't have to reexplain the business context. Save it to the mdc file if it's important. The added benefit is that the next real person looking at the code can also use that to learn - it's actually cool that having good, up-to-date documentation is now an asset.
adastra22 · 1h ago
Do you find your agent actually respecting the mdc file? I don’t.
viraptor · 32m ago
There should be no difference between the mdc and the text in the prompt. Try something drastic like "All of your responses should be in Chinese". If it doesn't happen, they're not included correctly. Otherwise, yeah, they work, modulo the usual issues of prompt adherence.
xarope · 3h ago
I think this is how certain LLMs end up with 14k worth of system prompts
Terr_ · 2h ago
"Be fast", "Be Cheap", "Be Good".

*dusts off hands* Problem solved! Man, am I great at management or what?

freeone3000 · 4h ago
Put the business context in the system prompt.
dvt · 3h ago
I'm actually quite bearish on AI in the generative space, but even I have to admit that writing boilerplate is "N" times faster using AI (use your favorite N). I hate when people claim this without any proof, so literally today this is what I asked ChatGPT:

    write a stub for a react context based on this section (which will function as a modal):
    ```
        <section>
         // a bunch of stuff
        </section>
    ```
Worked great: it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look stuff up if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).

Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
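
For anyone who hasn't written this particular boilerplate, here is a minimal sketch of roughly what those generated files boil down to, collapsed into one file (the names, the option shape, and the modal markup are illustrative guesses, not the actual ChatGPT output):

    // Sketch of typical React modal context boilerplate; details are illustrative.
    import { createContext, useCallback, useContext, useState, type ReactNode } from "react";

    interface ModalContextValue {
      isOpen: boolean;
      open: (content: ReactNode) => void;
      close: () => void;
    }

    const ModalContext = createContext<ModalContextValue | null>(null);

    export function ModalProvider({ children }: { children: ReactNode }) {
      const [content, setContent] = useState<ReactNode>(null);

      const open = useCallback((node: ReactNode) => setContent(node), []);
      const close = useCallback(() => setContent(null), []);

      return (
        <ModalContext.Provider value={{ isOpen: content !== null, open, close }}>
          {children}
          {content !== null && (
            <section role="dialog" onClick={close}>
              {content}
            </section>
          )}
        </ModalContext.Provider>
      );
    }

    // The hook consumed as `const myModal = useModal()` in components.
    export function useModal(): ModalContextValue {
      const ctx = useContext(ModalContext);
      if (!ctx) throw new Error("useModal must be used inside a ModalProvider");
      return ctx;
    }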

Winsaucerer · 2h ago
This kind of thing is my main use: boilerplate stuff, and scripts that I don't care about -- e.g., if I need a quick bash script to do a one-off task.

For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.

didibus · 1h ago
You could argue that AI-generated code is a black box, but let's adjust our perspective here. When was the last time you thoroughly reviewed the source code of a library you imported? We already work with black boxes daily as we evaluate libraries by their interfaces and behaviors, not by reading every line.

The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.

The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.

I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?

materielle · 1h ago
I’m not sure that the library comparison really works.

Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.

Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.

Who is accountable for the code that AI writes?

bluefirebrand · 1h ago
> When was the last time you thoroughly reviewed the source code of a library you imported?

Doesn't matter, I'm not responsible for maintaining that particular code

The code in my PRs has my name attached, and I'm not trusting any LLM with my name

didibus · 50m ago
Exactly, that's what I'm saying. Commit AI code under its own name. Then the code under your name can use the AI code as a black box. If your code that uses AI code works as expected, it is similar to when using libraries.

If you consider that AI code is not code any human needs to read or later modify by hand (AI code is modified by AI), then all you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
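
A minimal sketch of that boundary, where everything is hypothetical (the RateLimiter contract, the generated module path, Vitest as the runner): your own code depends only on the interface you wrote and reviewed, while the generated implementation is verified purely through black-box tests.

    // Sketch only: contract, module path, and test runner are hypothetical stand-ins.
    import { test, expect } from "vitest";
    import { createRateLimiter } from "./generated/rate-limiter"; // agent-written, not hand-edited

    // The contract you own, review, and code against (in a real project it would live in its own file).
    interface RateLimiter {
      // Returns true if the call identified by `key` is allowed right now.
      allow(key: string): boolean;
    }

    // Black-box tests against the interface, not against the generated internals.
    test("allows the first call and blocks an immediate repeat", () => {
      const limiter: RateLimiter = createRateLimiter({ maxPerSecond: 1 });
      expect(limiter.allow("user-1")).toBe(true);
      expect(limiter.allow("user-1")).toBe(false);
    });

    test("tracks keys independently", () => {
      const limiter: RateLimiter = createRateLimiter({ maxPerSecond: 1 });
      expect(limiter.allow("user-1")).toBe(true);
      expect(limiter.allow("user-2")).toBe(true);
    });

If the tests pass, the generated module is treated like any other dependency; if it needs to change, the interface and tests stay and the implementation gets regenerated.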

adastra22 · 59m ago
These days, I review external dependencies pretty thoroughly. I did not use to. This is because of AI slop though.
danieltanfh95 · 3h ago
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.

Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.

Even when explicitly prompted to use first-principles analysis, AI models can struggle because:

- They lack the intuitive understanding of when to discard prior assumptions

- They don't naturally distinguish between surface-level similarity and deep structural similarity

- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics

This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.

Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.

adastra22 · 57m ago
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here.
animex · 1h ago
I write mostly boilerplate and I'd rather have the AI do it. The AI is also slow, which is great, because it allows me to run 2 or 3 AI workspaces working on different tickets/problems at the same time.

Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.

I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.

ed_mercer · 4h ago
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.

Winsaucerer · 2h ago
There's a difference between reviewing code by developers you trust, and reviewing code by developers you don't trust or AI you don't trust.

Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference, though, is that those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing for me.

bluefirebrand · 1h ago
> also less thorough/robust than writing for me.

This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR

__loam · 4h ago
You are probably not being thorough enough.
frankc · 4h ago
I just don't agree with this. I am generally telling the model how to do the work according to an architecture I specify using technology I understand. The hardest part for me in reviewing someone else's code is understanding their overall solution and how everything fits together as it's not likely to be exactly the way I would have structured the code or solved the problem. However, with an LLM it generally isn't since we have pre-agreed upon a solution path. If that is not what is happening than likely you are letting the model get too far ahead.

There are other times when I am building a stand-alone tool and am fine with whatever it wants to do because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is more actual vibe coding. This isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.

ken47 · 4h ago
You can pre-agree on a solution path with human engineers too, with a similar effect.
bigbuppo · 2h ago
Don't try to argue with those using AI coding tools. They don't interact well with actual humans, which is why they've been relegated to talking to the computer. We'll eventually have them all working on some busy projects to help with "marketing" to keep them distracted while the decent programmers that can actually work in a team environment can get back to useful work free of the terrible programmers and marketing departments.
wiseowise · 1h ago
> that can actually work in a team environment can get back to useful work free of the terrible programmers

Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?

zmmmmm · 4h ago
I think there's a key difference in context at play here, which is that AI tools aren't better than an expert in the language and code base being written. But the problem is that most software isn't written by such experts. It's written by people with very hazy knowledge of the domain and only partial knowledge of the languages and frameworks they are using. Getting it to be stylistically consistent or 100% optimal is far from the main problem. In these contexts AI is a huge help, I find.
joelthelion · 27m ago
I think it's becoming clear that, at the current stage, AI coding agents are mostly useful for people working either on small projects or on isolated new features. People who maintain a large framework find them less useful.
edg5000 · 1h ago
It's a bit like going from assembly to C++, except we don't have good rigid rules for high-level program specification. If we had a rigid "high-level language" to express programs, orders of magnitude more high-level than C++ and others, then we could maybe evaluate it for correctness and get 100% output reliability, perhaps. All the languages I picked up, I picked them up when they were at least 10 years old. I'm trying to use AI a bit these days for programming, but it feels like what it must have felt like using C++ when it first became available: promising, but not usable (yet?) for most programming situations.
kachapopopow · 2h ago
AI is a tool like any other, you have to learn to use it.

I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. Something nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.

I loaded up a new Go project in GoLand, opened up Junie, typed what I needed and what I had, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart, and it worked (nearly) first try.

Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.

The only thing AI got wrong is that the virtual functions were symlinks and not directories.

The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.

euleriancon · 3h ago
> The truth that may be shocking to some is that open source contributions submitted by users do not really save me time either, because I also feel I have to do a rigorous review of them.

This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?

adastra22 · 56m ago
That’s not what he said. He said he reviews every line of every pull request he receives to his own projects. Wouldn’t you?
abenga · 2h ago
You do not need to review every line of every package you use, just the subset of the interface you import/link and use. You have to review every line of code you commit into your project. I think attempting to equate the two is dishonest dissembling.
euleriancon · 2h ago
To me, the point the friend is making is, just like you said, that you don't need to review every line of code in a package, just the interface. The author misses the point that there truly is code that you trust without seeing it. At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state. As AI code becomes more reliable, it will likely become the case that you only need to read the subset of the interface you import/link and use.
bluefirebrand · 1h ago
This absolutely is intrinsic to the workflow

Using a package that hundreds of thousands of other people use is low risk, it is battle tested

It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments

And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it

Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding

root_axis · 1h ago
> At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state

This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.

edg5000 · 1h ago
A huge bottleneck seems the lack of memory between sessions, at least with Claude Code. Sure, I can write things into a text file, but it's not the same as having an AI actually remember the work done earlier.

Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?

fshafique · 4h ago
"do not work for me", I believe, is the key message here. I think a lot of AI companies have crafted their tools such that adoption has increased as the tools and the output got better. But there will always be a few stragglers, non-normative types, or situations where the AI agent is just not suitable.
lexandstuff · 3h ago
Maybe, but there's also some evidence that AI coding tools aren't making anyone more productive. One study from last year found that there was no increase in developer velocity but a dramatic increase in bugs.[1] Granted, the technology has advanced since this study, but many of the fundamental issues of LLM unreliability remain. Additionally, a recent study has highlighted the significant cognitive costs associated with offloading problem-solving onto LLMs, revealing that individuals who do so develop significantly weaker neural connectivity than those who don't [2].

It's very possible that AI is literally making us less productive and dumber. Yet they are being pushed by subscription-peddling companies as if it is impossible to operate without them. I'm glad some people are calling it out.

[1] https://devops.com/study-finds-no-devops-productivity-gains-...

[2] https://arxiv.org/abs/2506.08872

fshafique · 1h ago
One year ago I probably would've said the same. But I started dabbling with it recently, and I'm awed by it.
afarviral · 2h ago
This has been my experience as well, but there are plenty of assertions here that are not always true, e.g. "AI coding tools are sophisticated enough (they are not) to fix issues in my projects"… but how do you know this if you are not constantly checking whether the tooling has improved? I think AI can tackle a certain level of issue and improve things, but only a subset of the available models and of a multitude of workflows will work well; unfortunately we are drowning in many that are mediocre at best, and many like me give up before finding the winning combination.
royal__ · 4h ago
I get confused when I see stances like this, because it gives me the sense that maybe people just aren't using coding tools efficiently.

90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.

kibibu · 4h ago
My experience is that the "fancy autocomplete" is a focus destroyer.

I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want but probably not quite.

Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.

Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.

Perhaps I need to bind it to a hotkey instead of using the default always-on setting.

---

I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.

atq2119 · 2h ago
It's fascinating how differently people's brains work.

Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but ending up accepting a suggestion instead), but without destroying my focus.

I believe your reported experience, but mine (and presumably many others') is different.

skydhash · 4h ago
That usage is the most disruptive for me. With normal intellisense and a library you're familiar with, you can predict the completion and just type normally with minimal interruption. With no completion, I can just touch type and fix the errors after the short burst. But having whole lines pop up breaks that flow state.

With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.

Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.

Editing code is where I focus on it, but I'm also reading docs, running tests,...

karl11 · 3h ago
There is an important concept alluded to here around skin in the game: "the AI is not going to assume any liability if this code ever malfunctions" -- it is one of the issues I see w/ self-driving cars, planes, etc. If it malfunctions, there is no consequence for the 'AI' (no skin in the game) but there are definitely consequences for any humans involved.
handfuloflight · 4h ago
Will we be having these conversations for the next decade?
wiseowise · 1h ago
It’s the new “I use Vim/Emacs/Ed over IDE”.
ken47 · 4h ago
Longer.
adventured · 4h ago
The conversations will climb the ladder and narrow.

Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.

And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.

The skeptics will forever move the goal posts.

jdbernard · 4h ago
If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.

However, if we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will have to continue to wait and see it try.

wiseowise · 58m ago
> If the AI actually outperforms humans in the full context of the work, then no, we won't.

IDEs outperform any “dumb” editor in the full context of work. You don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).

b0a04gl · 3h ago
clarity is exactly why ai tools could work well for anyone. they're not confused users, they know what they want and that makes them ideal operators of these systems. if anything, the friction they're seeing isn't proof the tools are broken, it's proof the interface is still too blunt. you can't hand off intent without structure. but when someone like that uses ai with clean prompts, tight scope, and review discipline, the tools usually align. it's not either-or. the tools aren't failing them, they're underutilised.
block_dagger · 4h ago
> For every new task this "AI intern" resets back to square one without having learned a thing!

I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.
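
For anyone who hasn't seen one, a rules file is just a plain markdown file that the tool feeds back into the model's context each session. A rough, hypothetical sketch (the conventions and tasks below are made up, not a prescribed format):

    # CLAUDE.md (hypothetical example)

    ## Conventions
    - Python 3.12, type hints everywhere, pytest for tests
    - Run the linter and the test suite before proposing a diff

    ## Task list
    - [x] Extract the payment client into its own module
    - [ ] Add pagination to the /api/posts endpoint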

adastra22 · 53m ago
Do they? I have found that with Cursor at least, the model very quickly starts ignoring rules.
stray · 4h ago
You can lead a horse to the documentation, but you can't make him think.
wiseowise · 57m ago
Thinking is a means to an end, not the end goal.

Or are you talking about OP not knowing the AI tools well enough?

sagarpatil · 3h ago
What really baffles me are the claims coming from these companies: Anthropic says 80% of its code is generated by AI; OpenAI, 70-80%; Google/Microsoft, 30%.
root_axis · 1h ago
The use of various AI coding tools is so diffuse that there isn't even a practical way to measure this. You can be assured those numbers are more or less napkin math based on some arbitrary AI performance factor applied to the total code writing population of the company.
nojs · 2h ago
This does not contradict the article - it may be true, and yet not significantly more productive, because of the increased review burden.
dpcan · 3h ago
This article is simply not true for most people who have figured out how to use AI properly when coding. Since switching to Cursor, my coding speed and efficiency have probably increased 10x, conservatively. When I'm using it to code in languages I've used for 25+ years, it's a breeze to look over the function it just saved me time on by pre-thinking it and typing it out for me. Could I have done it myself? Yeah, but it would have taken longer if I even had to go look up one tiny thing in the documentation, like the order of parameters for a function, or that little syntax thing I never use...

Also, the auto-complete with tools like Cursor is mind-blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them... that's a huge time saver when you add it all up.
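
To give a concrete sense of the kind of boilerplate I mean (a rough Python/sqlite sketch with a made-up users table): once the first line of the statement is started, the matching placeholders and the parameter tuple are the predictable tail the completion fills in.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT, created_at TEXT)")

    # The predictable tail an autocomplete tends to finish: the placeholders
    # and the parameter tuple matching the columns already named.
    conn.execute(
        "INSERT INTO users (name, email, created_at) VALUES (?, ?, ?)",
        ("Ada", "ada@example.com", "2025-06-17"),
    )
    conn.commit()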

My policy is simple: don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.

amlib · 2h ago
What do you even mean by a 10x increase in efficiency? Does that mean you commit 10x more code every day? Or that "you" essentially "type" code 10x faster? In the latter case, all the other tasks surrounding the code would still take around the same time, netting you much less than a 10x increase in overall productivity, probably less than 2x?
asciimov · 3h ago
Out of curiosity how much are you spending on AI?

How much do you believe a programmer needs to lay out to “get good”?

epiccoleman · 2h ago
I am currently subscribed to Claude Pro, which is $20/mo and gives you plenty to experiment with: access to Projects and MCP in Claude Desktop, plus Claude Code, for a flat monthly fee. (I think there are usage limits, but I haven't hit them.)

I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.

I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.

At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.

I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.

strangescript · 4h ago
Everyone is still thinking about this problem the wrong way. If you are still running one agent, on one project at a time, yes, it's not going to be all that helpful if you are already a fast, solid coder.

Run three, run five. Prompt with voice annotation. Run them when normally you need a cognitive break. Run them while you watch netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.

I feel like another problem is that, deep down, most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.

I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.

kibibu · 4h ago
I have never seen people work harder to dismantle their own industry than software engineers are right now.
marssaxman · 3h ago
We've been automating ourselves out of our jobs as long as we've had them; somehow, despite it all, we never run out of work to do.
strangescript · 4h ago
What exactly is the alternative? Wish it away? Developers have been automating away jobs for decades, its seems hypocritical to complain about it now.
hooverd · 3h ago
who gets the spoils?
sponnath · 3h ago
Can you actually demonstrate this workflow producing good software?
hooverd · 4h ago
Sounds like a way to blast your focus into a thousand pieces
p1dda · 3h ago
It would be interesting to see which is faster/better in competitive coding, the human coder or the human using AI to assist in coding.
Snuggly73 · 55m ago
New benchmark for competitive coding dropped yesterday - https://livecodebenchpro.com/

Apparently models are not doing great on problems out of distribution.

wiseowise · 55m ago
It already happened. Last year AI submissions completely destroyed AoC, as far as I remember.
asciimov · 3h ago
It would only be interesting if the problem was truly novel. If the AI has already been trained on the problem it’ll just push out a solution.
skydhash · 4h ago
I do agree with these points in my situation. I don't actually care for speed or having generated snippets for unfamiliar domains. Coding for me has always been about learning. Whether I'm building out a new feature or solving a bug, programming is always a learning experience. The goal is to bring forth a solution that a computer can then perform, but in the process you learn about how, and more importantly why, you should solve a problem.

The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn about what irks someone else (or yourself).

Thinking about the problem and its solution can be augmented with computers (I'm not memorizing the Go standard library). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances between using them to depict the same reality.

Same as TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.

bilalq · 2h ago
I've found "agents" to be an utter disappointment in their current state. You can never trust what they've done and need to spend so much time verifying their solution that you may as well have just done it yourself in the first place.

However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.

The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.

nreece · 2h ago
Heard someone say the other day that "AI coding is just advanced scaffolding right now." Made me wonder if we're expecting too much out of it, at least for now.
andrewstuart · 2h ago
He’s saying it’s not faster because he needs to impose his human analysis on it, which is slow.

That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.

Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.

You can’t say compilers are slow because you add on the time you take to analyse the machine code. That’s you being slow.

bluefirebrand · 1h ago
> no-one reviews the machine code that a compiler outputs

That's because compilers are generally pretty trustworthy. They aren't necessarily bug-free, and when you do encounter compiler bugs they can be extremely nasty, but mostly they just work.

If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly.

nurettin · 1h ago
I simply don't like the code it writes. Whenever I try using LLMs, it is like wrestling for conciseness. Terrible code, of which almost certainly 1/10 is an error or "extras" I don't need. At this point I am simply using it to motivate me to move forward.

Writing a bunch of ORM code feels boring? I make it generate the code and edit. Importing data? I just make it generate inserts. New models are good at reformatting data.

Using a third-party library? I force it to look up every function's doc online and it still has errors.

Adding transforms and pivots to SQL while keeping to my style? It is a mess. Forget it. I do that by hand.

hooverd · 4h ago
The moat is that juniors, never having worked without these tools, provide revenue to AI middlemen. Ideally they're blasting their focus to hell on short form video and stimulants, and are mentally unable to do the job without them.
Terr_ · 2h ago
Given the creeping appeal of LLMs as cheating tools in education, some of them may be arriving in the labor market with their brains already cooked.
bdamm · 4h ago
No offense intended, but this is written by a guy who has the spare time to write the blog. I can only assume his problem space is pretty narrow. I'm not sure what his workflow is like, but personally I am interacting with so many different tools, in so many different environments, with so many unique problem sets, that being able to use AIs for error evaluation, and yes, for writing code, has indeed been a game changer. In my experience it doesn't replace people at all, but they sure are powerful tools. Can they write unsupervised code? No. Do you need to read the code they write? Yes, absolutely. Can the AIs produce bugs that take time to find? Yes.

But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.

This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.

ken47 · 4h ago
> rise of IDEs, or Google search, or AWS.

None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.

wiseowise · 50m ago
Literally everything on this list, except AWS, introduces the risk of breaking your code base without close oversight. The same people who copy-paste LLM code into IDEs are yesterday’s copy-pasters from SO and random Google searches.
stray · 3h ago
A human deftly wielding an LLM can surpass that hurdle. I laugh at the idea of telling Claude Code to do the needful and then blindly pushing to prod.
bdamm · 4h ago
This is not the right way to look at it. You don't have to have the LLMs directly coding your work unsupervised to see the enormous power that is there.

And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.

satisfice · 4h ago
You think he's not using the tools correctly. I think you aren't doing your job responsibly. You must think he isn't trying very hard. I think you are not trying very hard...

That is the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.

But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.

worik · 2h ago
There are tasks I find AI (I use DeepSeek) useful for

I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.

globnomulous · 47m ago
Decided to post my comment here rather than on the author's blog. Dang and tonhow, if the tone is too personal or polemical, I apologize. I don't think I'm breaking any HN rules.

Commenter Doug asks:

> > what AI coding tools have you utilized

Miguel replies:

> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?

Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.

I can't help but point out the irony here: you write a great deal about the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.

And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:

> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.

Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.

> my personal experience with these tools, from a strictly technical point of view

> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.

Again, that word: "clear." No, not only does the post not make the technical issues clear; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.

I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.

I also write all of this, I guess, as someone who thinks commenter Doug just seems like a way cool guy, a decent chap who just asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that betrays the smug, sanctimonious hypocrisy of the blog post itself.

Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of anybody whose life depends on software (all of us), not because of the dangers of machine superintelligence but because of the dangers of the complete absence of machine intelligence when it's paired with the seductive illusion of understanding.

sneak · 3h ago
It’s harder to read code than it is to write it, that’s true.

But it’s also faster to read code than to write it. And it’s faster to loop a prompt back to fixed code to re-review than to write it.

AlotOfReading · 2h ago
I've written plenty of code that's much faster to write than to read. Most dense, concise code will require a lot more time building a mental model to read than it took to turn that mental model into code in the first place.
satisfice · 4h ago
Thank you for writing what I feel and experience, so that I don't have to.

Which is kind of like if AI wrote it: except someone is standing behind those words.