This article was a bit confusing for me. It starts off by describing what "doing it wrong" looks like (okay). It then goes on to talk about Agents. Perhaps it's just that my human brain needs a firmware update, but I was expecting the "what doing it wrong looks like" section to be followed by a "what doing it right looks like" section. Instead, the next paragraph just begins with "Agents".
Sure, one could surmise that perhaps "doing it right" means "using Agents", but that's not even how the article reads:
> "To make AI development work for you, you’ll need to provide your AI assistant with two things: the proper context and specific instructions (prompts) on how to behave under certain circumstances."
This, to me, doesn't necessitate the use of agents, so jumping straight into a section on Agents seems to skip over a potentially implied logical connection between the problem in the "doing it wrong" section and how it's solved in the "Agents" section.
Copying code snippets into web UIs and testing manually is slow and clunky, but Agents are essentially just automations around these same core actions. I feel this article could've made a stronger point by getting at the core of what it means to do it wrong.
• Is "doing it wrong" indicated by the time wasted by not using an agentic mechanism vs manual manipulation?
• Is "doing it wrong" indicated by manually switching between tools instead of using MCP to automate tool delegation?
Having written several non-trivial agents myself using Gemini and OpenAI's APIs, the main difference between handing off a task to an agent and manually copy/pasting into chat UIs is efficiency — I usually do a task manually in chat UIs first, but once I have a pattern established, or have identified a set of tools to validate responses, I can then "agentify" it if it's something I need to do repeatedly.
But the quality of both approaches still depends on the same core principles: adequate context (no more and no less than what keeps the LLM's attention on the task at hand) and adequate instructions for the task (often with a handful of examples). In this regard, I agree with the author: correct context + instructions are the key ingredients to a useful response. The agentic element is an efficiency layer on top of those key ingredients, which frees the dev from manual orchestration and potentially avoids human error (while potentially introducing LLM error).
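To make the "agentify" step concrete, here's a rough sketch of the loop I have in mind, using the OpenAI Python client (the Gemini version looks much the same). The model name, the validate_sql tool, and the five-round cap are placeholders I made up for illustration; the point is just that the context and instructions going in are the same ones you'd paste into a chat UI, with a validation tool layered on top.

    # Rough sketch of "agentifying" a chat-UI workflow: same context and
    # instructions, plus a tool the model can call to validate its own output.
    # The validate_sql helper and the model name are illustrative placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def validate_sql(query: str) -> str:
        """Hypothetical validator; in practice you might run EXPLAIN on a dev DB."""
        return "ok" if query.strip().lower().startswith("select") else "error: not a SELECT"

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "validate_sql",
            "description": "Check whether a SQL query is well formed before returning it.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    def run_agent(task: str, context: str) -> str:
        # Context + instructions up front, exactly as you would in a chat UI.
        messages = [
            {"role": "system", "content": "You write SQL. Validate every query before answering."},
            {"role": "user", "content": f"Context:\n{context}\n\nTask: {task}"},
        ]
        for _ in range(5):  # cap the loop so a confused model can't spin forever
            resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
            msg = resp.choices[0].message
            if not msg.tool_calls:
                return msg.content  # model is done; hand the answer back
            messages.append(msg)
            for call in msg.tool_calls:  # run each requested tool, feed the result back
                args = json.loads(call.function.arguments)
                result = validate_sql(**args)
                messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        return "gave up after 5 rounds"

The same pattern holds for any "check before you trust it" tool: swap validate_sql for a test runner or a linter and the loop doesn't change.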
Am I missing something here?
mattkrick · 37m ago
I want to believe, and I promise I'm not trying to be a Luddite here. Has anyone with decent (5+ years) experience built a non-trivial new feature in a production codebase quicker by letting AI write it?
Agents are great at familiarizing me with a new codebase. They're great at debugging because even when they're wrong, they get me thinking about the problem differently so I ultimately get the right solution quicker. I love using it like a super-powered search tool and writing single functions or SQL queries about the size of a unit test. However, reviewing a junior's code ALWAYS takes more time than writing it myself, and I feel like AI quality is typically at the junior level. When it comes to authorship, either I'm prompting it wrong, or the emperor just isn't wearing clothes. How can I become a believer?
rco8786 · 18m ago
“Kinda.” I run Claude Code on a parallel copy of our monorepo, while I use my primary copy.
I typically only give Claude the boring stuff: refactors, tech debt cleanup, etc. But occasionally I will give it a real feature if the urgency is low and the feature is extremely well defined.
That said, I still spend a considerable amount of time reviewing and massaging Claude’s code before it gets to PR. I haven’t timed myself or anything, but I suspect that when the task is suitable for an LLM, it’s maybe 20-40% faster. But when it’s not, it’s considerably slower and sometimes just fails completely.
9rx · 33m ago
> Has anyone with decent (5+ years) experience built a non-trivial new feature in a production codebase quicker by letting AI write it?
I would say yes. I have been blown away a couple of times. But I find it is like playing a slot machine. Occasionally you win — most of the time you lose. As long as my employer is willing to continue to cover the bet, I may as well pull the handle. I think it would be pretty hard to convince myself to pay for it myself, though.
ramesh31 · 17m ago
>Has anyone with decent (5+ years) experience built a non-trivial new feature in a production codebase quicker by letting AI write it?
Yes. Claude Code has turned quarter-long initiatives into a few afternoons of prompting for me, in the context of multiple massive legacy enterprise codebases. It all comes down to reaching that "Jesus take the wheel" level of trust in it. You have to be OK with letting it go off and potentially waste hundreds of dollars in tokens giving you nonsense, which it sometimes will. But when it doesn't, it's like magic, and that makes the times it does worth the cost. Obviously you'll still review every line before merging, but that takes an order of magnitude less time than wrestling with it in the first place. It has fundamentally changed what my team and I are able to accomplish.
glhaynes · 6m ago
>Obviously you'll still review every line before merging, but that takes an order of magnitude less time than wrestling with it in the first place.
Just speculating here, but I wouldn't be surprised if the truth of both parts of that sentence varies quite a bit among users of AI coding tools and their various applications, and, if so, if that explains a lot of the discrepancy among reports of success/enthusiasm levels.
ath3nd · 19m ago
> and I feel like AI quality is typically at the junior level. When it comes to authorship, either I'm prompting it wrong, or the emperor just isn't wearing clothes. How can I become a believer?
The emperor is stark naked, but the hype is making people see clothes where there is only a hairy, shriveled old man.
Sure, I can produce "working" code with Claude, but I have never been able to produce good working code. Yes, it can write an okay-ish unit test (almost 100% identical to how I'd have written it), and on a well-structured codebase (not built with Claude) and with some preparation, it can kind of produce a feature. However, on more interesting problems it's just slop, and you gotta keep trying and prodding until it produces something remotely reasonable.
It's addictive to watch it conjure up trash while you constantly try to steer it in the right direction, but I have never, ever, ever been able to achieve the code quality level that I'm comfortable with. Fast prototype? Sure. Code that can pass my code review? Nah.
What is also funny is how non-deterministic the quality of the output is. Sometimes it really does feel like you're about to fly off with it, and then, bam, garbage. It feels like roulette, and you gotta keep spinning the wheel to get your dopamine hit/reward.
All while wasting money and time, and it still ends up far, far worse than if you had just done it yourself in the first place. Hard pass.
ActionHank · 39m ago
"If you are only using your hammer to hammer nails, you're doing it wrong" then goes on to explain how you should use agents.
I would've thought that, following the initial argument and the progression to the latest trend, we would've ended up at "use agents, write specs, and use these several currently popular MCPs".
I guess the point of my rant is that no one knows what the "correct" way to use them is yet. A hammer has many uses.