A staff engineer's journey with Claude Code

227 kmelve 151 9/2/2025, 7:34:24 PM sanity.io ↗

Comments (151)

swframe2 · 5h ago
Preventing garbage just requires that you take into account the cognitive limits of the agent. For example ...

1) Don't ask for a large / complex change. Ask for a plan, but have it implement the plan in small steps, and ask the model to test each step before starting the next.

2) For really complex steps, ask the model to write code to visualize the problem and solution.

3) If the model fails on a given step, ask it to add logging to the code, save the logs, run the tests, and then review the logs to determine what went wrong. Do this repeatedly until the step works well.

4) Ask the model to look at your existing code and determine how it was designed to implement a task. Sometimes the model will put all of the changes in one file even though your code has a cleaner design that the model isn't taking into account.

I've seen other people blog about their tricks and tips. I do still see garbage results, but nowhere near 95% of the time.

nostrademons · 4h ago
I've found that an effective tactic for larger, more complex tasks is to tell it "Don't write any code now. I'm going to describe each of the steps of the problem in more detail. The rough outline is going to be 1) Read this input 2) Generate these candidates 3) apply heuristics to score candidates 4) prioritize and rank candidates 5) come up with this data structure reflecting the output 6) write the output back to the DB in this schema". Claude will then go and write a TODO list in the code (and possibly claude.md if you've run /init), and prompt you for the details of each stage. I've even done this for an hour, told Claude "I have to stop now. Generate code for the finished stages and write out comments so you can pick up where you left off next time" and then been able to pick up next time with minimal fuss.
hex4def6 · 3h ago
FYI: You can force "Plan mode" by pressing shift-tab. That will prevent it from eagerly implementing stuff.
jaggederest · 3h ago
> That will prevent it from eagerly implementing stuff.

In theory. In practice, it's not a very secure sandbox and Claude will happily go around updating files if you insist / the prompt is bad / it goes off on a tangent.

I really should just set up a completely sandboxed VM for it so that I don't care if it goes rm -rf happy.

adastra22 · 2h ago
Plan mode disables the tools, so I don’t see how it would do that.

A sandboxed devcontainer is worth setting up though. Lets me run it with --dangerously-skip-permissions

faangguyindia · 1h ago
How can it plan if it doesn't have access to file read, search, and bash tools to investigate things? And if it has access to bash tools, then it can write code via echo or sed.
jaggederest · 2h ago
I don't know either but I've seen it write to files in plan mode. Very confusing.
oxidant · 52m ago
I've never seen it write a file in plan mode either.
EnPissant · 58m ago
That's not possible. You are misremembering.
sshine · 31m ago
I've seen it run commands that are naively assumed to be reading files or searching directories.

I.e. not its own tools, but command-line executables.

Its assumptions about these commands, and specifically the way it ran them, were correct.

But I have seen it run commands in plan mode.

yahoozoo · 2h ago
How does a token predictor “apply heuristics to score candidates”? Is it running a tool, such as a Python script it writes for scoring candidates? If not, isn’t it just pulling some statistically-likely “score” out of its weights rather than actually calculating one?
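(In my experience it only actually calculates when it writes a tool. A hypothetical example of the kind of scorer an agent might produce when told to score candidates in code rather than guess; every heuristic and weight below is made up:)

```python
# Hypothetical scorer of the kind an agent might write when asked to
# "apply heuristics to score candidates" as real code. The rules and
# weights are illustrative.
HEURISTICS = [
    (2.0, lambda c: 1.0 if c["has_tests"] else 0.0),          # reward tests
    (1.0, lambda c: min(c["doc_lines"] / 10.0, 1.0)),         # reward docs
    (1.5, lambda c: 1.0 - min(c["complexity"] / 50.0, 1.0)),  # punish complexity
]

def score(candidate: dict) -> float:
    # Deterministic weighted sum: same input always yields the same
    # score, unlike a number sampled from model weights.
    return sum(weight * rule(candidate) for weight, rule in HEURISTICS)

candidates = [
    {"name": "a", "has_tests": True, "doc_lines": 5, "complexity": 10},
    {"name": "b", "has_tests": False, "doc_lines": 20, "complexity": 40},
]
ranked = sorted(candidates, key=score, reverse=True)
```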
astrange · 10m ago
Token prediction is the interface. The implementation is a universal function approximator communicating through the token weights.
rco8786 · 3h ago
I feel like I do all of this stuff and still end up with unusable code in most cases, and the cases where I don't I still usually have to hand massage it into something usable. Sometimes it gets it right and it's really cool when it does, but anecdotally for me it doesn't seem to be making me any more efficient.
jaggederest · 3h ago
The key is prompting. Prompt to within an inch of your life. Treat prompts as source code - edit them in files, use @ notation to bring them into the console. Use Claude to generate its own prompts - https://github.com/wshobson/commands/ and https://github.com/wshobson/agents/ are very handy, they include a prompt-engineer persona.

I'm at the point now where I have to yell at the AI once in a while, but I touch essentially zero code manually, and it's acceptable quality. Once I stopped and tried to fully refactor a commit that CC had created, but I was only able to make marginal improvements in return for an enormous time commitment. If I had spent that time improving my prompts and running refactoring/cleanup passes in CC, I suspect I would have come out ahead. So I'm deliberately trying not to do that.

I expect at some point on a Friday (last Friday was close) I will get frustrated and go build things manually. But for now it's a cognitive and effort reduction for similar quality. It helps to use the most standard libraries and languages possible, and great tests are a must.

Edit: Also, use the "thinking" commands. think / think hard / think harder / ultrathink are your best friend when attempting complicated changes (of course, if you're attempting complicated changes, don't.)

shaunxcode · 1h ago
I am convinced that this comment once read aloud in the cadence of Ginsberg is a work of art!
jaggederest · 59m ago
Now I'm trying to find a text-to-Ginsberg translator. Maybe he's who I sound like in my head.
plaguuuuuu · 2h ago
I've been using a few LLMs/agents for a while and I still struggle with getting useful output from it.

In order for it not to do useless stuff I need to expend more energy on prompting than writing stuff myself. I find myself getting paranoid about minutia in the prompt, turns of phrase, unintended associations in case it gives shit-tier code because my prompt looked too much like something off experts-exchange or whatever.

What I really want is something like a front-end framework but for LLM prompting, that takes away a lot of the fucking about with generalised stuff like prompt structure, default to best practices for finding something in code, or designing a new feature, or writing tests..

dontlaugh · 3h ago
At that point, why not just write the code yourself?
lucasyvas · 3h ago
I reached this conclusion pretty quickly. With all the hand holding I can write it faster - and it’s not bragging, almost anyone experienced here could do the same.

Writing the code is the fast and easy part once you know what you want to do. I use AI as a rubber duck to shorten that cycle, then write it myself.

2muchcoffeeman · 3h ago
I’ve been trapped in a hole of “can I get the agent to do this?” when the change would have taken me 1/10th the time to make myself.

Knowing which battles to pick is part of the skill at the moment.

I use AI for a lot of boilerplate, tedious tasks I can’t quite do a vim recording for, and small targeted scripts.

skydhash · 2h ago
How much of this boilerplate do you actually have to write? Any script or complicated command that I had to write was worth recording in some bash alias or preserving somewhere. But they mostly live in my bash history or right next to the project.

The boilerplate argument is becoming quite old.

indiosmo · 13m ago
One recent example of boilerplate for me: I’ve been writing dbt models, and I get it to write the schema.yml file for me based on the SQL.

It’s basically just a translation, but with dozens of tables, each with dozens of columns it gets tedious pretty fast.

If given other files from the project as context it’s also pretty good at generating the table and column descriptions for documentation, which I would probably just not write at all if doing it by hand.
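That translation is mechanical enough to sketch even without an LLM; a minimal version, assuming the column names have already been pulled out of the model's SQL (the model and column names below are made up):

```python
def schema_yml(model: str, columns: list[str]) -> str:
    # Emit a dbt schema.yml skeleton for one model. Descriptions are
    # left as TODOs for the LLM (or a human) to fill in from context.
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model}",
        "    columns:",
    ]
    for col in columns:
        lines.append(f"      - name: {col}")
        lines.append('        description: "TODO"')
    return "\n".join(lines) + "\n"

print(schema_yml("stg_orders", ["order_id", "customer_id", "ordered_at"]))
```

The tedious part the LLM takes over is doing this across dozens of tables and drafting plausible first-pass descriptions.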

jprokay13 · 3h ago
I am coming back to this. I’ve been using Claude pretty hard at work and for personal projects, but the longer I do it, the more disappointed I become with the quality of output for anything bigger than a script. I do love planning things out and clarifying my thoughts. It’s a turbocharged rubber duck - but it’s not a great engineer
bcrosby95 · 3h ago
My thoughts on scripts: the output is pretty bad too, but it doesn't matter as much in a script, because it's just a short script, and all that really matters is that it kinda works.
utyop22 · 3h ago
What you're describing is a glorified mirror.

Doesn't that sound ridiculous to you?

kyleee · 3h ago
Partly it seems to be less taxing for the human delivering the same amount of work. I find I can chat with Claude, etc and work more. Which is a double edged sword obviously when it comes to work/life balance etc. But also I am less mentally exhausted from day job and able to enjoy programming and side projects again.
nicoburns · 3h ago
I guess each to their own? I can easily end up coding for 16 hours straight (having a great time) if I'm not careful. I can't imagine I'd have as much patience with an AI.
KerrAvon · 3h ago
I wonder if this is an introvert vs extrovert thing. Chatting with the AI seems like at least as much work as coding to me (introvert). The folks who don't may be extroverts?
halfcat · 17m ago
There is some line here. I don’t know if it’s introvert/extrovert but here are my observations.

I’ve noticed colleagues who enjoy Claude code are more interested in “just ship it!” (and anecdotally are more extroverted than myself).

I find Claude code to be oddly unsatisfying. Still trying to put my finger on it, but I think it’s that I quickly lose context. Even if I understand the changes CC makes, it’s not the same as wrestling with a problem, hitting roadblocks, and overcoming them. With CC I have no bearing on whether I’m in an area of code with lots of room for error, or whether I’m standing on the edge of a cliff and can’t cross some line in the design.

I’m way more concerned with understanding the design and avoiding future pain than my “ship it” colleagues (and anecdotally am way more introverted). I see what they build and, yes, it’s working, for now, but the table relationships aren’t right and this is going to have to be rebuilt later, except now it’s feeding a downstream report that’s being consumed by the business, so the beta version is now production. But the 20 other things this app touches indirectly weren’t part of the vibe coding context, so the design obviously doesn’t account for that. It could, but of course the “ship it” folks aren’t the ones that are going to build out lengthy requirements and scopes of work and document how a dozen systems relate to and interact with each other.

I guess I’m seeing that the speed limit of quality is still the speed of my understanding, and (maybe more importantly) that my weaponizing of my own obsession only works when I’m wrestling and overcoming, not just generating code as fast as possible.

I do wonder about the weaponized obsession. People will draw or play music obsessively, something about the intrinsic motivation of mastery, and having AI create the same drawing, or music, isn’t the same in terms of interest or engagement.

dpkirchner · 2h ago
I don't feel like I need to say too much to the agent to get my work done. I'm pretty dang introverted.

I just don't enjoy the work as much as I did when I was younger. Now I want to get things done and then spend the day on other stuff that's more enjoyable (to me).

utyop22 · 3h ago
I'm finding what's happening right now kinda bizarre.

The funny thing is - we need less. Less of everything. But an up-tick in quality.

This seems to happen with humans in everything: the gates get opened, enabling a flood of producers to come in. But this causes a mountain of slop to form, and over time the tastes of folks get eroded away.

Engineers don't need to write more lines of code / faster - they need to get better at interfacing with other folks in the business organisation and get better at project selection and making better choices over how to allocate their time. Writing lines of code is a tiny part of what it takes to get great products to market and to grow/sustain market share etc.

But hey, good luck with that - one's thinking power is diminished over time by interfacing with LLMs etc.

mumbisChungo · 2h ago
>ones thinking power is diminished overtime by interacing with LLMs etc.

Sometimes I reflect on how much more efficiently I can learn (and thus create) new things because of these technologies, then get anxiety when I project that to everyone else being similarly more capable.

Then I read comments like this and remember that most people don't even want to try.

utyop22 · 2h ago
And? Go create more stuff.

Come back and post here when you have built something that has commercial success.

Show us all how it's done.

Until then go away - more noise doesn't help.

mumbisChungo · 2h ago
I don't think there's anything I could tell you about the companies I've built that would dissuade you from your perspective that everyone is as intellectually lazy as your projection suggests.
skydhash · 1h ago
Not GP, but I really want to know how your process is better than anyone else’s. People have produced quite good software (as in, software that solves problems) on CPUs less powerful than what’s in my smart plug, and the principles they followed still define today’s world.
mumbisChungo · 1h ago
I just find that I learn faster by interrogating (or being interrogated by) a lossy encyclopedia than I do by reading textbooks or stackoverflow.

I'm still the one doing the doing after the learning is complete.

MangoCoffee · 4h ago
I've been vibe coding a couple of personal projects. I've found that test-driven development fits very well with vibe coding, and it's just as you said: break up the problem into small, testable chunks, get the AI to write unit tests first, and then implement the actual code.
yodsanklai · 4h ago
Actually, all good engineering principles which reduce cognitive load for humans work for AI as well.
BoiledCabbage · 3h ago
This is what's so funny about this. In some alternative universe I hope that LLMs never get any better, because they force so many good practices.

They are the single closest thing we've ever had to objective evaluation on if an engineering practice is better or worse. Simply because just about every single engineering practice that I see that makes coding agents work well also makes humans work well.

And so many of these circular debates and other best practices (TDD, static typing, keeping todo lists, working in smaller pieces, testing independently before testing together, clearly defined codebase practices, ...) have all been settled in my mind.

The most controversial take, and the one I dislike but may reluctantly have to agree with, is: "Is it better for a business to use a popular language less suited for the task than a less popular language more suited for it?" While obviously it's a sliding scale, coding agents clearly weigh in on one side of this debate... as little as I like seeing it.

colordrops · 3h ago
This is the big secret. Keep code modular, small, single purpose, encapsulated, and it works great with vibe coding. I want to write a protocol/meta language similar to the markdown docs that Claude et al create that is per module, and defines behavior, so you actually program and compose modules with well defined interfaces in natural language. I'm surprised someone hasn't done it already.
adastra22 · 2h ago
My set of Claude agent files have an explicit set of interface definitions. Is that what you’re talking about?
colordrops · 1h ago
Are Claude agent files per module? If so, then I guess so.
drzaiusx11 · 3h ago
Isn't what you're describing exactly what Kiro aims to solve?
colordrops · 1h ago
Possibly, I've never heard of Kiro, will look into it.
MarkMarine · 1h ago
Works great until it’s stuck and it starts just refactoring the tests to say true == true and calling it a day. I want the inverse of black box testing, like the inside of the box has the model in it with the code and it’s not allowed to reach outside the box and change the grades. Then I can just do the Ralph Wiggum as a software engineer loop to get over the reward hacking tendencies
ants_everywhere · 1h ago
IMO by far the best improvement would be to make it easier to force the agent to use a success criterion.

Right now it's not easy prompting claude code (for example) to keep fixing until a test suite passes. It always does some fixed amount of work until it feels it's most of the way there and stops. So I have to babysit to keep telling it that yes I really mean for it to make the tests pass.
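A babysitting harness for that is straightforward to sketch; here run_tests and ask_agent are placeholders (e.g. a pytest run and something like a claude -p invocation), since the loop itself is the point:

```python
import subprocess

def run_until_green(run_tests, ask_agent, max_rounds: int = 10) -> bool:
    """Re-invoke the agent until the success criterion (a passing test
    suite) holds, instead of letting it stop "most of the way there"."""
    for round_no in range(max_rounds):
        passed, report = run_tests()
        if passed:
            return True
        # Feed the concrete failure report back as the next prompt.
        ask_agent(
            f"Tests are still failing (round {round_no + 1}):\n{report}\n"
            "Keep fixing until the whole suite passes."
        )
    return False

def pytest_runner():
    # One possible success criterion: shell out to pytest.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout
```

Whether the fixes converge is another matter, hence the max_rounds cap (and the reward-hacking caveat upthread about the model editing the tests themselves).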

jason_zig · 5h ago
I've seen people post this same advice, and I agree with you that it works, but you'd think they would absorb this common strategy and integrate it into the underlying product at this point...
noosphr · 4h ago
The people who build the models don't understand how to use the models. It's like asking people who design CPUs to build data-centers.

I've interviewed with three tier one AI labs and _no-one_ I talked to had any idea where the business value of their models came in.

Meanwhile Chinese labs are releasing open source models that do what you need. At this point I've built local agentic tools that are better than anything Claude and OAI have as paid offerings, including the $2,000 tier.

Of course they cost between a few dollars to a few hundred dollars per query so until hardware gets better they will stay happily behind corporate moats and be used by the people blessed to burn money like paper.

criemen · 2h ago
> The people who build the models don't understand how to use the models. It's like asking people who design CPUs to build data-centers.

This doesn't match the sentiment on hackernews and elsewhere that claude code is the superior agentic coding tool, as it's developed by one of the AI labs, instead of a developer tool company.

noosphr · 2h ago
Claude Code is baby's first agentic tool.

You don't see better ones from code tooling companies because the economics don't work out. No one is going to pay $1,000 for a two-line change on a 500,000-line codebase after waiting four hours.

LLMs today are the equivalent of a 4-bit ALU without memory being sold as a fully functional personal computer. And like ALUs today, you will need _thousands_ of LLMs to get anything useful done; and like ALUs in 1950, we're a long way off from a personal computer being possible.

Barbing · 4h ago
Very interesting. And plausible.

Doesn't specifically seem to jibe with the claim Anthropic made that they were worried about Claude Code being their secret sauce, leaving them unsure whether to publicly release it. (I know some are skeptical of that claim.)

nostrademons · 3h ago
A lot of it is integrated into the product at this point. If you have a particularly tricky bug, you can just tell Claude "I have this bug. I expected output 'foo' and got output 'bar'. What went wrong?" It will inspect the code and sometimes suggest a fix. If you run it and it still doesn't work, you can say "Nope, still not working", and Claude will add debug output to the whole program, tell you to run it again, and paste the debug output back into the console. Then it will use your example to write tests, and run against them.
tombot · 4h ago
Claude Code at least now lets you use its best model for planning mode and its cheapest model for coding mode.
candiddevmike · 3h ago
The consulting world parallels here are funny
MikeTheGreat · 4h ago
Genuine question: What do you mean by " ask it to implement the plan in small steps"?

One option is to write "Please implement this change in small steps?" more-or-less exactly

Another option is to figure out the steps and then ask it "Please figure this out in small steps. The first step is to add code to the parser so that it handles the first new XML element I'm interested in, please do this by making the change X, we'll get to Y and Z later"

I'm sure there's other options, too.

Benjammer · 4h ago
My method is that I work together with the LLM to figure out the step-by-step plan.

I give an outline of what I want to do, and give some breadcrumbs for any relevant existing files that are related in some way. I ask it to figure out context for my change, to write up a summary of the full scope of the change we're making (including an index of file paths to all relevant files, with a very concise blurb about what each file does/contains), and then to produce a step-by-step plan at the end. I generally always have to tell it NOT to think about this like a traditional engineering team plan (this is a senior engineer and an LLM code agent working together; think only about technical architecture), otherwise you get "phase 1 (1-2 weeks), phase 2 (2-4 weeks), step a (4-8 hours)" sorts of nonsense timelines in your plan.

Then I review the steps myself to make sure they are coherent and make sense, and I poke and prod the LLM to fix anything that seems weird, either fixing context or directions or whatever.

Then I feed the entire document to another clean context window (or two or three) and ask it to "evaluate this plan for cohesiveness and coherency; tell me if it's ready for engineering or if there's anything underspecified or unclear". I iterate on that 1-3 times, until a fresh context window says "This plan looks great, it's well crafted, organized, etc...." and doesn't give feedback.

Then I go to a fresh context window and tell it "Review the document @MY_PLAN.md thoroughly and begin implementation of step 1; stop after step 1 before doing step 2", and I start working through the steps with it.

lkjdsklf · 4h ago
The problem is, by the time you’ve gone through the process of making a granular plan and all that, you’ve lost all productivity gains of using the agent.

As an engineer, especially as you get more experience, you can kind of visualize the plan for a change very quickly and flesh out the next step while implementing the current step

All you have really accomplished with the kind of process described is making the world's least precise, most verbose programming language

Benjammer · 2h ago
I'm not sure how much experience you have, I'm not trying to make assumptions, but I've been working in software over 15 years. The exact skill you mentioned - can visualize the plan for a change quickly - is what makes my LLM usage so powerful, imo.

I can say the right precise wording in my prompt to guide it to a good plan very quickly. As the other commenter mentioned, the entire above process only takes something like 30-120 minutes depending on scope, and then I can generate code in a few minutes that would take 2-6 weeks to write myself, working 8 hr days. Then it takes something like 0.5-1.5 days to work out all the bugs, clean up the weird AI quirks, and maybe have the LLM write some Playwright tests (or whatever testing framework you use for integration tests) to verify its own work.

So yes, it takes significant time to plan things well for good results, and yes, the results are often sloppy in some parts and have weird quirks that no human engineer would make on purpose. But if you stick with working on prompt/context engineering and getting better and faster at the above process, the key unlock is not that it does the same coding for you, just generating the code in your place. It's that you can work as a solo developer at the abstraction level of a small startup company.

I can design and implement an enterprise-grade SSO auth system over a weekend that integrates with Okta and passes security testing. I can take a library written in one language and fully re-implement it in another language in a matter of hours. I recently took the native Android and iOS libraries for a fairly large, non-trivial SDK and had Claude build me a React Native wrapper library with native modules that integrates both native libraries and presents a clean, unified interface and TypeScript types to the React Native layer. This took me about two days, plus one more for validation testing. I had never done this before. I have no idea how "Nitro Modules" works, or how to configure a React Native library from scratch.

But given the immense scaffolding abilities of LLMs, plus my debugging/hacking skills, I can get to a really confident place really quickly, and I regularly ship production code at work with this process.

adastra22 · 2h ago
It takes maybe 30min and then it can go off and generate code that would take literal weeks for me to write. There are still huge productivity gains being had.
lkjdsklf · 1h ago
That has not been my experience at all.

It takes 30-40 minutes to generate a plan and it generates code that would have taken 20-30 minutes to write.

When it’s generating “weeks” worth of code, it inevitably goes off the rails and the crap you get goes in the garbage.

This isn’t to say agents don’t have their uses, but i have not seen this specific problem actually work. They’re great for refactoring (usually) and crapping out proof of concepts and debugging specific problems. It’s also great for exploring a new code base where you have little prior knowledge.

It makes sense that it sucks at generating large amounts of code that fit cohesively into the project. The context is too small. My code base is millions of lines of code. My brain has a shitload more of it in context than any of the models do. So they have to guess and check, and they end up incorrect and poor, and I don’t. I know which abstractions exist that I can use. It doesn’t. Sometimes it guesses right. Oftentimes it doesn’t. And once it’s wrong, it’s fucked for the entire rest of the session, so you just have to start over.

ants_everywhere · 1h ago
What I do is make each step roughly a reviewable commit.

So I'll say something like "evaluate the URL fetcher library for best practices, security, performance, and test coverage. Write this up in a markdown file. Add a design for single-flighting and retry policy. Break this down into steps so simple even the dumbest LLM won't get confused."

Then I clear the context window and spawn workers to do the implementation.

conception · 4h ago
I tell it to generate a todo.md file with hyper atomic todos each requiring 20 loc or less. Then have it go through that. If the change is too big, generate phases (5-25) and then do the todos for each phase. That plus some sort of reference docs/high level plan keeps it going along all right.
rmonvfer · 4h ago
I’d like to add: keep some kind of development documentation where you describe in detail the patterns and architecture of your application and its components.

I’ve seen incredible improvements just by doing this and using precise prompting to get Claude to implement full services by itself, tests included. Of course it requires manual correction later but just telling Claude to check the development documentation before starting work on a feature prevents most hallucinations (that and telling it to use the Context7 MCP for external documentation), at least in my experience.

The downside to this is that 30% of your context window will be filled with documentation but hey, at least it won’t hallucinate API methods or completely forget that it shouldn’t reimplement something.

Just my 2 cents.

rvnx · 4h ago
Your tips are perfect.

Most users will just give a vague task like "write a clone of Steam" or "create a rocket" and then blame Claude Code.

If you want AI to code for you, you have to decompose your problem like a product owner would do. You can get helped by AI as well, but you should have a plan and specifications.

Once your plan is ready, you have to decompose the problem into different modules, then make sure each module is tested.

The issue is often with the user, not the tool, as they have to learn how to use the tool first.

wordofx · 4h ago
> Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.

This seems like half of HN with how much HN hates AI. Those who hate it or say it’s not useful to them seem to be fighting against it and not wanting to learn how to use it. I still haven’t seen good examples of it not working even with obscure languages or proprietary stuff.

drzaiusx11 · 3h ago
Anyone who has mentored as part of a junior engineer internship program AND has attempted to use current gen ai tooling will notice the parallels immediately. There are key differences though that are worth highlighting.

The main difference is that with the current batch of genai tools, the AI's context resets after use, whereas a (good) intern truly learns from prior behavior.

Additionally, as you point out, the language and frameworks need to be part of the training set, since the AI isn't really "learning"; it's just prepopulating a context window for its pre-existing knowledge (token prediction). So YMMV depending on hidden variables from the secret (to you, the consumer) training data and weights. I use Ruby primarily these days, which is solidly in the "boring tech" camp, and most AIs fail to produce useful output that isn't Rails boilerplate.

If I did all my IC contributions via directed intern commits I'd leave the industry out of frustration. Using only AI outputs for producing code changes would be akin to torture (personally.)

Edit: To clarify, I'm not against AI use; I'm just stating that with the current generation of tools it is a pretty lackluster experience when it comes to net-new code generation. It excels at one-off throwaway scripts and at making large, tedious refactors less of a drudgery. I wouldn't pivot to it being my primary method of code generation until some of the more blatant productivity losses are addressed.

hn_acc1 · 2h ago
When its best suggestion (for inline typing) is bringing back a one-off experiment in a different git worktree from 3 months ago that I only needed that one time... it does make me wonder.

Now, it's not always useless. It's GREAT at adding debugging output and knowing which variables I just added and thus want to add to the debugging output. And that does save me time.

And it does surprise me sometimes with how well it picks up on my thinking and makes a good suggestion.

But I can honestly only accept maybe 15-20% of the suggestions it makes - the rest are often totally different from what I'm working on / trying to do.

And it's C++. But we have a very custom library to do user-space context switching, and everything is built on that.

LtWorf · 3h ago
If you have to iterate 10 times, that is "not working", since it already wasted way more time than doing it manually to begin with.
salty_frog · 2h ago
This is my algorithm for wetware llms.
adastra22 · 4h ago
This is why the jobs market for new grads and early career folks has dried up. A seasoned developer knows that this is how you manage work in general, and just treats the AI like they would a junior developer—and gets good results.
CuriouslyC · 4h ago
Why bother handing stuff to a junior when an agent will do it faster while asking fewer questions? Even if the first-draft code isn't amazing, you can quality-gate it with an LLM reviewer that has been instructed to be brutal, and do a manual pass when the code gets by the LLM reviewer.
LtWorf · 3h ago
Because juniors learn while LLMs don't and you must explain the same thing over and over forever.
adastra22 · 3h ago
If you are explaining things more than once, you are doing it wrong. Which is not on you as the tools currently suck big time. But it is quite possible to have LLM agents “learn” by intelligently matching context (including historical lessons learned) to conversation.
paulcole · 4h ago
> Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

Tried this on a developer I worked with once and he just scoffed at me and pushed to prod on a Friday.

NitpickLawyer · 4h ago
> scoffed at me and pushed to prod on a Friday.

that's the --yolo flag in cc :D

ale · 3h ago
It’s about time these types of articles actually include the types of tasks being “orchestrated” (as the author writes) that aren’t just plain refactoring chores or React boilerplate. Sanity has quite a backlog of long-requested features and the message here is that these agents are supposedly parallelizing a lot of the work. What kind of staff engineer has “80% of their code” written by a “junior developer who doesn't learn“?
bakugo · 3h ago
Actually providing examples of real tasks given to the AI and the subsequent results would break the illusion and give people opportunities to question the hype. Can't have that.

We'll just keep getting submission after submission talking about how amazing Claude Code is with zero real world examples.

johnfn · 1h ago
Really, zero real world examples? What about this?

https://news.ycombinator.com/item?id=44159166

dingnuts · 1h ago
the kind of engineer who has been Salesified to the point that they write such drivel as "these learnings" instead of "lessons" in an article that allegedly has a technical audience.

It's funny because as I have gotten better as a dev I've gone backwards through his progression: when I was less experienced I relied on Google; now, I just read the docs.

asdev · 3h ago
Guy said a whole lot of nothing. Said he's improved productivity, but also said AI falls short in all the common ways people have noticed. Also, I guarantee no one is building core functionality by delegating to Claude Code.
aronowb14 · 1h ago
Agreed. I think this Anthropic article is a realistic take on what’s possible (focus on prototyping)

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

jpollock · 2h ago
Avoiding the boilerplate is part of the job as a software developer.

Abstracting the boilerplate is how you make things easier for future you.

Giving it to an AI to generate just makes the boilerplate more of a problem when there's a change that needs to be made to _all_ the instances of it. Even worse if the boilerplate isn't consistent between copies in the codebase.

albingroen · 4h ago
So we’re supposed to start paying $1k-$1.5k on top of already crazy salaries just to maybe get a productivity boost on trivial to semi-trivial issues? I know my boss would not be keen on that, at least.
15155 · 1h ago
Hardware companies routinely license individual EDA tool seats that cost more than numerous developer salaries - $1k/year is nothing if it improves productivity by any measurable amount.
saulpw · 2m ago
The OP was saying it's $1k/mo. That's a 5-10% raise, which is a bit more than nothing.
astrange · 8m ago
The high salaries make productivity improvements even more important.
AnotherGoodName · 3h ago
I can't even get through $20 of credit (GPT-5 Thinking via IntelliJ's pro AI subscription) a month right now, despite plenty of usage, so I'm surprised at the $1k figure. Is Claude that much more expensive? (A quick Google suggests yes, actually.)

Having said the above some level of AI spending is the new reality. Your workplace pays for internet right? Probably a really expensive fast corporate grade connection? Well they now also need to pay for an AI subscription. That's just the current reality.

everforward · 3h ago
I don't know what Intellij's AI integration is like, but my brief Claude Code experience is that it really chews through tokens. I think it's a combination of putting a lot of background info into the context, along with a lot of "planning" sort of queries that are fairly invisible to the end user but help with building that background for the ultimate query.

Aider felt similar when I tried it in architect mode; my prompt would be very short and then I'd chew through thousands of tokens while it planned and thought and found relevant code snippets, etc.

oblio · 3h ago
The fast corporate internet connection is probably $1,000 for 100 developers or more...
albingroen · 4h ago
And remember: this is at subsidised prices.
sdesol · 2h ago
It will certainly be interesting to see how businesses evolve in the coming years. What's written in stone is that you (the employee) will be measured, and I'm curious what developers will be measured by in the future. Will you be at greater risk of layoffs/lack of promotions/etc. if you spend more on AI? How do you, as a developer, prove that it's you and not the LLM that should be praised?
resonious · 3h ago
Interesting that this guy uses AI for the initial implementation. I do the opposite. I always build the foundation. That way I know how things work fundamentally. Then I ask agents to do boilerplate tasks. They're really good at following suit, but very bad at architecture.
f311a · 3h ago
Yeah, LLMs are pretty bad at planning maintainable architecture. They don’t refactor it when code is evolving and probably can’t do it due to context limitations.
tkgally · 2h ago
Anthropic just posted an interview with Boris Cherny, the creator of Claude Code. He also offers some ideas on how to use it.

“The future of agentic coding with Claude Code”

https://youtu.be/iF9iV4xponk

nikcub · 2h ago
> budget for $1000-1500/month for a senior engineer going all-in on AI development.

Is this another case of someone using API keys and not knowing about the Claude Max plans? It's $100 or $200 a month; if you're not pure yolo brute-force vibe coding, the $100 plan works.

https://www.anthropic.com/max

reissbaker · 47m ago
Yeah $1k-1.5k seems absurdly high. The $200/month 20x variant of the Max plan covers an insane amount of usage, and the rate limits reset every five hours. Hard to imagine needing it so badly that you're blowing through that rate limit multiple times a day, every day... And if you are, I think switching to per-token payment would probably cost a lot more than $1k.
RomanPushkin · 3h ago
There is one thing I would highly recommend to anyone using Claude or any other agent: logging. I can't emphasize it enough: if you have logging, you can take the whole log file, dump it into the AI, outline the problem, and you'll likely get a solution or advance to the next step. Logging is everything.
meerab · 3h ago
I have barely written any code since my switch to Claude Code! It's the best thing since sliced bread!

Here's what works for me:

- Detailed claude.md containing overall information about the project.

- Anytime Claude chooses a route that's not my preferred one, I ask for my preference to be saved in global memory.

- Detailed planning documentation for each feature - Describe high-level functionality.

- As I develop the feature, add documentation with database schema, sample records, sample JSON responses, API endpoints used, test scripts.

- MCP, MCP, MCP! Playwright is a game changer

The more context you give upfront, the less back-and-forth you need. It's been absolutely transformative for my productivity.

Thank you Claude Code team!

f311a · 3h ago
What are you working on? In my industry it fails half of the time and comes up with absolute nonsense. The data just don't exist for our problems; it can only work when you guide it and ask for a few functions at most.
meerab · 2h ago
I am working on VideoToBe.com - and my stack is NextJS, Postgresql and FastAPI.

Claude Code is amazing at producing code for this stack. It does an excellent job of outputting ffmpeg, curl commands, Linux shell scripts, etc.

I have written a detailed project plan and feature plans in Markdown, and Claude has no trouble understanding the instructions.

I am curious - what is your use case?

jazzyjackson · 2h ago
Does Claude Code provide some kind of "global memory" the LLM refers to, or is this just a request you make within the LLM's context window? Just curious, I hadn't heard the term.

EDIT: I see, you're asking Claude to modify claude.md to track your preference there, right?

https://docs.anthropic.com/en/docs/claude-code/memory

meerab · 2h ago
Yes. /init will initialize the project and save initial project information and preference.

Ask Claude to update the preference file and documentation the moment you realize Claude has deviated from the path.

ethanwillis · 3h ago
Personally, I give Claude a fully specified program as my prompt so that it gives me back a working program 100% of the time.

Really simple workflow!

kbuchanan · 4h ago
For me, working mostly in Planning Mode skips much of the initial misfires, and often leads to correct outcomes for the first edit.
cjonas · 21m ago
One thing I've noticed is the difference in code quality by language. I'm constantly disappointed by the output of Python code. I have to correct it to follow even the most basic software development principles (DRY, etc.).

Typescript on the other hand, seems to do much better on first pass. Still not always beautiful code, but much more application ready.

My hypothesis is that this is due to the billions of lines of Jupyter Notebook code it was probably trained on :/

syspec · 1h ago
Does this work for others when working in other domains? When creating a Swift application, I can't imagine creating 20 agents and letting them go to town. Same for the backend of such an application if it's in say, Java+Springboot
drudolph914 · 25m ago
to throw my hat into the ring, I am in no way shy about using the AI tooling and I like using it, but I am happy we're finally seeing people talk about AI that matches with my personal reality with the tools.

for the record, I've been bullish on the tooling from the beginning

My dev-tooling AI journey has been chatGPT -> vscode + copilot -> early cursor adopter -> early claude + cursor adopter -> cursor agent with claude -> and now claude code

I've also spent a lot of time trying out self-hosted LLMs, such as a couple of versions of Qwen Coder 2.5/3 32B as well as DeepSeek 30B, talking to them through the VS Code continue.dev extension.

My personal feeling is that the AI coding/tooling industry hit a major plateau in usefulness as soon as agents became a part of the tooling. The reality is that coding is a highly precise task, and LLMs, down to the very core of the model architecture, are not precise in the way coding needs them to be. It's not that I think we'll never see real coding agents, but I think it will take a deep, complete, bottom-up change, and possibly an entirely new model architecture, to get to what people imagine a coding agent is.

I've settled on just using Claude w/ Cursor and being done with experimenting. The agent tooling just slows my engineering team down.

I think the worst part about this dev-tooling space is that the comment sections on these kinds of articles are completely useless. It's either AI hype bots spouting nonsense, or the most mid and obvious takes that you hear everywhere else. I've genuinely become frustrated with all this vague advice and with how the AI dev community talks about this domain. There is no science, data, or reasoning about why these things fail or how to improve them.

I think anyone who tries to take this domain seriously knows that there's a limit to all this tooling, that we're probably not going to see anything groundbreaking for a while, and that there doesn't exist a person, outside the AI researchers at the big AI companies, who could tell you how to actually improve the performance of a coding agent.

I think that famous vibe-code reddit post said it best

"what's the point of using these tools if I still need a software engineer to actually build it when I'm done prototyping"

axus · 4h ago
I like his point about more objectivity and zero ego. You don't have to worry about hurting an AI's feelings or your own when you throw away code.
awesome_dude · 4h ago
But I still find myself needing (strongly) to let Claude know when it's made a breakthrough that would have been hard work on my own.
CharlesW · 3h ago
Good creators tend to treat their tools with respect, and I can't imagine any reason we shouldn't feel gratitude toward our tools after a particularly satisfying session.

Also, there may be selfish reasons to do this as well: (1) "Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance" https://arxiv.org/abs/2402.14531 (2) "Three Things to Know About Prompting LLMs" https://sloanreview.mit.edu/article/three-things-to-know-abo...

groby_b · 4h ago
Curious: Do you also laud your compiler for particularly good optimizations?
awesome_dude · 3h ago
There's a couple of things there

1. I don't see the output of the compiler; as in, all I get is an executable blob. It could be inspected, but I don't think I ever have in my 20+ year career. Maybe I lie, and I've rocked up with a hex editor once or twice out of pure curiosity, but I've never got past looking for strings that I recognise.

2. When I use Claude, I am using it to do things that I can do, by hand, myself. I am reviewing the code as I go along, and I know what I want it to do because it's what I would be writing myself if I didn't have Claude (or Gemini for that matter).

So, no, I have never congratulated the compiler (or interpreter, linker, assembler, or even the CPU).

Finally, I view the AI as a pairing partner, sometimes it's better than me, sometimes it's not, and I have to be "in the game" in order to make sure I don't end up with a vibe coded mess.

edit: This is from yesterday (Claude had just fixed a bug for me; all I did was paste the block of code the bug was in and say "x behaviour but getting y behaviour instead").

perfect, thanks

Edit You're welcome! That was a tricky bug - using rowCount instead of colCount in the index calculation is the kind of subtle error that can be really hard to spot. It's especially sneaky because row 0 worked correctly by accident, making it seem like the logic was mostly right. Glad we got it sorted out! Your Gaps redeal should now work properly with all the 2s (and other correctly placed cards) staying in their proper positions across all rows.

LtWorf · 3h ago
You've got to check the assembly, not the binary, for optimisations…
awesome_dude · 3h ago
Yeah - or I could just not care unless I have to (which, in the last 20 plus years, has been... let me think... oh, right... never)
nh43215rgb · 3h ago
$1000-1500/month for AI paid by the employer... that's quite nice. I wonder how much it would cost to run a couple of Claude Code instances 24/7 indefinitely. If a company's got the resources, they might as well try that against their issues.
jedberg · 3h ago
I'd like to share my journey with Claude (not code).

I fed Claude a copy of everything I've ever written on Hacker News. Then I asked it to generate an essay that sounds like me.

Out of five paragraphs I had to change one sentence. Everything else sounded exactly as I would have written it.

It was scary good.

into_ruin · 3h ago
I'm doing a project in a codebase I'm not familiar with, in a language I don't really know, and Claude Code has been amazing at _explaining_ things to me. "Who calls this function," "how is this generated," etc. etc.

I'm not comfortable using it to generate code for this project, but I can absolutely see using it to generate code for a project I'm familiar with in a language I know well.

namesbc · 2h ago
Spending $1500 per-month is a crazy wasteful amount of money
josefrichter · 2h ago
I'm almost sure that we've all ended up at the same set of rules and steps for how to get the best out of Claude - mine are almost identical, and so are those of others I know :-)
block_dagger · 4h ago
The author doesn't make it clear why they switched from Cursor to Claude. Curious about what they can do with Claude that can't be done with Cursor. I use both a lot and find Cursor to be superior for the very large codebases I work in.
reissbaker · 41m ago
Pretty much everyone I talk to prefers the opposite, and feels like Claude performs best inside the Claude Code harness and not the Cursor one. But I suppose different strokes for different folks...

Personally I'm a Neovim addict, so you can pry TUIs out of my cold dead hands (although I recognize that's not a preference everyone shares). I'm also not purely vibecoding; I just use it to speed up annoying tasks, especially UI work.

meerab · 2h ago
Personal opinion:

Claude Code is more user-friendly than Cursor with its CLI-like interface. The file modifications are easy to view, and it automatically runs psql, cd, ls, and grep commands. Output of the commands is shown in a more user-friendly fashion. Agents and MCPs are easy to organize and use.

block_dagger · 1h ago
I feel just the opposite. I think Cursor's output is actually in the realm of "beautiful." It's well formatted and shows the user snippets of code and reasoning that helps the user learn. Claude is stuck in a terminal window, so reduced to monospaced bullet lines. Its verbose mode spits out lines of file listings and other context irrelevant to the user.
RomanPushkin · 3h ago
It's easy: Cursor is a reseller; they optimize your token usage so they can make a profit. Claude is the endpoint, and they offer tokens at the cheapest price possible.
block_dagger · 1h ago
I use Cursor in MAX mode because my employer pays for the tokens. I probably should have mentioned that in my OP. It makes a huge difference.
lordnacho · 4h ago
I'm using Claude all the time now. It works, and I'm amazed it worked so easily for me. Here's what it looks like:

1) Summarize what I think my project currently does

2) Summarize what I think it should do

3) Give a couple of hints about how to do it

4) Watch it iterate a write-compile-test loop until it thinks it's ready

I haven't added any files or instructions anywhere, I just do that loop above. I know of people who put their Claude in YOLO mode on multiple sessions, but for the moment I'm just sitting there watching it.

Example:

"So at the moment, we're connecting to a websocket and subscribing to data, and it works fine, all the parsing tests are working, all good. But I want to connect over multiple sockets and just take whichever one receives the message first, and discard subsequent copies. Maybe you need a module that remembers what sequence number it has seen?"

Claude will then praise my insightful guidance and start making edits.

At some point, it will do something silly, and I will say:

"Why are you doing this with a bunch of Arc<RwLock> things? Let's share state by sharing messages!"

Claude will then apologize profusely and give reasons why I'm so wise, and then build the module in an async way.

I just keep an eye on what it tries, and it's completely changed how I code. For instance, I don't need to be fully concentrated anymore. I can be sitting in a meeting while I tell Claude what to do. Or I can be close to falling asleep, but still be productive.
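The "share state by sharing messages" nudge is the standard Rust alternative to wrapping shared state in `Arc<RwLock<...>>`: each producer owns its data and sends it down a channel, while a single consumer owns the dedup state outright, so no locks are needed. A minimal std-only sketch of the websocket-dedup idea (threads and `mpsc` standing in for the async sockets; all names are hypothetical, not from the original project):

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

// One consumer owns the dedup state; producers only send messages,
// so there is no shared mutable state and no Arc<RwLock<...>>.
fn dedup(rx: mpsc::Receiver<u64>) -> Vec<u64> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for seq in rx {
        if seen.insert(seq) {
            unique.push(seq); // first copy wins; later copies are discarded
        }
    }
    unique.sort();
    unique
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Two "sockets" racing to deliver the same sequence numbers.
    for _ in 0..2 {
        let tx = tx.clone();
        thread::spawn(move || {
            for seq in 1u64..=5 {
                tx.send(seq).unwrap();
            }
        });
    }
    drop(tx); // the receiver's iterator ends once every sender is dropped
    let unique = dedup(rx);
    assert_eq!(unique, vec![1, 2, 3, 4, 5]);
    println!("kept {} unique messages", unique.len());
}
```

In an async codebase the same shape maps onto tasks and async channels (e.g. tokio's `mpsc`); the design point is the ownership transfer, not the specific channel type.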

abraxas · 3h ago
I tried to follow the same pattern on a backend project written in Python/FastAPI and this has been mostly a heartache. It gets kind of close but then it seems to periodically go off the rails, lose its mind and write utter shit. Like braindead code that has no chance of working.

I don't know if this is a question of the language or what but I just have no good luck with its consistency. And I did invest time into defining various CLAUDE.md files. To no avail.

ryandrake · 2h ago
What I find helpful in a large project is that whenever Claude goes way off the rails, I correct it and then tell it to update CLAUDE.md with instructions, in its own words, on how not to do it again in the future. It doesn't stop the initial hallucinations and brainfarts, but it seems to be making the tool slowly better as it adds context for itself.
lordnacho · 3h ago
Has this got anything to do with using a strongly typed language? I've heard that reported; not sure whether it's true, since my Python scripts tend to be short.

Does it end in a forever loop for you? I used to have this problem with other models.

adastra22 · 2h ago
I also use Rust with Claude Code, like GP. I do not experience forever loops — Claude converges on a working compiling solution every time. Sometimes the solution is garbage, and many times it gets it to “work” by disabling the test. I have layers of scaffolding (critic agents) that prevent this from being something I have to deal with, most of the time.

But yeah, strongly typed languages, test driven development, and good high quality compiler errors are real game changers for LLM performance. I use Rust for everything now.

wg0 · 3h ago
I can second that. Even on plain CRUD with little to no domain logic.
BobbyTables2 · 3h ago
The author will be in upper management before they know it!
rester324 · 4h ago
> If I were to give advice from an engineer's perspective, if you're a technical leader considering AI adoption:

> Let your engineers adopt and test different AI solutions: AI-assisted coding is a skill that you have to practice to learn.

I am sorry, but this is so out of touch with reality. Maybe in the US most companies are willing to allocate you 1000 or 1500 USD/month/engineer, but I am sure that in many countries outside of the US not even a single line (or other type of) manager will allocate you such a budget.

I know for a fact that in countries like Japan you even need to present your arguments for a pizza party :D So that's all you need to know about AI adoption and what's driving it

LtWorf · 23m ago
I love how you are getting downvoted, probably by people who have never set foot outside the USA.
bongodongobob · 2h ago
Depends on the culture. I worked at a place that did $100 million in sales a year and if the cost was less than $5k for something we needed, management said just fuckin do it, don't even ask. I also worked at a place that did $2 billion a year and they required multi-level approval for MS project pro licenses. All depends.

Edit: Why is this downvoted? Different corp cultures have different ideas about what is worthwhile. Some places value innovation and experimentation and some places don't.

furyofantares · 5h ago
I've come around on something like this. I start by putting a little effort into a prompt and into providing context, but not a ton - and see where Claude Code gets with it. It might even get what I asked for working in terms of features, but it's garbage code. This is a vibe session, not caring about the code at all, or hardly at all.

I notice what worked and what didn't, what was good and what was garbage -- and also how my own opinion of what should be done changed. I have Claude Code help me update the initial prompt, help me update what should have been in the initial context, maybe add some of the bits that looked good to the initial context as well, and then write it all to a file.

Then I revert everything else and start with a totally blank context, except that file. In this session I care about the code, I review it, I am vigilant to not let any slop through. I've been trying for the second session to be the one that's gonna work -- but I'm open to another round or two of this iteration.

soperj · 5h ago
and do you find this takes longer or shorter than just doing it yourself from scratch?
shinecantbeseen · 4h ago
I’m with you. Sometimes it really just feels like we’re tacking on the cognitive load of managing the drunk senior on top of the problem at hand, instead of just dealing with the problem at hand.
sfjailbird · 4h ago
A hundred times more time is spent reading a given piece of code, than it took writing it, in the lifetime of that program.

OK I made up the statistic, but the core idea is true, and it's something that is rarely considered in this debate. At least with code you wrote, you can probably recognize it later when you need to maintain it or just figure out what it does.

adastra22 · 2h ago
Most code is never read, to be honest.
furyofantares · 2h ago
In the olden days I read the code I wrote probably 2-3 times while in the process of writing it, and then almost always once in full just before submitting it.
furyofantares · 4h ago
Quite a bit shorter. Plus I can do a good chunk of the work (the first iteration) in contexts where I couldn't before, where I require less focus, and it uses less of my energy.

I think I can also end up with a better result, and having learned more myself. It's just better in a whole host of directions all at once.

I don't end up intimately familiar with the solution however. Which I think is still a major cost.

bongodongobob · 4h ago
Not OP, I don't care if it's the same amount of time because I can do it drunk/while doing other things. Not sure why how long does it take is the be all end all for some people.
sigmonsays · 2h ago
every god damn time AI hallucinates a solution that is not real (in ChatGPT)

I haven't put a huge effort into learning to write prompts, but in short, it seems easier to write the code myself than to work out the prompts. If you don't know every detail ahead of time and ask a slightly off question, the entire result will be garbage.

dakiol · 4h ago
To all the engineers using Claude Code: how do you submit your (well, Claude's) code for review? Say you have a big feature/epic to implement. In pre-AI times you would split it into chunks and submit each chunk as a PR to be reviewed. You don't want to submit dozens of file changes, because nobody would review them. Now with LLMs, one can easily explain the whole feature to the machine and it will output the whole thing just fine. What do you do? Divide it manually for review submission? One chunk after another?

It’s way easier to let the agent code the whole thing if your prompt is good enough than to give instructions bit by bit only because your colleagues cannot review a PR with 50 file changes.

athrowaway3z · 3h ago
Practically: you can commit it all after you're done and then tell it to tease apart the commit into multiple well-documented logical steps.

"Ask the LLM" is a good enough solution to an absurd number of situations. Being open to questioning your approach - or even asking the LLM (with the right context) to question your approach has been valuable in my experience.

But from a more general POV, its something we'll have to spend the next decade figuring out. 'Agile'/scrum & friends is a sort of industry-wide standard approach, and all of that should be rethought - once a bit of the dust settles.

We're so early in the change that I haven't even seen anybody get it wrong, let alone right.

yodsanklai · 4h ago
I split my diffs like I've always done, so they can be reviewed by a human (or even an AI, which won't understand 50 file changes).

The 50 file changes is most likely unsafe to deploy and unmaintainable.

edverma2 · 2h ago
I built a tool to split up a single PR into multiple nice commits: https://github.com/edverma/git-smart-squash
Yoric · 4h ago
I regularly write big MRs, then cut them into 5+ (sometimes 10+) smaller MRs. What does Claude Code change here?
dakiol · 3h ago
The split seems artificial now. Before, an average engineer would produce code sequentially, chunk after chunk. Each chunk submitted only after the previous one was reviewed and approved. Today, one could submit the whole thing for review. Also, if machines can write it, why not let machines review it too? Seems weird not to do so.
bongodongobob · 4h ago
Do whatever you want. Tell it to make different patches in chunks if you want. It'll do what you tell it to do.