Over the last 2 weeks (evenings only) I've spent a lot of time crafting the "perfect prompt" for Claude Code to one-shot the project. I ended up with a rather small CLAUDE.md file that references 8 other MD files, including project_architecture, models_spec, build_sequence, test_hierarchy, test_scenarios, and a few others.
It is a project for model-based governance of Databricks Unity Catalog, with which I do have quite a bit of experience, but none of the tooling feels flexible enough.
Eventually I ended up with 3 different subagents that supported the development of the actual planning files: a Databricks expert, a Pydantic expert, and a prompt expert.
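For anyone who hasn't set these up: a subagent is just a markdown file with a bit of YAML frontmatter under .claude/agents/. A minimal sketch of the Databricks one (names and wording made up for illustration, not my actual file):

    ---
    name: databricks-expert
    description: Reviews the planning markdown files for Unity Catalog correctness; use when editing the governance spec documents.
    tools: Read, Grep, Glob
    ---
    You are a Databricks Unity Catalog specialist. Check the referenced
    planning docs for outdated assumptions about catalogs, schemas, grants
    and securables, and list concrete corrections. Do not edit code.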
The improvement in the markdown files with the aid of these was rather significant, ranging from outdated Pydantic versions and inconsistencies to some misconceptions I had about Unity Catalog as well.
Yesterday evening I gave it a run and it ran for about 2 hours with me only approving some tool usage, and after that most of the tools + tests were done.
This approach is so different from how I used to do it, but I really do see a future in detailed technical writing and ensuring we're all on the same page.
In a way I found it more productive than going into the code itself.
A downside I found is that when reading code and working on it I really zone in.
With a bunch of markdown docs I find it harder to stay focused.
Curious times!
a_bonobo · 3h ago
I feel we're developing something like what made Test-Driven Development so strong: TDD forced you to sit down and design your system first, rather than making it all up on the fly. In the past we mapped the system while we were building the code for it.
This kind of AI-driven development feels very similar to that. By forcing you to sit down and map the territory you're planning to build in, the coding itself becomes secondary, just boilerplate to implement the design decision you've made. And AI is great at boilerplate!
jmull · 1h ago
Test-driven and prompt-driven development aside, I never understood why people (and groups) spend many hours (or 1000s, or 10000s of hours) building things when they don't really know what they're building.
(I've certainly seen it done though, with predictable results.)
andruby · 44m ago
Do you mean people that build something without a plan? Some people have an idea (or vision) but not a plan and they figure things out along the way. Others want to / need to plan everything ahead of time first.
In my anecdotal case: I behave like the former in some cases (crafting) and the latter in others (travel planning)
I wouldn't say one way is always better than the other.
potsandpans · 36m ago
Most people do not spend thousands of hours building something "not knowing what they're building."
On the contrary, in my experience it's much more important to "play" with a concept and see it working. Too many engineers think they're going to architect a perfect solution without ever getting code on the page.
A slapdash prototype is worth more than 100 tests and arch diagrams.
Note: I'm not saying the latter isn't important. My point is that it's OK (and encouraged) to do potentially throwaway work to understand the domain better.
danmaz74 · 2h ago
> "TTD forced you to sit down and design your system first, rather than making it all up on the fly"
It's interesting because I remember having discussions with a colleague who was a fervent proponent of TDD where he said that with that approach you "just let the tests drive you" and "don't need to sit down and design your system first" (which I found a terrible idea).
hetspookjee · 3h ago
That is exactly what this felt like indeed! I took a lot of interest in refining both the test strategy and the test decisions, but when it started implementing, some core functions were in fact lost in the process. This rather leaky memory still surprises me every now and then. Especially 'undoing' things is a big challenge, as the (do not) kind of route is so much more confusing for the LLM than the (do) route, it seems.
samrus · 2h ago
That's a great way to put it. The LLMs can't design things, that's way above their capabilities. They can pretend to design things and even fool people, but they're just regurgitating other designs from their training data (and for a todo app, that's enough). But if we do the design for them, they're really, really good at putting meat on that skeleton.
mattmanser · 3h ago
I feel TDD ended up fizzling out quite a bit in the industry, with some evangelists later admitting they'd often taken to writing the code first, then the tests.
To me it's always felt like waterfall in disguise and just didn't fit how I make programs. I feel it's just not a good way to build a complex system with unknown unknowns.
That the AI design process seems to rely on this same pattern feels off to me, and shows a weakness of developing this way.
It might not matter, admittedly. It could be that the flexibility of having the AI rearchitect a significant chunk of code on the fly works as a replacement to the flexibility of designing as you go.
copirate · 1h ago
"Extreme programming" methodology said you should not do TDD if you don't already know how to implement the code. In that case you should instead experiment until you know, and then throw away the experiments and write the code test-first.
Maybe it should be done that way with AI: experiment with AI if you need to, then write a plan with AI, then let the AI do the implementation.
MoreQARespect · 2h ago
TDD fizzled because not enough emphasis was put on writing high-level tests which matched user stories, and too much emphasis was put on it as a tool of design.
SkyPuncher · 2h ago
No, TDD failed because it assumed you could design a perfect system before implementation.
It's a total waste of time to do TDD only to find out you made a bad design choice or discovered a conflicting problem.
MoreQARespect · 1h ago
This is precisely the problem I alluded to, which is solved by writing higher-level tests with TDD that make fewer assumptions about your design.
TDD ought to let you make a bad design decision and then refactor it while keeping the tests as they are.
matijsvzuijlen · 2h ago
What makes you think TDD assumes that? It sounds like the complete opposite of what TDD is about.
m_fayer · 4h ago
Long after we are all gone and the scrum masters are a barely remembered historical curiosity, there shall remain, humble and eternal, the waterfall model.
actionfromafar · 4h ago
A waterfall, frozen, in time?
unixhero · 3h ago
Well, waterfall is how we built the old world. Piece by piece, module by module: roads, bridges, buildings, boats.
ionwake · 2h ago
I got intrigued by your comment; I couldn't wrap my head around a process just changing. I got AI to throw out this table, but I think it's of interest:
Waterfall ~1970, Agile ~2001, Continuous (DevOps) ~2015, Autonomous Dev ~2030, Self-Evolving Systems ~2040, Goal-Directed Ecosystems ~2050+
Den_VR · 1h ago
What do you think about “goal-directed ecosystems” mapping to Mulder’s Collaborative Agent Maturity Model (CAMM)?
razemio · 4h ago
That is exactly my issue. I am more distracted while being more productive. It feels just wrong, but works for now. In the long run, I need to find a solution for this. What works best for now is to let multiple agents run on multiple repos of the same project, solving different tasks. This way, I stay somewhat focused, since I constantly need to approve things. Just like a project manager with a big team... Indeed curious times.
ionwake · 2h ago
I agree, I think this is the way.
mprivat · 3h ago
That's pretty novel. What framework is actually running the agents in your experiment?
I plan to do a more detailed write-up sometime next week or the week after, when I've "finished" my 100% vibe-coded website.
brainless · 3h ago
These days, I record product details, user journey, etc. with voice, and kick off the product technical details documentation process. Minimal CLAUDE.md. GitHub-based workflow for the software development process. I am still struggling with generating good CI workflows; working on it.
Here is my playbook: https://nocodo.com/playbook/
What I don't get about all the "if you plan it out first, it gets better" approaches is: how did they work before?!
For anything bigger than small size features, I always think about what I do and why I do things. Sometimes in my head, sometimes on paper, a Confluence page or a white board.
I don't really get it. 80% of software engineering is figuring out what you need and how to achieve it. You check with the stakeholders, write down the idea of what you want to do and WHY you want to do it. You do some research.
The last 20% of the process is coding.
This was always the process. You don't need AI for proper planning and defining your goals.
Scarblac · 2h ago
I think I usually mix the coding and the designing more. Start coding something, then keep shaping and improving it for a while until it's good.
And of course for most things, there's a pretty obvious way it's probably going to work, no need to spend much time on that.
divan · 1h ago
That might be true for large dev teams with an established culture. But a lot of development is happening in different settings - solo projects, small teams, weekend side-projects, personal tools crafting, quick POC coding, etc. Not all software is a complex product that needs to be sold and maintained. One thing that I always loved about being a developer is that you can create any custom piece of software you need for yourself – even if it's for a single-time task - and don't care about releasing/supporting corner cases/other users.
In almost all these cases, the development process is a mix of coding & discovering, updating the mental model of the code on the go. It almost never starts with docs, specs or tests. Some projects are good for TDD, but some don't even need it.
And even for these use cases, using AI coding agents changes the game here. Now it really does matter to first describe the idea, put it into a spec, and verbalize everything in your head that you think will matter for the project.
Nowadays, the hottest programming language is English, indeed.
ticoombs · 5h ago
I used to joke about prompt engineering. But by jiminy it is a thing now. I swear sometimes I waste a good 10-20 minutes writing up a good prompt and initial plan just so that Claude Code can systematically implement something.
My usage is nearly the same as OP. Plan, plan, plan, save as a file, and then new context and let it rip.
That's the one thing I'd love: a good CLI (currently using Charm and CC) which allows me to have an implementation model, a plan model and (possibly) a model per sub-agent. Mainly so I can save money by using local models for implementation and online models for plans or generation, or even swapping back. Charm has been the closest I've used so far, allowing me to swap back and forth and not lose context. But the parallel sub-agent feature is probably one of the best things Claude Code has.
(Yes I'm aware of CCR, but could never get it to use more than the default model so :shrug:)
NitpickLawyer · 5h ago
> I used to joke about prompt engineering. But by jiminy it is a thing now.
This is the downside of living in a world of tweets, hot takes and content generation for the sake of views. Prompt engineering was always important, because GIGO has always been a ground truth in any ML project.
This is also why I encourage all my colleagues and friends to try these tools out from time to time. New capabilities become apparent only when you try them out. What didn't work 6 months ago has a very good chance of working today. But you need a "feel" for what works and what doesn't.
I also put much more value on examples, blogs and gists that show a positive instead of a negative. Yes, they can't count the r's in strawberry, but I don't need that! I don't need the models to do simple arithmetic wrong. I need them to follow tasks, improve workflows and help me.
Prompt engineering was always about getting the "google-fu" of 10-15 years ago rolling, and then keeping up with what's changed, what works and what doesn't.
scastiel · 5h ago
I agree, prompt engineering really is the foundation of working with AI (whether it’s for coding or anything else).
BiteCode_dev · 4h ago
Projects using AI are the best documented and tested projects I've worked on.
They are well documented because you need context for the LLM to be performant. And they are well tested because the cost of producing tests got lower, since they can be half generated, while the benefit of having tests got higher, since they are guard rails for the machine.
People constantly say code quality is going to plummet because of those tools, but I think the exact opposite is going to happen.
oblio · 4h ago
I find it funny that we had to invent tools that will replace, say, 20%+ of developers out there to finally get developers to write docs :-))
baq · 3h ago
The difference today is the docs are being read. In the before times, unless you were building a foundational library, docs would get updated when a presentation was needed to announce a great success, or maybe not even then. Nowadays if you want coding agents to be efficient, doc quality is paramount.
IOW there’s very clear ROI on docs today, it wasn’t so earlier.
oblio · 3h ago
I guess the opposite is also true: if a developer wants job security, not writing docs (and writing obfuscated code) is still the way to go :-p
samrus · 2h ago
honestly "prompt engineering" is just the vessel for architecting the solution. its like saying "diagram construction" really took off as a skill. its architecting with a new medium
Crowberry · 5h ago
I’ve recently tried out Claude Code for a bit, I’ll make sure to give the suggested approach a go! It sounds like a nice workflow.
But I'm negatively surprised by the amount of money CC costs. Just a simple refactoring cost me about 5 min + 15 min of review and $4; had I done it myself it might have taken 15-20 min as well.
How much money do you typically spend on features using CC? Nobody seems to mention this
dustingetz · 3h ago
The investor bull case in AI is to cannibalize the labor markets at 15% margin, so 1:1 labor:AI budget is where we are headed next - e.g. $100k/$100k for a senior dev. The AI share will come out of dev budgets, so expect senior salaries to fall and team sizes to shrink by a lot if this stuff works. Remember we're in the land grab phase, all subsidized by VCs, but we're speed-running through the stages, and this phase appears to be ending based on Twitter VC sentiment. There's only so many times you can raise another $500M for 9 months of operating cost at -100% gross margin.
fuckaj · 3h ago
What if, once the $100k dev jobs are gone, the equivalent value in terms of AI is nowhere near that? Say it is $5k instead?
Due to oversupply. First you needed humans who can code. But now you need scalable compute.
The equivalent would be hiring those people to wave a flag in front of a car. They are replaced by modern cars, but you don't get to receive the flag waver's wage as captured value for long, if at all.
dustingetz · 2h ago
Let's call that stage 2, when the labor/AI spend drops below 1.0. To understand what that might look like, I would compare to the surgeon model: say $500k for a surgeon to manage outcomes with 10x AI leverage. Argument for: managers are compensated in proportion to the capital risk they are responsible for, so it makes sense that as the leverage increases the comp increases, even if the ratio drops before it u-turns and climbs. Four arguments against: 1) the economy is going to look very different, with a different dynamic equilibrium in surprising places; 2) this assumes software best practices remain as they are today with no disruptive breakthroughs, which is unlikely; 3) systems that promote inequality are vulnerable to constant attack, so it may not be a stable equilibrium; 4) it may not even work - code complexity scales faster than linearly with lines of code, so where is the break-even point, and is it higher or lower on the complexity curve than the break-even point today?
naiv · 5h ago
You can sign up for a subscription and pay from $20-$200 flat with some daily/weekly restrictions on token usage.
https://support.anthropic.com/en/articles/11145838-using-cla...
Indeed, switching partially from Cursor to Claude Code increased the bill by a lot! Fortunately I use Claude Code mostly at work and I had no trouble convincing my boss to pay for it. But I'm still not sure how I'll continue building side projects with Claude Code. Not sure I want to spend $20 each time I want to bootstrap an app in an evening just for fun…
k9294 · 5h ago
Why not subscribe to Pro or Max? I calculated my CC usage this month (I'm on the $200 Max plan): it's close to $2.5k... It's just crazy to pay API prices right now.
roessland · 3h ago
I wish I could, but they banned me and won't disclose why.
That being said, Claude Code produces the best code I've seen from an AI coding agent, and the subscriptions are a steal.
jstummbillig · 1h ago
I heard Codex CLI is doing good things now, might give that a go (I have not yet).
edg5000 · 3h ago
You get either the 20 EUR/month plan for Sonnet or the 100 EUR/month one for Opus. I used Sonnet and switched to Opus eventually, but Sonnet was also good. For my purposes I don't run into the token limits, although I can't speak for the future.
viraptor · 4h ago
> had I done it myself it might have taken 15-20min as well.
Could you spend that 15-20min on some other task while this one works in the background?
conradfr · 3h ago
Well, the 15 minutes of code review are still there.
afro88 · 1h ago
This is the key to getting decent feature work out of Claude Code. I've had good success recently using GPT-5 High (in Cursor) to write the plan, then take that to Claude Code to implement.
You can get an extra 15-20% out of it if you also document the parts of the codebase you expect to change first. Let the plan model document how it works, architecture and patterns. Then plan your feature with this in the context. You'll get better code out of it.
Also, make sure you review, revise and/or hand edit the docs and plans too. That pays significant dividends down the line.
garciasn · 1h ago
We have Google Workspace at work and I find Gemini is awesome at “academic style” writeups but less good at writing code compared to CC.
So; I have Gemini write up plans for something, having it go deep and be as explicit as possible in its explanations.
I feed this into CC and have it implement the change in my code base. This has been really strong for me in making new features or expanding upon others where I feel something should be considerably improved.
The product I've built from the ground up over the last 8 weeks is now in production and being actively demoed to clients. I am beyond thrilled with my experience and its output. As I've mentioned before on HN, we could have done much of this work ourselves with our existing staff, but we could not have done the front-end work. What I feel might have taken well over a year and way more engineering and data science effort was mostly done in 2 months. Features are added in seconds rather than hours.
I’m amazed by CC and I love reading these articles which help me to realize my own journey is being mirrored by others.
zemvpferreira · 5h ago
It’s interesting to me that trying to optimise AI tools is leading many engineers to discover the value in good communication and expectation setting. The diva/autist stereotype of 10x programmers is due for a review.
mattjenner · 3h ago
This has been my experience with Replit as well. It needs to use design docs as the source of tasks and truth, as it starts to crumble as the app size increases.
With OpenAI I find ChatGPT just slows to a crawl and the chat becomes unresponsive. Asking it to make a document, to import into a new chat, helps with that.
On a human level, it makes me think that we should do the same ourselves. Reflect, document and dump our ‘memory’ into a working design doc. To free up ourselves, as well as our LLMs.
sputknick · 1h ago
Has anyone figured out an elegant way to add front-end design to a process like this? Every implementation I see people use includes either vague references to front-end frameworks, or Figma images. It doesn't feel like a cohesive design solution.
anemic · 4h ago
I too recently discovered this workflow and I'm blown away by it. The key IMHO is to first give Claude as few requirements as possible and let its plan mode roam freely. Writing reporting for sales metrics? "Ultrathink relevant sales metrics" and it will give you a lot to start from; rank which ones you want, maybe add some that are missing. Then create a new directory for this feature and ask it to write the plan to a file. Then proceed to create an implementation plan, ask it to find all the relevant data in the database and write down how to query it. Then finally let it implement it and write tests and end-user documentation. And send it to QA.
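To make that concrete, the sequence of prompts looks roughly like this (paraphrased, with made-up directory and file names, not my literal wording):

    1. Plan mode: "Ultrathink relevant sales metrics for this product."
    2. "Create features/sales-metrics/ and write the chosen metrics as a plan to PLAN.md there."
    3. "Write an implementation plan: find the relevant tables in the database and document how to query them."
    4. "Implement it, then write tests and end-user documentation."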
Need sales forecasting? This used to be an enterprise feature that 10 years ago would have needed a large team to implement correctly. Claude implements it in a Docker container in one afternoon.
It really changes how I see software now. Before, there were NDAs and intellectual property, and companies took great care not to leak their source code.
Now things have changed: have a complex ERP system that took 20 years to develop? Well, Claude can re-implement it in a flash. And write documentation and tests for it. Maybe it doesn't work quite that well yet, but things are moving fast.
samrus · 3h ago
Interesting. That living plan document is something humans learn to make and update themselves. These problems are dynamic, requiring the solver to maintain state, and the plan is what records that.
Doing it for the LLM really highlights that limitation. They aren't trained statefully, not at the foundation-model level, where it matters. That state gets reproduced on top of the model in the form of "reasoning" and "chain of thought", but that level of scaffolding is a classic example of the bitter lesson, like the semantic trees of old.
The representation learning + transformer model needs to be evolved to handle state; then it should be able to do these things itself.
ionwake · 3h ago
Does anyone know "roughly" how Claude Code compares cost-wise these days to Cursor using the OpenAI API? I just remember it being so expensive that I ended up paying hundreds of dollars a month for it.
zackify · 3h ago
I use the $20 plan and get plenty of usage.
merlincorey · 5h ago
> However, it won’t suggest a radically different approach unless I specifically ask it to, which I have never tried.
Assumptions without evaluation are not trustworthy.
bgwalter · 8m ago
That is another post praising the waterfall model. What Claude Photocopier does here is steal from hundreds of similar projects. It does not design anything, and neither do you.
user3939382 · 2h ago
I’m like 15 steps ahead of this guy
AInative_freak · 2h ago
This is exactly how an AI-Native dev thinks.
revskill · 3h ago
Vibe coding works if you do the design upfront.
virtualritz · 4h ago
I have been using exactly the author's approach with "great success" (quote Borat) over the last two months. The first month with CC was also mainly nudging it along, and that only gets you so far.
But since then I have come to have it always write ARCHITECTURE.md and IMPLEMENTATION.md when doing a new feature, plus CLAUDE-CONTINUE.md. All three live in the respective folder of the feature (in my case, it's often a new crate or a new module, as I write Rust).
The architecture one is usually the result of some back and forth with CC, much like the author describes it. Once that is nailed down, it writes that out, and also the implementation doc. These are not static ofc, they may get updated during the process, but the longer you spend discussing with CC and thinking about what you're doing, the less likely this is necessary. Really no surprise there -- this works the same way in meat space. :]
I have an instruction in CLAUDE.md that it should update CLAUDE-CONTINUE.md with the current status, referencing both the other documents, when the context is 2% away from getting compacted.
After the compact it reads the respective CLAUDE-CONTINUE.md (automatically, since it's referenced in CLAUDE.md) and then usually continues as if nothing happened. Without this my mileage varied, as it often needs to read a lot of code (again) first and re-calibrate to which parts of the architecture and implementation it had already done before the compact.
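For illustration, the instruction in CLAUDE.md is roughly of this shape (paraphrased, not my literal file):

    When the remaining context is within ~2% of auto-compaction, update the
    feature's CLAUDE-CONTINUE.md with the current status: what is done, what
    is next, and references to the matching ARCHITECTURE.md and
    IMPLEMENTATION.md. After a compaction, read CLAUDE-CONTINUE.md before
    doing anything else and resume from there.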
I often also have it write out stuff that is needed in dependencies that I maintain or that are part of the project, so it creates ARCHITECTURE-<feature>-<crate>.md and I just copy that over to the respective repo and tell another CC instance there to write the implementation document and send it off.
A lot of stuff I do is done via Terry [1] and this approach has worked a treat for me. Shout out to these guys, they rock.
Edit: P.S. I have 30+ years of R&D experience in my field, so I have a deep understanding of what I do (computer graphics and systems programming, mostly). I have quite a few friends with a decade or less of R&D experience and they struggle to get the same amount of shit done with CC or AI.
The models are not there yet; you need the experience. I also mainly formulate concisely what I want and what the API should look like, and then go back and forth with CC, not start with a fuzzy few sentences and cross my fingers that what it comes up with is something I may like and can then mold a tad.
I also found that not getting weird bugs that the model may chase for several "loops" seems correlated with the amount of statically-typed code. I.e. I've recently been working on a Python code base that interfaces with Rust, and the number of times CC shot itself in the foot because it assumed a foo was a [foo] and stuff like that is just astounding. This obviously doesn't happen in Rust; the language/compiler catches it, and the model 'knows' it can't get away with it, so it seems to exercise more rigor (but I may be 'hallucinating' that).
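To make the foo vs. [foo] point concrete, a made-up Rust example of the kind of shape mix-up the compiler simply refuses, while the equivalent Python only fails at runtime:

    struct Report { total: f64 }

    // Expects a slice of reports, not a single one.
    fn summarize(reports: &[Report]) -> f64 {
        reports.iter().map(|r| r.total).sum()
    }

    fn main() {
        let r = Report { total: 42.0 };
        // summarize(&r);            // rejected at compile time: expected &[Report], found &Report
        let total = summarize(&[r]); // the compiler forces the right shape
        println!("{total}");
        // In Python, passing `report` instead of `[report]` only blows up (or silently misbehaves) at runtime.
    }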
TL;DR: I came to the conclusion that statically-typed languages get you higher returns with these models, for this reason.
[1] https://www.terragonlabs.com/
What is the distinction between IMPLEMENTATION.md and *.rs source code files?
virtualritz · 1h ago
The implementation doc has the details of what is being implemented.
It spells out which algorithms/approaches to use, etc. The reason is that often a single context is not enough, and when CC continues, CLAUDE-CONTINUE tells the model what it should do, not why (architecture) or how (implementation).
The architecture file is usually more abstract/high level and may also contain info about how the stuff integrates with other parts of the codebase etc.
energy123 · 30m ago
I'm very curious about how you do ARCHITECTURE.md. What level of description is allowed? What are the guardrails?
I have something similar to that where all I do is list out the key types, structs, enums and traits, accompanied by comments describing what they are. I broke it down into four sections corresponding to different layers of abstraction.
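For context, a section of mine looks roughly like this (illustrative names only, not my real project):

    ## Layer 2: Domain types
    /// A parsed configuration entry, immutable after load.
    pub struct ConfigEntry { pub key: String, pub value: String }

    /// Where entries come from; implemented by FileSource and EnvSource.
    pub trait ConfigSource {
        fn entries(&self) -> Vec<ConfigEntry>;
    }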
But I noticed that over time the LLM will puff up the size and start putting implementations into it, so some prompting discipline is required to keep things terse and inline.
Is your ARCHITECTURE.md similar to mine or is it more like a UML diagram or perhaps an architectural spec in a DSL?