One weird trick is to tell the LLM to ask you questions about anything that's unclear at this point. I tell it, e.g., to ask up to 10 questions. Often I do multiple rounds of this Q&A, and I'm always surprised at the quality of the questions (w/ Opus). I get better results that way, just because it reduces the degrees of freedom in which the agent can go off in a totally wrong direction.
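A sketch of the kind of instruction meant here (the wording is illustrative, not a magic formula):

```shell
# Illustrative prompt prefix for a planning session (hypothetical wording)
cat <<'EOF'
Before writing any code, ask me up to 10 questions about anything
that is unclear in this spec. We'll do multiple rounds of Q&A until
you have no significant open questions, then produce a plan.
EOF
```

The point is that the questions constrain the agent before it commits to an approach.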
bbarnett · 1h ago
Oh great.
LLM -> I've read 1000x stack overflow posts on this. The way coding works, is I produce sub-standard code, and then show it to others on stackoverflow! Others chime in with fixes!
You -> Get the LLM to simulate this process, by asking to to post its broken code, then asking for "help" on "stackoverflow" (eg, the questions it asks), and then after pasting the fix responses.
Hands down, you've discovered why LLM code is so junky all the time. Every time it's seen code on SO and other places, it's been "Here's my broken code" and then Q&A followed by final code. Statistically, symbolically, that's how (from an LLM perspective) coding tends to work.
Because of course many code examples it's seen are derived from this process.
So just go through the simulated exchange, and success.
And the best part is, you get to go through this process every time, to get the final fixed code.
manmal · 1h ago
The questions it asks are usually domain specific and pertaining to the problem, like modeling or "where do I get this data from, ideally".
bbarnett · 53m ago
Not blaming you, it's actually genius. You're simulating what it's seen, and therefore getting the end result -- peer discussed and reviewed SO code.
deadbabe · 1h ago
This is a little anthropomorphic. The faster option is to tell it to give you the full content of an ideal context for what you’re doing and adjust or expand as necessary. Less back and forth.
manmal · 1h ago
Can you give me the full content of the ideal context of what you mean here?
rzzzt · 1h ago
Certainly!
pmxi · 2h ago
> If you are a heavy user, you should use pay-as-you go pricing
If you're a heavy user, you should pay for a monthly subscription to Claude Code, which is significantly cheaper than API costs.
ramesh31 · 2h ago
Am I alone in spending $1k+/month on tokens? It feels like the most useful dollars I've ever spent in my life. The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a year or two ago.
fainpul · 2h ago
> The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a year or two ago.
If you don't mind sharing, I'm really curious - what kind of things do you build and what is your skillset?
OtherShrezzing · 49m ago
I’m unclear how you’re hitting $1k/mo in personal usage. GitHub Copilot charges $0.04 per task with a frontier model in agent mode - and it’s considered expensive. That’s 850 coding tasks per day for $1k/mo, or around 1 per minute in a 16hr day.
I’m not sure a single human could audit & review the output of $1k/mo in tokens from frontier models at the current market rate. I’m not sure they could even audit half that.
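The arithmetic in that comment checks out roughly; a quick sanity check using the figures above ($0.04/task, $1k/mo, which are the commenter's numbers, not verified pricing):

```shell
# Back-of-envelope: how many $0.04 tasks fit in a $1000/month budget?
tasks_per_month=$(( 1000 * 100 / 4 ))   # $1000 at $0.04/task, in integer cents
tasks_per_day=$(( tasks_per_month / 30 ))
echo "${tasks_per_month} tasks/month, ~${tasks_per_day} tasks/day"
```

That works out to ~833 tasks/day over a 30-day month, close to the "850" figure quoted.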
sothatsit · 40m ago
You're not alone in using $1k+/month in tokens. But if you are spending that much, you should definitely be on something like Anthropic's Max plan instead of going full API, since it is a fraction of the cost.
kergonath · 58m ago
> Am I alone in spending $1k+/month on tokens?
I would if there were any positive ROI on those $12k/year, or if it were a small enough fraction of my income. For me, neither is true, so I don't :).
Like the siblings I would be interested in having your perspective on what kind of thing you do with so many tokens.
mewpmewp2 · 43m ago
If I'm freelancing and doing 2x as much as before in the same time, it follows that I can make 2x as much. But honestly, on many projects I feel like I was able to scale my output far more than 2x. It's a different story, of course, if you only have a main job. But I have been doing a main job and freelancing on the side forever now.
I do freelancing mostly for fun though, picking projects I like, not directly for the money, but this is where I definitely see multiples of difference on what you can charge.
zppln · 2h ago
Care to show what you've built?
tovej · 1h ago
I would personally never. Do I want to spend all my time reviewing AI code instead of writing? Not really. I also don't like having a worse mental model of the software.
What kind of software are you building that you couldn't before?
alex-moon · 1h ago
As a human dev, can I humbly ask you to separate out your LLM "readme" from your human README.md? If I see a README.md in a directory I assume that means the directory is a separate module that can be split out into a separate repo or indeed storage elsewhere. If you're putting copy in your codebase that's instructions for a bot, that isn't a README.md. By all means come up with a new convention e.g. BOTS.md for this. As a human dev I know I can safely ignore such a file unless I am working with a bot.
kergonath · 1h ago
I think things are moving towards using AGENTS.md files: https://agents.md/ . I'd like something like this to become the consensus for most commonly used tools at some point.
There was a discussion here 3 days ago: https://news.ycombinator.com/item?id=44957443 .
While I agree READMEs should be kept for humans, README literally means "read me".
Not "this is a separate project". Not "project documentation file".
You can have read-mes dotted all over a project if that's necessary.
It's simply a file that a previous developer is asking you to read before you start mucking around in that directory.
sothatsit · 35m ago
> If you are a heavy user, you should use pay-as-you go pricing; TANSTAAFL.
This is very, very wrong. Anthropic's Max plan is something like 10% of the cost of paying for tokens directly if you are a heavy user. And if you still hit the rate limits, Claude Code can roll over into paying for tokens through API credits. That said, I have never hit the rate limits since I upgraded to the $200/month plan.
I've been going very successfully using both Codex with GPT-5 and Claude Code with Opus. You develop a solution with one, then validate it with the other. I've fixed many bugs by passing the context between them, saying something like: "my other colleague suggested that…".
Bonus tip: I've started using symlinks for CLAUDE.md files pointing at AGENTS.md, so now I don't even have to maintain two different context files.
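The symlink trick is a one-liner; something like this in the repo root (filenames as in the comment, the content line is just an example):

```shell
# One context file, two names: CLAUDE.md is just a symlink to AGENTS.md.
printf '# Project context for coding agents\n' > AGENTS.md
ln -sf AGENTS.md CLAUDE.md
readlink CLAUDE.md   # confirms CLAUDE.md points at AGENTS.md
```

Tools that look for CLAUDE.md follow the link, so edits to AGENTS.md show up in both.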
CuriouslyC · 4h ago
If I paid for my API usage directly instead of the plan it'd be like a second mortgage.
3abiton · 2h ago
To be fair, allocating some tokens for planning (recursively) helps a lot. It requires more hands-on work, but produces much better results. Clarifying the tasks and breaking them down is very helpful too; you just end up spending lots of time on it. On the bright side, Qwen3 30B is quite decent, and best of all, "free".
athrowaway3z · 2h ago
> One of the weird things I found out about agents is that they actually give up on fixing test failures and just disable tests. They’ll try once or twice and then give up.
It's important not to think in terms of generalities like this. How they approach this depends on your test framework, and even on the language you use. If disabling tests is easy and common in that language/framework, it's more likely to do it.
For testing a CLI, I currently use run_tests.sh, and never once has it tried to disable a test. Though that can be its own problem when it hits one it can't debug.
# run_tests.sh
# Handle multiple script arguments or default to all .sh files
scripts=("${@/#/./examples/}")
[ $# -eq 0 ] && scripts=(./examples/*.sh)
for script in "${scripts[@]}"; do
  bash "$script" || exit 1
  echo " OK"
done
Another tip: for a specific task, don't bother with "please read file x.md"; Claude Code (and others) accept the @file syntax, which puts that file into context right away.
efitz · 8h ago
I spent much of the last several months using LLM agents to create software. I've written two blog posts about my experience; this is the second post that includes all the things I've learned along the way to get better results, or at least waste less money.
afeezaziz · 5h ago
You should write more about your experience using LLMs. Was this built solely using LLMs?
xwowsersx · 5h ago
This lines up with my own experience of learning how to succeed with LLMs. What really makes them work isn't so different from what leads to success in any setting: being careful up front, measuring twice and cutting once.
LLM -> I've read 1000x stack overflow posts on this. The way coding works, is I produce sub-standard code, and then show it to others on stackoverflow! Others chime in with fixes!
You -> Get the LLM to simulate this process, by asking to to post its broken code, then asking for "help" on "stackoverflow" (eg, the questions it asks), and then after pasting the fix responses.
Hands down, you've discovered why LLM code is so junky all the time. Every time it's seen code on SO and other places, it's been "Here's my broken code" and then Q&A followed by final code. Statistically, symbolically, that's how (from an LLM perspective) coding tends to work.
Because of course many code examples it's seen are derived from this process.
So just go through the simulated exchange, and success.
And the best part is, you get to go through this process every time, to get the final fixed code.
if you’re a heavy user you should pay for a monthly subscription for Claude Code which is significantly cheaper than API costs.
If you don't mind sharing, I'm really curious - what kind of things do you build and what is your skillset?
I’m not sure a single human could audit & review the output of $1k/mo in tokens from frontier models at the current market rate. I’m not sure they could even audit half that.
I would if there were any positive ROI for these $12k/year, or if it were a small enough fraction of my income. For me, neither are true, so I don’t :).
Like the siblings I would be interested in having your perspective on what kind of thing you do with so many tokens.
I do freelancing mostly for fun though, picking projects I like, not directly for the money, but this is where I definitely see multiples of difference on what you can charge.
What kind of software are you building that you couldn't before?