I don't understand the productivity that people get out of these AI tools. I've tried it and I just can't get anything remotely worthwhile unless it's something very simple or something completely new being built from the ground up.

Like sure, I can ask claude to give me the barebones of a web service that does some simple task. Or a webpage with some information on it.

But any time I've tried to get AI services to help with bugfixing/feature development on a large, complex, potentially multi-language codebase, it's useless.

And those tasks are the ones that actually take up the majority of my time. On the occasion that I'm spinning a new thing up quickly, I don't really need an AI to do it for me -- I mean, that's the easy part!

Is there something I'm missing? Am I just not using it right? I keep seeing people talk about how addictive it is, how the productivity boost is insane, how all their code is now written by AI and then audited, and I just don't see how that's possible outside of really simple rote programming.

tptacek · 32d ago

The first and most important question to ask here is: are you using a coding agent? A lot of times, people who aren't getting much out of LLM-assisted coding are just asking Claude or GPT for code snippets, and pasting and building them themselves (or, equivalently, they're using LLM-augmented autocomplete in their editor).

Almost everybody doing serious work with LLMs is using an agent, which means that the LLM is authoring files, linting them, compiling them, and iterating when it spots problems.
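The author/lint/compile/iterate cycle described above can be sketched roughly as follows. This is a minimal illustration, not how any particular agent is implemented: `agent_loop`, `python_syntax_check`, and `fake_model` are hypothetical names, the "model" is a stub that returns canned answers, and the only check performed is a Python syntax check standing in for a real linter, compiler, or test suite.

```python
from typing import Callable, Optional


def agent_loop(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for code, run the checks, and feed failures back until they pass."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(task, feedback)  # hypothetical model call
        feedback = check(candidate)          # None means the checks passed
        if feedback is None:
            return candidate
    return None  # gave up after max_rounds; a human takes over


def python_syntax_check(source: str) -> Optional[str]:
    """Stand-in for 'lint and compile': report the first syntax error, if any."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Stub in place of a real LLM: the first answer is broken, the second is fixed."""
    if feedback is None:
        return "def add(a, b) return a + b"
    return "def add(a, b):\n    return a + b\n"


result = agent_loop(fake_model, python_syntax_check, "write add(a, b)")
```

The point of the sketch is only that the error text goes back into the next request automatically, which is the difference from pasting snippets out of a chat window by hand.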

There's more to using LLMs well than this, but this is the high-order bit.

lexandstuff · 32d ago

Funny, I would give the absolute opposite advice. In my experience, the use of agents (mainly Cursor) is a sure-fire way to have a really painful experience with LLM-assisted coding. I much prefer to use AI as a pair programmer, that I talk to and sometimes get to write entire files, but I'm always the one doing the driving, and mostly the one writing the code.

If you aren't building up mental models of the problem as you go, you end up in a situation where the LLM gets stuck at the edges of its capability, and you have no idea how even to help it overcome the hurdle. Then you spend hours backtracking through what it's done building up the mental model you need, before you can move on. The process is slower and more frustrating than not using AI in the first place.

I guess the reality is, your luck with AI-assisted coding really comes down to the problem you're working on, and how much of it is prior art the LLM has seen in training.

tptacek · 32d ago

I mean, it might depend, but many of the most common complaints about LLM coding (most notably hallucination) are essentially solved problems if you're using agents. Whatever works for you! I don't even like autocomplete, so I sympathize with not liking agents.

If it helps, for context: I'll go round and round with an agent until I've got roughly what I want, and then I go through and beat everything into my own idiom. I don't push code I don't understand and most of the code gets moved or reworked a bit. I don't expect good structure from LLMs (but I also don't invest the time to improve structure until I've done a bunch of edit/compile/test cycles).

I think of LLMs mostly as a way of unsticking and overcoming inertia (and writing tests). "Writing code", once I'm in flow, has always been pleasant and fast; the LLMs just get me to that state much faster.

I'm sure training data matters, but I think static typing and language tooling matters much more. By way of example: I routinely use LLMs to extend intensely domain-specific code internal to our project.

disgruntledphd2 · 31d ago

> but many of the most common complaints about LLM coding (most notably hallucination) are essentially solved problems if you're using agents.

I have definitely not seen this in my experience (with Aider, Claude and Gemini). While helping me debug an issue, Gemini added a #!/bin/sh line to the middle of the file (which appeared to break things), and despite having that code in the context didn't realise it was the issue.

OTOH, when asking for debugging advice in a chat window, I tend to get more useful answers, as opposed to a half-baked implementation that breaks other things. YMMV, as always.

overfeed · 32d ago

> ...many of the most common complaints about LLM coding (most notably hallucination) are essentially solved problems if you're using agents

Inconsistency and crap code quality aren't solved yet, and these make the agent workflow worse because the human only gets to nudge the AI in the right direction very late in the game. The alternative, interactive, non-agentic workflows allow for more AI-hand-holding early, and better code quality, IMO.

Agents are fine if no human is going to work on the (sub)system going forward, and you only care about the shiny exterior without opening the hood to witness the horrors within.

tom_m · 31d ago

Cursor is pretty bad in my experience. I don't know why because I find Windsurf better and they both use Claude.

Regardless, Gemini 2.5 Pro is far far better and I use that with open-source free Roo Code. You can use the Gemini 2.5 Pro experimental model for free (rate limited) to get a completely free experience and taste for it.

Cursor was great and started it off, but others took notice and now they're all more or less the same. It comes down to UX and preference, but I think Windsurf and Roo Code just did a better job here than Cursor, personally.

mnoronha · 32d ago

Agree. My favorite workflow has been chatting with the LLM in the assistant panel of Zed, then making inline edits by prompting the AI with the context of that chat. That way, I can align with the AI on how the problem should be solved before letting it loose. What's great about this is that, depending on how easy or hard the problem is for the LLM, I can shift between handholding / manual coding and vibe coding.

theshrike79 · 32d ago

Agents make it easier for you to give context to the LLM, or for it to grab some by itself like Cline/Claude/Cursor/Windsurf can do.

With a web-based system you need repomix or something similar to give the whole project (or parts of it if you can be bothered to filter) as context, which isn't exactly nifty.

WD-42 · 32d ago

It’s because the people doing rote programming with them don’t think they are doing rote programming, they think it’s exceptional.

khazhoux · 32d ago

My sweet spot is Cursor to generate/tweak code, but I do all the execution and debugging iteration myself.

__mharrison__ · 32d ago

What agent do you recommend?

tptacek · 32d ago

I think they're all fine. Cursor is popular and charges a flat fee for model calls (interposed through their model call router, however that works). Aider is probably the most popular open source command line one. Claude Code is probably the most popular command line agent overall; Codex is the OpenAI equivalent (I like Codex fine).

later

Oh, I like Zed a lot too. People complain that Zed's agent (the back-and-forth with the model) is noticeably slower than the other agents, but to me, it doesn't matter: all the agents are slow enough that I can't sit there and wait for them to finish, and Zed has nice desktop notifications for when the agent finishes.

Plus you get a pretty nice editor --- I still write exclusively in Emacs, but I think of Zed as being a particularly nice code UI for an LLM agent.

theshrike79 · 32d ago

I've settled on Cline for now, with openrouter as the backend for LLMs, Gemini 2.5 for planning and Claude 3.7 for act mode.

Cursor is fine, Claude Code and Aider are a bit too janky for me - and tend to go overboard (making full-ass git commits without prompting) and I can't be arsed to rein them in.

physix · 32d ago

I use Augment Code as a plugin in IntelliJ and PyCharm. It's quite good, but I only use it for narrow, targeted objectives, agent mode or not.

I haven't seen any mentions of Augment code yet in comment threads on HN. Does anyone else use Augment Code?

kbaker · 32d ago

Try https://aider.chat + OpenRouter.ai, pay-as-you-go, use any model you want, I use Claude Sonnet.

It has a very good system prompt so the code is pretty good without a lot of fluff.

haiku2077 · 32d ago

I've been having okayish results with Zed + Claude 3.7

kasey_junk · 32d ago

Speaking up for Devin.ai here. What I like about it is that after the initial prompt nearly all of my interaction with it is via pull request comments.

I have this workflow where I trigger a bunch of prompts in the morning, lunch and at the end of the day. At those same times I give it feedback. The async nature really means I can have it work on things I can’t be bothered with myself.

tptacek · 32d ago

I need to know more about the morning/lunch/evening prompts, and I need to know right now. What are they? This sounds amazing.

kasey_junk · 32d ago

Oh they aren’t like time based instructions or anything. First thing I do when I sit down in the morning is go through the list of tasks I thought up overnight and fire devin at them. Then I go do whatever “real” work I needed to get done. Then at lunch I check in to see how things are going and give feedback or new tasks. Same as the last thing I do at night.

It keeps _me_ from context switching into agent manager mode. I do the same thing for doing code reviews for human teammates as well.

tptacek · 32d ago

Right, no, I figured that! Like the idea of preloading a bunch of things into a model that I don't have the bandwidth to sort through, but having them on tap when I come up for air from whatever I'm currently working on, sounds like a super good trick.

kasey_junk · 32d ago

That’s kind of where Devin excels. The agent itself is good enough, I don’t even know what model it uses. But it’s hosted and well integrated with GitHub, so you just give it a prompt and out shoots a pr sometime later. You comment on the pr and it refines it. It has a concept of “sessions” so you can start many of those tasks at once. You can login to each of its tasks and see what it is doing or interdict, but I rarely do.

Like most of the code agents it works best with tight testable loops. But it has a concept of short vs long tests and will give you plans and confidence values to help you refine your prompt if you want.

I tend to just let it go. If it gets to a 75% done spot that isn’t worth more back and forth I grab the pr and finish it off.

lukan · 32d ago

Yesterday I gave cursor a try and made my first (intentionally very lazy) vibe coding approach (a simple threejs project). It accepted the task and did things, failed, did things, failed, did things ... failed for good.

I guess I could work on the magic incantations to tweak here and there a bit until it works and I guess that's the way it is done. But I wasn't hooked.

I do get value out of LLMs for isolated broken down subtasks, where asking an LLM is quicker than googling.

For me, AI will probably become really useful once I can scan and integrate my own complex codebase, so it gives me solutions that work there and doesn't hallucinate API points or jump between incompatible library versions (my main issue).

rockemsockem · 32d ago

I did almost the same thing and had pretty much the same experience. A lot of times it felt so close to being great, but it ultimately wasted more time than if I had just worked on the project and occasionally asked ChatGPT to generate some sample code to figure out an API.

christophilus · 32d ago

I’ve had the same experience with Cursor. Claude Code, though, has been a game changer. It is really excellent.

tauoverpi · 32d ago

I've had the same issue every time I've tried it. The code I generally work on is embedded C/C++ with in-house libraries where the tools are less than useful as they try to generate non-existent interfaces and generally generate worse code than I'd write by hand. There's a need for correctness and being able to explain the code, thus use of those tools is also detrimental to explainability unless I hand-hold it to the point where I'm writing all of the code myself.

Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.

For my personal project in zig they either get lost completely or give me terrible code (my code isn't _that_ bad!). There seems to be no middle ground here. I've even tried the tools as pair programmers but they often get lost or stuck in loops of repeating the same thing that's already been mentioned (likely falls out of the context window).

When it comes to others using such tools I've had to ask them to stop using them to think, as it becomes next to impossible to teach / mentor if they're passing what I say to the LLM or trying to have it perform the work. I'm confident in debugging people when it comes to math / programming but with an LLM in between it's just not possible to guess where they went wrong or how to bring them back to the right path as the thought process is lost (or there wasn't one to begin with).

This is not even "vibe coding", I've just never found it generally useful enough to use day-to-day for any task and my primary use of say phind has been to use it as an alternative to qwant when I cannot game the search query well enough to get the search results I'm looking for (i.e. I ignore the LLM output and just look at the references).

motorest · 32d ago

> I've had the same issue every time I've tried it. The code I generally work on is embedded C/C++ with in-house libraries where the tools are less than useful as they try to generate non-existent interfaces and generally generate worse code than I'd write by hand.

That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.

We get this issue even with obscure FLOSS libraries.

When we fail to provide context to LLMs, they generate examples by following superficial cues like coding conventions. In extreme cases, such as code that employs source code generators or templates, LLMs even fill in function bodies that code generators are designed to generate for you. That's because, if LLMs are oblivious to the context, they resort to hallucinating their way into something seemingly coherent. Unless you provide them with context or instruct them not to make up stuff, they will resort to bullshitting their way into an example.

What's truly impressive about this is that often times the hallucinated code actually works.

> Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.

Again, this suggests a failure on your side for not providing any context.

If you give them enough context, LLMs synthesize and present it almost instantly. If you're prompting an LLM to generate documentation, which boils down to synthesizing what an implementation does and what its purpose is, and the LLM comes up empty, that means you failed to give it anything to work on.

The bulk of your comment screams failure to provide any context. If your code steers far away from what it expects, fails to follow any discernible structure, and doesn't even convey purpose and meaning in little things like naming conventions, you're not giving the LLM anything to work on.

tauoverpi · 32d ago

I'm aware of the _why_ but this is why the tools aren't useful for my case. If they cannot consume the codebase in a reasonable amount of time and provide value from that then they generally aren't useful in areas where I would want to use them (navigating large codebases). If the codebase is relatively small or the problem is known then an LLM is not any better than tab-complete and arguably worse in many cases as the generated result has to be parsed and added to my mental model of the problem rather than the mental model being constructed while working on the code itself.

I guess my point is, I have no use for LLMs in their current state.

> That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.

> We get this issue even with obscure FLOSS libraries.

This is the issue, however, as unfamiliar codebases are exactly where I'd want to use such tooling. Not working in those cases makes it less than useful.

> Unless you provide them with context or instruct them not to make up stuff, they will resort to bullshit their way into an example.

In all cases context was provided extensively but at some point it's easier to just write the code directly. The context is in the surrounding code; if the tool cannot pick up on that when combined with direction, it is again less than useful.

> What's truly impressive about this is that often times the hallucinated code actually works.

I haven't experienced the same. It fails more often than not and the result is much worse than the hand-written solution regardless of the level of direction. This may be due to unfamiliar code but again, if code is common then I'm likely familiar with it already thus lowering the value of the tool.

> Again, this suggests a failure on your side for not providing any context.

This feels like a case of blaming the user without full context of the situation. There are comments, the names are descriptive and within reason, and there's annotation of why certain things are done the way they are. The purpose of a doc comment is not "this does X" but rather _why_ you want to use this function and its purpose, which is something LLMs struggle to derive from my testing of them. Adding enough direction to describe such is effectively writing the documentation with a crude english->english compiler between. This is the same problem with unit test generation where unit tests are not to game code coverage but to provide meaningful tests of the domain and known edge cases of a function which is again something the LLM struggles with.

For any non-junior task LLM tools are practically useless (from what I've tested) and for junior level tasks it would be better to train someone to do better.

motorest · 31d ago

> I'm aware of the _why_ but this is why the tools aren't useful for my case. If they cannot consume the codebase in a reasonable amount of time and provide value from that then they generally aren't useful in areas where I would want to use them (navigating large codebases).

I challenge you to explore different perspectives.

You are faced with a service that handles any codebase that's thrown at it with incredible ease, without requiring any tweaking or special prompting.

For some reason, the same system fails to handle your personal codebase.

What's the root cause? Does it lie in the system that works everywhere with anything you throw at it? Or is it in your codebase?

tauoverpi · 22d ago

Well, what would I have to do to please the LLM? Writing code isn't for LLMs to consume but rather to communicate intent for people and for machines to run which provides value for the user at the end of the day. If an LLM fails at being useful within a codebase when it's supposed to be a "works anywhere" tool then the tool is less than useful.

Note that language servers, static analysis tooling, and so on still work without issue.

The cause (which is my assumption) is that there aren't enough good examples in the training set for anything useful to be the most likely continuation thus leading to a suboptimal result given the domain. Thus the tool doesn't work "everywhere" for cases where there's less use of a language or less code in general dealing with a particular problem.

Starlevel004 · 32d ago

> Is there something I'm missing? Am I just not using it right?

The talk about it makes more sense when you remember most developers are primarily writing CRUD webapps or adware, which is essentially a solved problem already.

tptacek · 32d ago

I'm not doing either of those things with it.

rockemsockem · 32d ago

What are some examples of things you are doing with it?

tptacek · 32d ago

Large scale server telemetry and fiddly OAuth2/Macaroon token management.

guyfhuo · 32d ago

This seems a self-servingly literal interpretation of the op’s original comment.

Clearly something like “server telemetry” is the datacenter’s “CRUD app” analogue.

It’s a solved problem that largely requires rtfm and rote execution of well worn patterns in code structure.

Please stick to the comment guidelines:

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

https://news.ycombinator.com/newsguidelines.html

sshine · 31d ago

Just to say: setting up oauth and telemetry are harder than building CRUD, because they come later in the learning progression. But it’s still mostly just following a recipe; LLMs are good at finding recipes that are uncommon, but available in the dataset. They break more when you go off script and you don’t take tiny steps.

tptacek · 31d ago

I'm laughing at the idea of server telemetry being "the data center equivalent of a CRUD app". I'll just pull out my Telemetry On Rails framework and run some generators...

guyfhuo · 31d ago

I’d rather write server telemetry logic than any customer facing ui app, and it sounds like you would too.

tptacek · 31d ago

I'd rather write a new Device Mapper target than do either of those things, and LLMs have been helpful with that too. Is a Device Mapper target the "CRUD app of the Linux kernel"?

guyfhuo · 30d ago

> I'd rather write a new Device Mapper target than do either of those things

Perhaps it’s time for a career change then. Follow your joy and it will come more naturally for you to want to spread it.

Again,

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

From my reading the “strongest possible interpretation” of the original “CRUD app” line was “it’s a solved problem that largely requires rtfm and rote execution of well worn patterns in code structure” making it similarly situated as “server telemetry” to make llms appear superintelligent to people new to programming within those paradigms.

I’m unfamiliar with “device mapping”, so perhaps someone else can confirm if it is “the crud app of Linux kernel dev” in that vein.

Just listing topics in software development is hardly evidence of either your own ability to work on them, or of their inherent complexity.

Since this seems to have hurt your feelings, perhaps a more effective way to communicate your needs would be to explain why you find “server telemetry” to be more difficult/complex/w/e to warrant needing an llm for you to be able to do it.

tptacek · 30d ago

I think you're off on your own little thing here and I wish you the best of luck with it, but I don't think I can be helpful for you.

int_19h · 31d ago

I work on a VSCode debugger extension that uses gdb as a backend, and I've had some success tackling bugs in that using Cursor + Gemini 2.5 Pro. The codebase is in TypeScript.

Based on what I've seen, Python and TypeScript are where it fares best. Other languages are much more hit and miss.

tom_m · 31d ago

They aren't increasing productivity. In the short term.

They are very handy tools that can help you learn a foreign codebase faster. They can help you when you run into those annoying blockers that usually take hours or days or a second set of eyes to figure out. They give you a sounding board and help you ask questions and think about the code more.

Big IF here. IF you bother to read. The danger is some people just keep clicking and re-prompting until something works, but they have zero clue what it is and how it works. This is going to be the biggest problem with AI code editors. People just letting Jesus take the wheel and during this process, inefficient usage of the tools will lead to slower throughput and a higher bill. AI costs a good chunk of change per token and that's only going up.

I do think it's addictive for sure. I also think the "productivity boost" is a feeling people get, but no one measures. I mean, it's hard to measure. Then again, if you do spend an hour on a problem you get stuck on vs 3 days then sure it helped productivity. In that particular scenario. Averaged out? Who knows.

They are useful tools, they are just also very misunderstood and many people are too lazy to take the time to understand them. They read headlines and unsubstantiated claims and get overwhelmed by hype and FOMO. So here we are. Another tech bubble. A super bubble really. It's not that the tools won't be with us for a long time or that they aren't useful. It's that they are way way overvalued right now.

danbolt · 32d ago

I appreciate you voicing your feelings here. My previous employer requested we try AI tooling for productivity purposes, and I was finding myself in similar scenarios to what you mention. The parts that would have benefitted from a productivity gain weren’t seeing any improvement, while the areas that saw a speedup weren’t terribly mission-critical.

The one thing I really appreciated though was the AI’s ability to do a “fuzzy” search in occasional moments of need. Or, for example, sometimes the colloquial term for a feature didn’t match naming conventions in source code. The AI could find associations in commit messages and review information to save me time rummaging through git-blame. Like I said though, that sort of problem wasn’t necessarily a bottleneck and could often be solved much more cheaply by asking a coworker on Slack.

hx8 · 32d ago

Probably 80% of the time I spend coding, I'm inside a code file I haven't read in the last month. If I need to spend more than 30 seconds reading a section of code before I understand it, I'll ask AI to explain it to me. Usually, it does a good job of explaining code at a level of complexity that would take me 1-15 minutes to understand, but does a poor job of answering more complex questions or at understanding more complex code.

It's a moderately useful tool for me. I suspect the people that get the most use out of it are those that would take more than 1 hour to read code I would take 10 minutes to read. Which is to say the least experienced people get the most value.

etler · 32d ago

I find it's incredibly helpful for prototyping. These tools quickly reach a limit of complexity and put out sub par code, but for a green field prototype that's ok.

I've successfully been able to test out new libraries and do explorations quickly with AI coding tools and I can then take those working examples and fix them up manually to bring them up to my coding standards. I can also extend the lifespan of coding tools by doing cleanup cycles where I manually clean up the code since they work better with cleaner encapsulation, and you can use them to work on one scoped component at a time.

I've found that they're great to test out ideas and learn more quickly, but my goal is to better understand the technologies I'm prototyping myself, I'm not trying to get it to output production quality code.

I do think there's a future where LLMs can operate in a well architected production codebase with proper type safe compilation, linting, testing, encapsulation, code review, etc, but with a very tight leash because without oversight and quality control and correction it'll quickly degrade your codebase.

gdubs · 31d ago

Incredibly useful for 'glue code' or internal apps that are for automating really annoying processes - but where normally the time it would take to develop those tools would add up and take away from the core work.

For instance, dealing with files that don't quite work correctly between two 3D applications because of slightly different implementations. Ask for a python script to patch the files so that they work correctly – done almost instantly just by describing the problem.

Also for prototyping. Before you spend a month crafting a beautiful codebase, just get something standing up so you can evaluate whether it's worth spending time on – like, does the idea have legs?

90% of programming problems get solved with a rubber ducky – and this is another valuable area. Even if the AI isn't correct, often times just talking it through with an LLM will get you to see what the solution is.

jiggawatts · 32d ago

I’ve had good experiences using it, but with the caveat that only Gemini Pro 2.5 has been at all useful, and only for “spot” tasks.

I typically use it to whip up a CLI tool or script to do something that would have been too fiddly otherwise.

While sitting in a Teams meeting I got it to use the Roslyn compiler SDK in a CLI tool that stripped a very repetitive pattern from a code base. Some OCD person had repeated the same nonsense many thousands of times. The tool cleaned up the mess in seconds.

_345 · 32d ago

>Some OCD person had repeated the same nonsense many thousands of times. The tool cleaned up the mess in seconds.

What were they doing?

jiggawatts · 32d ago

try { ... } catch (Exception e) { throw e; }

That does nothing except add visual noise.

It's like a magic incantation to make the errors go away (it doesn't actually), probably by someone used to Visual Basic's "ON ERROR RESUME NEXT" or some such.
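For illustration, the kind of cleanup described above can be approximated with a regular expression. This is a rough sketch in Python, not the Roslyn-based tool from the comment: `RETHROW` and `strip_rethrow` are made-up names, and the pattern only handles try bodies without nested braces, skips any try that has further catch or finally clauses, and knows nothing about strings or comments. A syntax-tree approach like the original avoids those limits. Note also that in C# `throw e;` resets the exception's stack trace, so removing the wrapper changes what the trace shows (it keeps the original one).

```python
import re

# Matches: try { <body without braces> } catch (Exception <name>) { throw <name>; }
# and refuses to match if another catch/finally clause follows.
RETHROW = re.compile(
    r"\btry\s*\{(?P<body>[^{}]*)\}"
    r"\s*catch\s*\(\s*Exception\s+(?P<name>\w+)\s*\)"
    r"\s*\{\s*throw\s+(?P=name)\s*;\s*\}"
    r"(?!\s*(?:catch|finally)\b)"
)


def strip_rethrow(source: str) -> str:
    """Replace each catch-and-rethrow wrapper with the bare body of its try block."""
    return RETHROW.sub(lambda m: m.group("body").strip(), source)


cleaned = strip_rethrow("try { Save(order); } catch (Exception e) { throw e; }")
```

Anything the pattern does not recognise is left untouched, so the sketch errs on the side of doing nothing.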

slurpyb · 32d ago

You are not alone! I strongly agree and I feel like I am losing my mind reading some of the comments people have about these services.

jckahn · 32d ago

For me it's an additional tool, not the only tool. LSPs are still better for half of the things I do daily (renaming things, extracting things, finding symbols, etc.). I can't imagine using AI for everything and even meeting my current velocity.

SchemaLoad · 32d ago

I swear the people posting this stuff are just building todo list tier apps from scratch. On even the most moderately large codebase it completely breaks down.

protocolture · 32d ago

>large, complex, potentially multi-language

I find that token memory size limits are the main barrier here.

Once the LLM starts forgetting other parts of the application, all bets are off and it will hallucinate the dumbest shit, or even just remove features wholesale.

bandoti · 32d ago

For my workflow a lot of the benefit is in the smaller tasks. For example, get a medium size diff from two change-sets, paste it in, and ask to summarize the “what” and “why”. You have to learn how to give AI the right amount of context to get the right result back.

For code changes I prefer to paste a single function in, or a small file, or error output from a compile failure. It’s pretty good at helping you narrow things down.

So, for me, it’s a pile of small gains where the value is—because ultimately I know what I generally want to get done and it helps me get there.
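The diff-summary workflow above could be scripted along these lines. It is only a sketch under assumptions: `build_summary_prompt` is a hypothetical helper, the prompt wording and the 200-line cap are arbitrary choices, and sending the resulting text to a model is left out entirely. It uses Python's standard `difflib` to produce the unified diff.

```python
import difflib


def build_summary_prompt(old: str, new: str, path: str, max_lines: int = 200) -> str:
    """Build a 'what and why' prompt from two versions of one file."""
    diff = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",
        )
    )
    # Keep the context small: cap the diff and say how much was dropped.
    if len(diff) > max_lines:
        diff = diff[:max_lines] + [f"... ({len(diff) - max_lines} more diff lines omitted)"]
    return (
        "Summarize the following change. State WHAT changed and WHY it likely changed.\n\n"
        + "\n".join(diff)
    )


prompt = build_summary_prompt("timeout = 30\n", "timeout = 60\n", "settings.py")
```

Capping the diff is one crude way to control "the right amount of context"; picking which hunks matter is still a judgment call.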

theshrike79 · 32d ago

Just yesterday I wanted to adjust the Renovate setup in a repo.

I could spend 5-10 minutes digging on through the docs for the correct config option, or I can just tap a hotkey, open up GitHub Copilot in Rider and tell it what I want to achieve.

And within seconds it had a correct-looking setting ready to insert to my renovate.json file. I added it, tested it and it works.

I kinda think people who diss AIs are prompting something like "build me Facebook" and then being disappointed when it doesn't :D

colechristensen · 32d ago

Some people do really repetitive or really boilerplate things, others do not.

Also you have to learn to talk to it and how to ask it things.

__loam · 32d ago

"The interface is natural language"

"The programming language of the future will be English"

---

"Well are you using it right? You have to know how to use it"

andy99 · 32d ago

I wish more had been written about the first assertion that using an LLM to code is like gambling and you're always hoping that just one more prompt will get you what you want.

It really captures how little control one has over the process, while simultaneously having the illusion of control.

I don't really believe that code is being made verbose to make more profits. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" was possible, it would give one model a sufficient edge over others that I suspect providers would do it.

meander_water · 32d ago

Agreed, I've been thinking about the first assertion a lot recently as I've been using Cursor to create a react app. I think it's more prevalent in frontend development because it tightens the feedback loop considerably, and the more positive feedback you get, the more conditioned you get to reach for it anytime you need to do anything in code.

I think there's another perverse incentive here - organisations want to produce features/products fast, which LLMs help with, but it comes at the cost of reduced cognitive capabilities/skills in the developers over the longer term as they've given that up through lack of use/practice.

tptacek · 32d ago

That's not a great argument for talking down their utility for experienced developers, though.

meander_water · 32d ago

I'm not so sure, I think skills atrophy with disuse no matter what level of experience you have. Like I have around 15 years of experience, but if I stepped away from coding for even just a year a lot of those years of experience will count for nothing.

Rastonbury · 31d ago

I don't believe there are perverse incentives yet; right now it's the arms-race, burn-money, operate-at-a-loss days. There is no moat, only quality and price per token, and the leader moves around too quickly. Also, the author should really look into Cursor at $20 with unlimited slow requests. I imagine paying per token hurts when it spits out garbage even though you thought you had provided enough context.

Someone needs to make a plugin to count lines of discarded code and prompts

techpineapple · 32d ago

Something I caught about Andrej Karpathy’s original tweet, was he said “give into the vibes”, and I wonder if he meant that about outcomes too.

andy99 · 32d ago

I still think the original tweet was tongue-in-cheek and not really meant to be a serious description of how to do things.

nico · 32d ago

> It really captures how little control one has over the process, while simultaneously having the illusion of control.

This is actually a big insight about life, one that in some eastern philosophies you are supposed to arrive at.

We love the illusion of control, even though we don’t really have it. Life mostly just unfolds as we experience it

nativeit · 32d ago

This has certainly been my own experience in life. My step-father was a very studious and responsible person. He worked 30-years from the age of 19 with the state as an HVAC service tech until he retired at 49yo with a full state pension, and then went to work for a private company. His plan was to earn as much as he could until he turned 55, and then retire to live/work on the small farm he and my mother had just purchased. Everything was coming together, his new job placed him in a senior project management position, and gave him a considerable salary compared with the state.

Shortly after he turned 50, he was diagnosed with pancreatic cancer, and he died several months later, following a very painful and difficult attempt to treat it.

In my mind, this kind of thing is the height of tragedy—he did everything right. He exhibited an incredible amount of self-control and deferred his happiness, ensuring that his family and finances were well-cared for and secured, and then having fulfilled his obligations, he was almost immediately robbed of a life that he’d worked so hard to earn.

I experienced a few more object lessons in the same vein myself, namely having been diagnosed with multiple sclerosis at the age of 18, and readjusting my life’s goals to accommodate the prospect of disability. I’m thankfully still churning along under my own capacities, now at 41yo, but MS can be unpredictable, and I find it is necessary to remind myself of this from time to time. I am grateful for every day that I have, and to the extent it’s possible, I try to find nearer-term sources for happiness and fulfillment.

Don’t waste any time planning for more than the next five years (with the obvious exceptions for things like financial planning), as you can’t possibly know what’s coming. Even if the unexpected event is a happy one, like an unexpected child or sudden financial windfall, your perspective will almost certainly be dramatically altered 1-2x each decade.

nicbou · 31d ago

I've experienced this for the first time with a close friend, and it really stays on your mind. There was no reason it had to be him. He didn't roll the dice wrong.

It created a sense of urgency in my own life. You have this idea that you will be the same person until you die of old age, and suddenly you realise that the current year is worth much more than another year two decades from now. A bird in the hand is worth two in the bush.

theshrike79 · 32d ago

But just like gambling, there are ways to do it correctly.

Yes, there are the grandmas in a trance vibe-gambling by shoving a bucket of quarters in a slot machine.

But you also have people playing Blackjack and beating the averages by knowing how it's played, maybe having a "feel" for the deck (or counting cards...), and most importantly knowing when to fold and walk away.

Same with LLMs, you need to understand context sizes and prompts and you need to have a feel for when the model is just chasing its own tail or trying to force a "solution" just to please the user.

matsemann · 31d ago

While I get your point, this also kinda sounds like a gambling addict trying to explain how they're not an addict and how they're losing money the correct way, heh.

erulabs · 32d ago

These perverse incentives run at the heart of almost all Developer Software as a Service tooling. Using someone else's hosted model incentivizes increasing token usage, but it's nothing special about AI.

Consider Database-as-a-service companies: They're not incentivized to optimize on CPU usage, they charge per cpu. They're not incentivized to improve disk compression, they charge for disk-usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.

When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.

tmpz22 · 32d ago

> When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.

But my team's time is soooo valuable. It's sooo sooo sooo valuable. Oh and we can't afford to hire anyone else either. But our time, it's sooo valuable. We need these tools!

alternatex · 31d ago

Opens PR with quadruple-nested for-loop running synchronous DB queries.

- Premature optimization is the root of all evil, can't waste expensive dev hours on that..

jiggawatts · 32d ago

My favourite example of this is the recent trend towards “wide events” replacing logs and metrics… spearheaded and popularised by companies that charge by the gigabytes ingested.

tptacek · 32d ago

Companies that ingest logs generally rip their customers' faces off with their pricing. At least OTel spans can be tail-sampled.
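
For readers unfamiliar with the term: tail sampling means the keep-or-drop decision is made after a whole trace has finished, so errors and slow requests can always be kept while routine traffic is thinned out. A minimal sketch of that idea follows. The `Span` type, the thresholds and `keep_trace` are hypothetical, and this is not the OpenTelemetry Collector's actual configuration or API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool


def keep_trace(spans: list, slow_ms: float = 500.0, baseline_rate: float = 0.01) -> bool:
    """Decide whether to keep a completed trace (all spans share one trace_id)."""
    if not spans:
        return False
    if any(s.is_error for s in spans):
        return True  # always keep traces that contain an error
    if max(s.duration_ms for s in spans) >= slow_ms:
        return True  # always keep slow traces
    # Otherwise keep a deterministic fraction, keyed on the trace id so the
    # same trace always gets the same decision.
    digest = hashlib.sha256(spans[0].trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < baseline_rate
```

With the defaults above, roughly 1% of uneventful traces would be stored, which is where the ingestion-cost saving comes from.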

jiggawatts · 32d ago

I worked out that it's cheaper to write logs to high-end Samsung SSDs and then throw them away every month than to retain them in the log analytics systems of some cloud services for the same period of time.

Wait, no, sorry... that doesn't quite "paint the right picture".

The "single use" SSDs are 75 times cheaper than storing the data in the cloud.

tptacek · 31d ago

As someone who works at a platform company that operates several (very) large log ingestion systems, if you're not indexing the logs usefully, having stored them on SSDs isn't doing much for you. It's just a weird comparison to make, is all I'm saying.

jiggawatts · 31d ago

Indexing that is 75x the original uncompressed data volume?

Because then I might accept the cost!

Realistically all of these systems use some type of data compression such as Parquet files, so the data on disk is likely smaller than the ingested data.

tptacek · 31d ago

My point is that you're not really just paying for "storage".

jiggawatts · 31d ago

What am I paying for?

I worked out that the markup on CPU, network bandwidth, and storage for the default logging products from the major clouds is on the order of 25x to 500x.

Okay, sure, there's some people that need to be paid, the back-end software may have some licensed components, etc, etc...

But still, comparing this to any other cloud service, the gross profit margin is just ridiculous!

It's the typical IT marketing trick of selling the commodity (VMs) at competitive prices, and then clawing back the profits via the "enterprise add-ons".

tptacek · 30d ago

I mean: we faced the same buy/build issue, and we went "build", and while I wouldn't say it was our worst decision, it was not one of our best either. Certainly the problem doesn't simply boil down to NVMe gigabyte costs!

That said: we both agree, log ingestion services are extremely expensive.

chaboud · 32d ago

1. Yes. I've spent several late nights nudging Cline and Claude (and other systems) to the right answers. And being able to use AWS Bedrock to do this has been great (note: I work at Amazon).

2. I've had good fortune keeping the agents to constrained areas, working on functions, or objects, with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... Treat these agents like hyper-energetic interns. Nudge frequently.

3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.

And, basically, I've described that we're becoming Software Development Managers with teams of on-demand low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.

(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)

lubujackson · 32d ago

I feel like "vibe coding" as a "no look" sort of way to produce anything is bad and will probably remain bad for some time.

However... "vibe architecting" is likely going to be the way forward. I have had success with generating/tuning an architecture plan with AI, having it create stub files/functions then filling them out individually. I can get pretty much the whole way without typing code, but it does require a fair bit more architectural thinking than usual and a good bit of reading code (then telling the AI to "do better").

I think of it like the analogy of blind men describing an elephant when they can only feel a single part. AI is decent at high level architecture and decent at low level production but you need a human to understand the big picture and how the pieces fit (and which ones are missing).

nowittyusername · 32d ago

What you are talking about is the "proper" way of vibe coding. Most of the issues with vibe coding stem from users misunderstanding the capabilities of the technology they are using. They are overestimating the capabilities of current systems and are essentially asking for magic to happen. They don't give proper guidance, context or anything of value for the coding IDE to work with. They are relying on a mindset of the 2030s to work with systems from 2025. We ain't there yet folks, give as much guidance and context as you can and you will have a better time.

xianshou · 32d ago

Amusingly, about 90% of my rat's-nest problems with Sonnet 3.7 are solved by simply appending a few words to the end of the prompt:

"write minimum code required"

It's not even that sensitive to the wording - "be terse" or "make minimal changes" amount to the same thing - but the resulting code will often be at least 50% shorter than the un-guided version.

panstromek · 32d ago

Well, the article mentions that this reduces accuracy. Do you hit that problem often then?

SteveMqz · 32d ago

The study the article cited is specifically about when asking the LLMs about misinformation. I think on coding tasks and such shorter answers are usually more accurate.

theshrike79 · 31d ago

Gemini on the other hand has a tendency for super-defensive coding.

It'll check _EVERY_ edge case separately, even in situations where it will never ever happen and if it does, it's a NOP anyway.

YossarianFrPrez · 32d ago

There are two sets of perverse incentives at play. The main one the author focuses on is that LLM companies are incentivized to produce verbose answers, so that when you task an LLM on extending an already verbose project, the tokens used and therefore cost increases.

The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.

vanschelven · 32d ago

> Its “almost there” quality — the feeling we’re just one prompt away from the perfect solution — is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern (“the code works! it’s brilliant! it just broke! wtf!”), triggers stronger dopamine responses in our brain’s reward pathways, similar to gambling behaviors.

Though I'm not a "vibe coder" myself I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.

Suppafly · 32d ago

>Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.

Especially when you try to get them to generate something they explicitly tell you they won't, like nudity. It feels akin to hacking.

dingnuts · 32d ago

it's not like gambling, it is gambling. you exchange dollars for chips (tokens -- some casinos even call the chips tokens) and insert it into the machine in exchange for the chance of a prize.

if it doesn't work the first time you pull the lever, it might the second time, and it might not. Either way, the house wins.

It should be regulated as gambling, because it is. There's no metaphor, the only difference from a slot machine is that AI will never output cash directly, only the possibility of an output that could make money. So if you're lucky with your first gamble, it'll give you a second one to try.

Gambling all the way down.

NathanKP · 32d ago

This only makes sense if you have an all or nothing concept of the value of output from AI.

Every prompt and answer is contributing value toward your progress toward the final solution, even if that value is just narrowing the latent space of potential outputs by keeping track of failed paths in the context window, so that it can avoid that path in a future answer after you provide followup feedback.

The vast majority of slot machine pulls produce no value to the player. Every single prompt into an LLM tool produces some form of value. I have never once had an entirely wasted prompt unless you count the AI service literally crashing and returning a "Service Unavailable" type error.

One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.

NegativeLatency · 32d ago

> Every prompt and answer is contributing value toward your progress toward the final solution

This has not been my experience, maybe sometimes, but certainly not always.

As an example: asking chatgpt/gemini about how to accomplish some sql data transformation set me back in finding the right answer because the answer it did give me was so plausible but also super duper not correct in the end. Would've been better off not using it in that case.

Brings to mind "You can't build a ladder to the moon"

NathanKP · 32d ago

In your anecdote I still see this as producing value. If I was lacking in knowledge about the problem space, and therefore fell into the trap of pursuing a "plausible but also super duper not correct" answer from an LLM, then I could have easily fallen into that trap solo as well.

But with an LLM, I was able to eliminate this bad path faster and earlier. I also learned more about my own lack of knowledge and improved myself.

I truly mean it when I say that I have never had an unproductive experience with modern AI. Even when it hallucinates or gives me a bad answer, that is honing my own ability to think, detect inconsistencies, examine solutions for potential blindspots, etc.

secabeen · 32d ago

> One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.

That assumes that the value of a solution is linear with the amount completed. If the Pareto Principle holds (80% of effects come from 20% of causes), then not getting that critical 10+% likely has an outsized effect on the value of the solution. If I have to do the 20% of the work that's hard and important after taking what the LLM did for the remainder, I haven't gained as much because I still have to build the state machine in my head to understand the problem-space well enough to do that coding.
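
To put numbers on that argument, here is a small worked example. It assumes, purely for illustration, that the hard part is 80% of the total effort and that the LLM makes only the easy part faster; `effort_saved` is a made-up helper, and the formula has the same shape as Amdahl's law.

```python
def effort_saved(hard_share_of_effort: float, llm_speedup_on_easy: float) -> float:
    """Fraction of total effort saved when only the easy part is accelerated.

    hard_share_of_effort: share of the work (0..1) the human still does by hand.
    llm_speedup_on_easy: how many times faster the easy part becomes.
    """
    easy_share = 1.0 - hard_share_of_effort
    remaining = hard_share_of_effort + easy_share / llm_speedup_on_easy
    return 1.0 - remaining


# Hard part is 80% of the effort; the LLM makes the easy part 10x faster.
print(f"{effort_saved(0.8, 10):.0%} of the effort saved")
```

Under those assumed numbers, even an infinitely fast assistant caps out at a 20% saving, because the hard 80% is untouched.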

NathanKP · 32d ago

This isn't a bad thing at all. It just means that AI utilization doesn't have quite the exponential impact that many marketers are trying to sell. And that's okay.

I personally think of AI tools as an incremental aid that enables me to focus more of my efforts on the really hard 10-20% of the problem, and get paid more to excel at doing what I do best already.

PaulDavisThe1st · 32d ago

This assumes you can easily and reliably identify the 10% you need to fix.

NathanKP · 32d ago

Why wouldn't you be able to identify the 10% that you need to fix?

AI is not an excuse to turn off your brain. I find it ironic that many people complain that they have a hard time identifying the hallucinations in LLM generated content, and then also complain that LLM's are making LLM users dumber.

The problem here is also the solution. LLM's make smarter people even smarter, because they get even better at thinking about the hard parts, while not wasting time thinking about the easy parts.

But people who don't want to think at all about what they are doing... well they do get dumber.

PaulDavisThe1st · 32d ago

It is extremely well known in the world of programming that reading code is substantially harder than writing it. Just because you have the code in front of you does not mean that determining that it is correct is a trivial (or even moderately easy) task.

NathanKP · 32d ago

That's right. I don't think that AI makes coding easy or trivial. What it does do, is it accelerates your ability to get past the easy and trivial stuff, to the hard parts.

When you get deep into engineering with AI you will find yourself spending a dramatically larger percentage of your time thinking about the hardest things you have ever thought about, and dramatically less time thinking about basic things that you've already done hundreds of times before.

You will find the limits of your abilities, then push past those limits like a marathon runner gaining extra endurance from training.

I think the biggest lie in the AI industry is that AI makes things easier. No, if anything you will find yourself working on harder and harder things because the easy parts are done so quickly that all that is left is the hard stuff.

princealiiiii · 32d ago

> It should be regulated as gambling, because it is.

That's wild. Anything with non-deterministic output will have this.

martin-t · 32d ago

That's incorrect, gambling is about waiting.

Brain scans have revealed that waiting for a potential win stimulates the same areas as the win itself. That's the "appeal" of gambling. Your brain literally feels like it's winning while waiting because it _might_ win.

kagevf · 32d ago

> Anything with non-deterministic output will have this.

Anything with non-deterministic output that charges money ...

Edit: Added words to clarify what I meant.

GuinansEyebrows · 32d ago

i think at least a lot of things (if not most things) that i pay for have an agreed-upon result in exchange for payment, and a mitigation system that'll help me get what i paid for in the event that something else prevents that from happening. if you pay for something and you don't know what you're going to get, and you have to keep paying for it in the hopes that you get what you want out of it... that sounds a lot like gambling. not exactly, but like.

0cf8612b2e1e · 32d ago

If I ask an artist to draw a picture, I still have to pay for the service, even if I am unhappy with the result.

cogman10 · 32d ago

In the US? No, you actually do not need to pay for the service if you deem the quality of the output to be substandard. In particular with art, it's pretty standard to put in a non-refundable downpayment with the final payment due on delivery.

You only lose those rights in the contracts you sign (which, in terms of GPT, you've likely clicked through a T&C which waives all rights to dispute or reclaim payment).

If you ask an artist to draw a picture and decide it's crap, you can refuse to take it and to pay for it. They won't be too happy about it, but they'll own the picture and can sell it on the market.

0cf8612b2e1e · 32d ago

There must be artists working on an hourly contract rate.

Maybe art is special, but there are other professions where someone can invest heaps of time and effort without delivering the expected result. A trial attorney, treasure hunter, oil prospector, app developer. All require payment for hours of service, regardless of outcome.

cogman10 · 32d ago

It'll mostly depend on the contract you sign with these services and the state you live in.

When it comes to work that requires craftmanship it's pretty common to be able to not pay them if they do a poor job. It may cost you more than you paid them to fix their mistake, but you can generally reclaim your money you paid them if the work they did was egregiously poor.

nkrisc · 32d ago

Sounds like you should negotiate a better contract next time, such as one that allows for revisions.

GuinansEyebrows · 32d ago

maybe more accurately anything with non-deterministic output that you have to pay-per-use instead of paying by outcome.

Suppafly · 32d ago

>that you have to pay-per-use instead of paying by outcome.

That's still not gambling and it's silly to pretend it is. It feels like gambling but that's it.

rapind · 32d ago

By this logic:

- I buy stock that doesn't perform how I expected.

- I hire someone to produce art.

- I pay a lawyer to represent me in court.

- I pay a registration fee to play a sport expecting to win.

- I buy a gift for someone expecting friendship.

Are all gambas.

You aren't paying for the result (the win), you are paying for the service that may produce the desired result, and in some cases one of many possibly desirable results.

rjbwork · 32d ago

>I buy stock that doesn't perform how I expected.

Hence the adage "sir, this is a casino"

nkrisc · 32d ago

None of those are games of chance, except the first.

Suppafly · 32d ago

>None of those are games of chance, except the first.

Neither is GenAI, the grandparent comment is dumb.

nkrisc · 30d ago

I see the resemblance, though. Money goes into mystery machine, thing you were hoping for maybe comes out. If it didn't, put more money in until you get the prize you want.

squeaky-clean · 32d ago

So how exactly does that work for the $25/mo flat fee that I pay OpenAI for chatgpt. They want me to keep getting the wrong output and burning money on their backend without any additional payment from me?

dwringer · 32d ago

Something of an aside, but this is sort of equivalent to asking "how does that work for the $50 dollars the casino gave me to gamble with for free"? I once made 50 dollars exactly in that way by taking the casino's free tokens and putting them all on black in a single roulette spin. People like that are not the ones companies like that make money off of.

kimixa · 32d ago

For the amount of money OpenAI burns that $25/mo is functionally the same as zero - they're still in the "first one is free" phase.

Though you could say the same thing about pretty much any VC funded sector in the "Growth" phase. And I probably will.

AlexCoventry · 32d ago

Is it really gambling, if the house always loses? :-)

csallen · 32d ago

Books are not like gambling, they are gambling. you exchange dollars for chips (money — some libraries even give you digital credits for "tokens") and spend them on a book in exchange for the chance of getting something good out of it.

If you don't get something good the first time you buy a book, you might with the next book, or you might not. Either way, the house wins.

It should be regulated as gambling, because it is. There's no metaphor — the only difference from a slot machine is that books will never output cash directly, only the possibility of an insight or idea that could make money. So if you're lucky with your first gamble, you'll want to try another.

Gambling all the way down.

mystified5016 · 32d ago

I run genAI models on my own hardware for free. How does that fit into your argument?

codr7 · 32d ago

The fact that you can get your drugs for free doesn't exactly make you less of an addict.

latentsea · 32d ago

I used to run GenAI image generators on my own hardware, and I 200% agree with your stance. Literally wound up selling my RTX 4090 to get the dealer to move out of the house. I'm better off now, but can't ever really own a GPU again without opening myself back up to that. Sigh...

squeaky-clean · 32d ago

It does literally make it not gambling though, which is what's being discussed.

It also kind of breaks the whole argument that they're designed to be addictive in order to make you spend more on tokens.

codr7 · 32d ago

As long as that argument makes you happy, go for it :)

abletonlive · 32d ago

Yikes. The reactionary reach for more regulation from a certain group is just so tiresome. This is the real mind virus that I wish would be contained in Europe.

I almost can't believe this idea is being seriously considered by anybody. By that logic buying any CPU is gambling because it's not deterministic how far you can overclock it.

Just so you know, not every llm use case requires paying for tokens. You can even run a local LLM and use cline w/ it for all your coding needs. Pull that slot machine lever as many times as you like without spending a dollar.

slurpyb · 32d ago

Do you understand what electricity is?

abletonlive · 32d ago

Oh so now you think because it consumes some resource it's gambling? Would you say farming is gambling because it consumes water and time and you won't know what the result will be?

yewW0tm8 · 32d ago

Same with anything though? Startups, marriages, kids.

All those laid off coders gambled on a career that didn’t pan out.

Want more certainty in life, gonna have to get political.

And even then there is no guarantee the future gives a crap. Society may well collapse in 30 years, or 100…

This is all just role play to satisfy the prior generation's story-driven illusions.

Inityx · 32d ago

If you consider your marriage or your kids to be a gamble, that's a sign that you shouldn't get married or have kids.

bitwize · 32d ago

"Vibe coding as gacha game" is a new wrinkle I didn't expect. It certainly explains why I see people who should know better talking up AI and LLMs like they're the second coming: it's like how stoners talk about weed as a cancer cure.

flashgordon · 32d ago

This addiction and fear of things-going-bad-if-i-dont-listen-to-the-copilot is precisely why my workflow is a bit more simple and caveman-ish:

1. start a project with vague README (or take an existing one).

2. create makefile with the "prompt" action that looks something like (I might put it in a script to work around tabs etc):

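```
prompt:
	@# Print each matching file behind a "// FILE:" header and copy it all to the clipboard.
	@# Recipe lines must start with a tab; "$$f" escapes "$f" for make; pbcopy is macOS-only.
	@find . -type f \( -name '*.go' -o -name '*.ts' -o -name '*.files_i_care_about' \) \
		| grep -v 'files to ignore' \
		| while read -r f; do \
			echo "// FILE: $$f"; \
			cat "$$f"; \
		done | pbcopy
```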

3. Run `make prompt` to get a fresh new starting prompt, Go to Gemini (AI Studio) and use the prompt:

```You have the following files. Understand it and we will start building some features.

<Ctrl-v to paste the files copied above> ```

4. It thinks, understands and gives me the "I am ready" line.

5. To build feature X I simply prompt it with:

``` I want to build feature X. Understand it, plan it, and do not regenerate entire files. Just give me unix style diffs. ```

6. Iterate on what i like and dont (including refactors, etc)

7. Copy patches and apply locally

8. Repeat steps 5 - 7.

9. After about 300-400k tokens generated (say over 20-40 features) I snapshot with the prompt:

``` Great now is a great time to checkpoint. Generate a SUMMARY.md on a per folder basis of your understanding of the current state of the project along with a roadmap of next steps. ```

10. I save/update the SUMMARY.md and go to bed. When I come back I repeat from step 2 - and voila the SUMMARY.md generated before are included too.

I have generated about 20M tokens so far at a cost of 0. For me "copy/pasting" diffs is not a big deal. Getting clean code, having a nice custom workflow is more important. I am still not ready to relinquish control fully to an agent. I just want a really good code search/auto-complete out of the LLM that adheres to *my* interfaces and constraints.
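
For anyone who would rather not fight make's tab rules, the `prompt` step of the workflow above can be sketched as a small script. This is an illustration only: `build_prompt`, the suffix list and the ignored directories are assumptions, not the commenter's actual setup.

```python
from pathlib import Path


def build_prompt(root: str,
                 suffixes=(".go", ".ts", ".md"),
                 ignore_dirs=(".git", "node_modules")) -> str:
    """Concatenate matching files under root, each preceded by a '// FILE:' header."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root_path)
        if any(part in ignore_dirs for part in rel.parts):
            continue  # skip anything inside an ignored directory
        body = path.read_text(errors="replace")
        parts.append(f"// FILE: {rel.as_posix()}\n{body}")
    return "\n".join(parts)
```

Printing `build_prompt(".")` and piping it to a clipboard tool reproduces the copy step. Including `.md` in the suffixes is what would pull any saved SUMMARY.md files back into the next session's prompt.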

insane_dreamer · 31d ago

> In an effort to impress the user and over-deliver, LLMs end up creating a rat’s nest of ultra-defensive code littered with debugging statements

This has been my experience as well. I have to continuously explicitly instruct Claude to be more concise (though that often leads to broken code ...). Gemini is even more verbose.

I'm not sure in the end how much time is saved over simple good auto-completes (for method syntax lookups), other than for rote tasks like "replicate this pattern across X" (and even then it doesn't get it 100% right), and for quick answers to specific questions, usually in frameworks I'm not that well versed in, that I would have searched SO for ("how do I do X in Qt?", "how do I do the equivalent of Y in Linux on Windows")--but even then I have to verify the answer, whereas if it's a highly voted answer on SO I'll know it works (or there will be helpful comments to the contrary under the reply).

Most of the "it can build X app for you automatically" comments I read remind me of "build a Rails app in 5 lines" (back in the day).

johnea · 32d ago

I generally agree with the concerns of this article, and wonder about the theory of the LLM having an innate inclination to generate bloated code.

Even in this article though, I feel like there is a lot of anthropomorphization of LLMs.

> LLMs and their limitations when reasoning about abstract logic problems

As I understand them, LLMs don't "reason" about anything. It's purely a statistical sequencing of words (or other tokens) as determined by the training set and the prompt. Please correct me if I'm wrong.

Also, regarding this theory that the models may be biased to produce bloated code: I've reposted this once already, and no one has replied yet, and I still wonder:

----------

To me, this represents one of the most serious issues with LLM tools: the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on.

There is no way (that I've read of) for identifying biases, or intentional manipulations of the model that would cause the tool to yield certain intended results.

There are examples of DeepSeek generating results that refuse to acknowledge Tiananmen Square, etc. These serve as examples of how the generated output can intentionally be biased, without the ability to readily predict this general class of bias by analyzing the model data.

----------

I'm still looking for confirmation or denial on both of these questions...

exiguus · 32d ago

I understand your point. The Vibe approach is IMO only effective when you adopt a software engineering mindset. Here's how it works (at least for me with Copilot agent mode):

1. Develop a Minimum Viable Product (MVP) or prototype that functions.

2. Write tests, either before or after the initial development.

3. Implement coding guidelines, style guides, linter etc. Do code reviews.

4. Continuously adjust, add features, refactor, review and expand your test suite. Iterate and let AI run tests and linters on each change.

While this process may seem lengthy, it ensures reliability and efficiency. Experienced engineers might find it as quick as working solo, but the structured approach guarantees success. It feels like pairing with an inexperienced developer.

Also, this process may run you into rate limits with Copilot and might not work with your current codebase due to a lack of tests and the absence of applied coding style guides.

Additionally, it takes time. For example, for a simple to mid-level tool/feature in Go, it might take about 1 hour to develop the MVP or prototype, but another 6 to 10 hours to refine it to a quality that you might want to show to other engineers.

postalrat · 32d ago

I have doubts that testing is going to be the key to make vibe coding work for non-trivial projects. I'd focus on developing great well documented interfaces between components and keeping the scope of your agent under control.

exiguus · 31d ago

Perhaps I wasn't clear, but I was referring to a small tool or a specific feature that, in my opinion, embodies your approach. Keeping the scope clear and minimizing cognitive load—through practices like code splitting—are essential. Implementing tests, whether unit, component, or integration tests, as part of your testing strategy, helps provide quick feedback on recent changes. As you would do it on a traditional codebase.

nbittich · 32d ago

At best, the only useful thing I can get from ChatGPT, DeepSeek, or Grok is keywords I can search on Google to find a valid solution to my problem. I get so frustrated with them that I almost never use LLMs, except for fixing grammar or translating. It's not because I'm against them, but because they are useless to me and a massive waste of time.

mullingitover · 32d ago

I've definitely noticed that LLMs want to generate Enterprise-Grade™ code right out of the box. I customize the prompts to tell them that we're under intense pressure to minimize line counts, every line costs $10k, and so to find the simplest solution that will get the job done.

bradly · 32d ago

> it might be difficult for AI companies to prioritize code conciseness when their revenue depends on token count.

Would open source, local models keep pressure on AI companies to prioritize the usable code, as code quality and engineering time saved are critical to build vs buy discussions?

jsheard · 32d ago

Depends if open source models can remain relevant once the status quo of "company burns a bunch of VC money to train a model, open sources it, and generates little if any revenue" runs out of steam. That's obviously not sustainable long term.

Larrikin · 32d ago

Maybe we will get some university backed SETI like projects to replace all those personal mining rigs now that that hype is finally fading.

charcircuit · 32d ago

This article ignores the enormous demand of AI coding paired with competition between providers. Reducing the price of tokens means that people can afford to generate more tokens. A code provider being cheaper on average to operate than another is a competitive advantage.

comex · 32d ago

> There was no standardization of parts in the probe. Two widgets intended to do almost the same job could be subtly different or wildly different. Braces and mountings seemed hand carved. The probe was as much a sculpture as a machine.

> Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.

> “Yes, I wrote that," she said. "It seems to be true. Every nut and bolt in that probe was designed separately. It's less surprising if you think of the probe as having a religious purpose. But that's not all. You know how redundancy works?"

> “In machines? Two gilkickies to do one job. In case one fails."

> “Well, it seems that the Moties work it both ways."

> “Moties?"

> She shrugged. "We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don't they?"

> “For a complicated job, of course they do."

> “The Moties don't. It's all one piece, everything working on everything else. Rod, there's a fair chance the Moties are brighter than we are."

- The Mote in God's Eye, Larry Niven and Jerry Pournelle (1974)

[…too bad that today's LLMs are not brighter than we are, at least when it comes to writing correct code…]

mnky9800n · 32d ago

That book is very much fun and also I never understood why Larry Niven is so obsessed with techno feudalism and gender roles. I think this is my favourite book but I think his best book is maybe Ringworld.

AlexCoventry · 32d ago

The zero-sum mentality which leads people to think that way is already clear in The Mote In God's Eye. I think the point of the book is that despite being superior to humans in every way imaginable, the Moties are condemned to repeated violent conflict by Malthusian pressures, because they have nowhere to expand. One way I interpret the "mote" in God's eye is the authors' belief that no matter how good we get, we'll always be in potentially violent conflict with each other for limited resources. (The "beam" in our own eye is then that we're still fighting each other over less pressing concerns. :-)

Loughla · 32d ago

Ringworld is a great book. The later books have great concepts, but could do without so much. . . rishing. Niven plainly inserted his furry porn fetish into those books, for reasons unclear to any human alive.

Suppafly · 32d ago

>for reasons unclear to any human alive

Given how prevalent furries seem to be, especially in nerd adjacent culture, I'd say he was ahead of his time.

Suppafly · 32d ago

>I think this is my favourite book but I think his best book is maybe Ringworld.

Ringworld is pretty good; the multiple sequels get kind of out there.

mnky9800n · 32d ago

I never read any of the sequels, just a couple of the short story collections and some of the Man-Kzin Wars. What’s wild about them?

Suppafly · 31d ago

It's been a long time since I read them, but I recall the sequels being less focused plot-wise and also having a lot more of the human + alien/furry relations. I think I read 5 or 6 of them and mostly enjoyed them, but eventually moved on to other authors. Might be time to revisit them now that I've mostly forgotten them enough that it'll feel fresh again.

jerf · 32d ago

Yeah, I've had that thought too.

I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.

If Motie engineering is a good idea, it's not a smooth gradient. The Motie-est code I've seen is also the worst. It is definitely not the case that getting a bit more Motie-esque, all else being equal, produces better results. Is there some crossover point where it gets better and maybe passes our modular designs? If AIs do get better than us at coding, and it turns out they do settle on Motie-esque coding, no human will ever be able to penetrate it ever again. We'd have to instruct our AI coders to deliberately cripple themselves to stay comprehensible, and that is... economically a tricky proposition.

After all, anyone can write anything into a novel they want to and make anything work. It's why I've generally stopped reading fiction that is explicitly meant to make ideological or political points to the exclusion of all else; anything can work on a page. Does Motie engineering correspond to anything that could be manifested practically in reality?

Will the AIs be better at modularization than any human? Will they actually manifest the Great OO Promise of vast piles of amazingly well-crafted, re-usable code once they mature? Or will the optimal solution turn out to be bespoke, locally-optimized versions of everything everywhere, and the solution to combining two systems is to do whatever locally-sensible customizations are called for?

(I speak of the final, mature version, however long that may be. Today LLMs are kind of the worst of both worlds. That turns out to be a big step up from "couldn't play in this space at all", so I'm not trying to fashionably slag on AIs here. I'm more saying that the one point we have is not yet enough to draw so much as a line through, let alone an entire multi-dimensional design methodology utility landscape.)

I didn't expect to live to see the answers, but maybe I will.

rcxdude · 32d ago

>I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.

It's the kind of thing you commonly get if you let an unconstrained optimization process run for long enough. It will generally be better, according to whatever function you're optimizing for. The main disadvantage, apart from being hard to understand or modify the design, is manufacturing and repair (needing to make many different parts), but if you have sufficiently good manufacturing technology (e.g. atomic level printers), then that may be a non-issue. And in software it's already feasible: you can see very small scale versions of this in extremely resource-constrained environments where it's worthwhile really trying to optimize things (see some demoscene entries), but it's pretty rare (some tricks that optimizing compilers pull off are similar, but they are generally very local).

fwip · 32d ago

For me, "Motie engineering" always brings to mind "The Story of Mel." http://www.catb.org/jargon/html/story-of-mel.html

samtp · 32d ago

I've pretty clearly seen the critical thinking ability of coworkers who depend on AI too much sharply decline over the past year. Instead of taking 30 seconds to break down the problem and work through assumptions, they immediately copy/paste into an LLM and spit back what it tells them.

This has led to their abilities stalling while their output seemingly goes up. But when you look at the quality of their output, and their ability to get projects over the last 10% or make adjustments to an already completed project without breaking things, it's pretty horrendous.

Etheryte · 32d ago

My observations align with this pretty closely. I have a number of colleagues who I wager are largely using LLM-s, both by changes in coding style and how much they suddenly add comments, and I can't help but feel a noticeable drop in the quality of the output. Issues that should clearly have no business making it to code review are now regularly left for others to catch, it often feels like they don't even look at their own diffs. What to make of it, I'm not entirely sure. I do think there are ways LLM-s can help us work in better ways, but they can also lead to considerably worse outcomes.

jimbokun · 32d ago

Just replace your colleagues with the LLMs they are using. You will reduce costs with no decrease in the quality of work.

andy99 · 32d ago

I think lack of critical thinking is the root cause, not a symptom. I think pretty much everyone uses LLMs these days, but you can tell who sees the output and considers it "done" vs who uses LLM output as an input to their own process.

mystified5016 · 32d ago

I mean, I can tell that I'm having this problem and my critical thinking skills are otherwise typically quite sharp.

At work I've inherited a Kotlin project and I've never touched Kotlin or android before, though I'm an experienced programmer in other domains. ChatGPT has been guiding me through what needs to be done. The problem I'm having is that it's just too damn easy to follow its advice without checking. I might save a few minutes over reading the docs myself, but I don't get the context the docs would have given me.

I'm a 'Real Programmer' and I can tell that the code is logically sound and self-consistent. The code works and it's usually rewritten so much as to be distinctly my code and style. But still it's largely magical. If I'm doing things the less-correct way, I wouldn't really know because this whole process has led me to some pretty lazy thinking.

On the other hand, I very much do not care about this project. I'm very sure that it will be used just a few times and never see the light of day again. I don't expect to ever do android development again after this, either. I think lazy thinking and farming the involved thinking out to ChatGPT is acceptable here, but it's clear how easily this could become a very bad habit.

I am making a modest effort to understand what I'm doing. I'm also completely rewriting or ignoring the code the AI gives me, it's more of an API reference and example. I can definitely see how a less-seasoned programmer might get suckered into blindly accepting AI code and iterating prompts until the code works. It's pretty scary to think about how the coming generations of programmers are going to experience and conceptualize programming.

jobs_throwaway · 32d ago

As someone who vibe codes at times (and is a professional programmer), I'm curious how yall go about resisting this? Just avoid LLMs entirely and do everything by hand? Very rigorously go over any LLM-generated code before committing?

It certainly is hard when I'm say writing unit tests to avoid the temptation to throw it into Cursor and prompt until it works.

samtp · 32d ago

I resist it by realizing that while LLMs are good at things like decoding obtuse error messages, having them write too much of your code leads to a project becoming almost impossible to maintain or add to. And there are many cases where you spend more time trying to correct errors from the LLM than if you were to slow down and inspect the code yourself.

christophilus · 32d ago

If you don’t commit its output until it’s in a shape that is maintainable and acceptable to you— just like with any other pair programming exercise— you’ll be fine. I do think your skills will atrophy over time, though. I’m not sure what the right balance is, here.

AndyNemmity · 32d ago

My honest opinion is that some of my skills are atrophying, and some of them are increasing.

I have managed a python app for a long time due to it being part of a much larger set of services I manage. I've never been particularly comfortable with it.

I am easily learning, and understanding the python much much better.

I think I'm atrophying in a lot of syntax, and typing automatic things.

It doesn't really feel straight forward that it's one or the other.

breckenedge · 32d ago

Set a budget. Get rate limited. Let the experience remind you how much time you’re actually wasting letting the model write good looking but buggy code, versus just writing code responsibly.

Workaccount2 · 32d ago

Is using the APIs worth the extra cost vs using the web tools? I haven't used any API tools, I am not a programmer, but I have generated many millions of tokens in the web canvas, something that would cost way more than the $20 I spend for them.

thimabi · 32d ago

I think the idea that LLMs are incentivized to write verbose code fails when one considers non-API usage.

Like you, I’ve accumulated tons of LLM usage via apps and web apps. I can actually see how the models are much more succinct there compared to the API interface.

My uneducated guess is that LLM models try to fit their responses into the “output tokens” limit, which is surely much lower in UIs than what can be set in pay-as-you-go interfaces.

jfim · 32d ago

If you're using Claude Code or Cursor, for example, they can read files automatically instead of needing the user to copy paste back and forth.

Both can generate code, though; I've generated code using the web interface and it works. It's just a bit tedious to copy back and forth.

cadamsdotcom · 32d ago

Curious what happens if, rather than asking for conciseness, one asks for elegance.

jay_kyburz · 32d ago

elegance is in the eye of the beholder. I want grug brain code.

_--__--__ · 32d ago

I agree with the thrust of the blog post but honestly most of the increased line count comes from the fact that LLMs have been whipped into leaving moronic comments like

    // Create a copy of the state
    const stateCopy = copyState(state);

tptacek · 32d ago

I'm waiting for the front line coding agents to just strip all comments from LLM output; it seems like a very reasonable thing to do, given you can just ask for comments where you think they're useful.
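For Python output, a post-processing pass along these lines would probably be enough. This is only a sketch, not something any existing agent is known to ship: `strip_comments` is a made-up helper name, it assumes the source tokenizes cleanly, and it only removes `#` comments (docstrings stay, and so would be a separate step). It uses the standard `tokenize` module so that a `#` inside a string literal is left alone.

```python
import io
import tokenize


def strip_comments(source):
    """Return Python source with '#' comments removed (docstrings are kept)."""
    # Split on "\n" only, so line indexes match the row numbers tokenize reports.
    lines = source.split("\n")
    dropped = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start
        code_before = lines[row - 1][:col].rstrip()
        if code_before:
            # Trailing comment: keep the code that precedes it.
            lines[row - 1] = code_before
        else:
            # Comment-only line: drop the whole line.
            dropped.add(row - 1)
    return "\n".join(line for i, line in enumerate(lines) if i not in dropped)
```

Note that this also drops shebang and encoding lines, since the tokenizer reports those as comments too, so a real version would want an allowlist.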

tippytippytango · 32d ago

This article captures a lot of the problem. It’s often frustrating how it tries to work around really simple issues with complex workarounds that don’t work at all. I tell it the secret simple thing it’s missing and it gets it. It always makes me think, god help the vibe coders that can’t read code. I actually feel bad for them.

martin-t · 32d ago

> I tell it the secret simple thing it’s missing and it gets it.

Anthropomorphizing LLMs is not helpful. It doesn't get anything, you just gave it new tokens, ones which are more closely correlated with the correct answer. It also generates responses similar to what a human would say in the same situation.

Note: I first wrote "it also mimics what a human would say", then I realized I am anthropomorphizing a statistical algorithm and had to correct myself. It's hard sometimes but language shapes how we think (which is ironically why LLMs are a thing at all) and using terms which better describe how it really works is important.

ben_w · 32d ago

Given that LLMs are trained on humans, who don't respond well to being dehumanised, I expect anthropomorphising them to be better than the opposite of that.

https://www.microsoft.com/en-us/worklab/why-using-a-polite-t...

SchemaLoad · 32d ago

Aside from just getting more useful responses back, I think it's just bad for your brain to treat something that acts like a person with disrespect. Becomes "it's just a chatbot", "It's just a dog", "It's just a low level customer support worker".

ben_w · 32d ago

While I also agree with you on that, there are also prompts that make them not act like a person at all, and prompts can be write-once-use-many which lessens the impact of that.

This is why I tend to lead with the "quality of response" argument rather than the "user's own mind" argument.

martin-t · 31d ago

I am not talking about getting it to generate useful output; treating it extra politely or threatening it with fines seems to give better results sometimes, so why not. I am talking about the phrase "gets it". It does not get anything.

tippytippytango · 32d ago

Patronizing much?

Suppafly · 32d ago

>Anthropomorphizing LLMs is not helpful

It's a feature of language to describe things in those terms even if they aren't accurate.

>using terms which better describe how it really works is important

Sometimes, especially if you're doing something where that matters, but abstracting those details away is also useful when trying to communicate clearly in other contexts.

grufkork · 32d ago

Working as an instructor for a project course for first-year university students, I have run into this a couple of times. The code required for the project is pretty simple, but there are a couple of subtle details that can go wrong. Had one group today with bit shifts and other "advanced" operators everywhere, but the code was not working as expected. I asked them to just `Serial.println()` so they could check what was going on, and they were stumped. LLMs are already great tools, but if you don't know basic troubleshooting/debugging you're in for a bad time when the brick wall arrives.

On the other hand, it shows how much coding is just repetition. You don't need to be a good coder to perform serviceable work, but you won't create anything new and amazing either, if you don't learn to think and reason - but that might for some purposes be fine. (Worrying for the ability of the general population however)

You could ask whether these students would have gotten anything done without generated code? Probably, it's just a momentarily easier alternative to actual understanding. They did however realise the problem and decided by themselves to write their own code in a simpler, more repetitive and "stupid" style, but one that they could reason about. So hopefully a good lesson and all well in the end!

tippytippytango · 32d ago

Sounds like you found a good problem for the students. Having the experience of failing to get the right answer out of the tool and then succeeding on your wits creates an opportunity to learn that these tools benefit from disciplined usage.

iotku · 32d ago

There's a pretty big gap between "make it work" and "make it good".

I've found with LLMs I can usually convince them to get me at least something that mostly works, but each step compounds with excessive amounts of extra code, extraneous comments ("This loop goes through each..."), and redundant functions.

In the short term it feels good to achieve something 'quickly', but there's a lot of debt associated with running a random number generator on your codebase.

didgetmaster · 32d ago

In my opinion, the difference between good code and code that simply works (sometimes barely) is that good code will still work (or error out gracefully) when the state and the inputs are not as expected.

Good programs are written by people who anticipate what might go wrong. If the document says 'don't do X', they know a tester is likely to try X because a user will eventually do it.
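A tiny illustration of the "error out gracefully" part, as a sketch only: `parse_port` is a hypothetical helper, and the 1 to 65535 range is simply the valid TCP port range. The point is that bad input gets rejected at the boundary with a message that says what was wrong, rather than failing somewhere deeper in the program.

```python
def parse_port(raw):
    """Parse a TCP port number, rejecting unexpected input with a clear message."""
    try:
        port = int(raw.strip())
    except (ValueError, AttributeError):
        # Covers non-numeric text as well as non-string input such as None.
        raise ValueError(f"port must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port
```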

alternatex · 31d ago

I feel like you're talking about programs here rather than code. A program that behaves well is not necessarily built with good code.

I can see an LLM producing a good program with terrible code that's hard to grok and adjust.

r053bud · 32d ago

I fear that’s going to end up being a significant portion of engineers in the future.

babyent · 32d ago

I think we are in the Flash era again lol.

You remember those days right? All those Flash sites.

ramoz · 32d ago

I disagree with the idea that LLM providers are deliberately designing solutions to consume more tokens. We're in the early days of agentic coding, and the landscape is intensely competitive. Providers are focused on building highly capable systems to drive adoption, especially with open-source alternatives just a git clone away.

Yes, Claude Code can be token-heavy, but that's often a trade-off for their current level of capability compared to other options. Additionally, Claude Code has built-in levers for cost (I prefer they continue to focus on advanced capability, let pricing accessibility catch up).

"early days" means:

- Prompt engineering is still very much a required skill for better code and lower pricing

- Same with still needing to be an engineer for the same reasons, and:

- Devs need to actively guide these agents. This includes detailed planning, progress tracking, and careful context management – which, as the author notes, is more involved than many realize. I've personally found success using Gemini to create structured plans for Claude Code to execute, which helps manage its verbosity and keeps it focused on "thoughtful" execution (as guided by Gemini). I drop entire codebases into Gemini (for free).

slurpyb · 32d ago

It’s so cool that we’re all actively participating in the handover of all our work to these massive companies so we can be forever reliant on their blackbox subscriptions. Don’t fret; there will be a day where those profit numbers will have to go up and they will consciously make the product worse, just to trigger more queries, and thus extract more money from you. Gross.

mecredis · 32d ago

Hi! Author here. I don't actually think they're deliberately doing this, hence my choice of "perverse incentives" vs. something more accusatory. The issue is that they don't have a ton of incentive to fix it.

Agree with you on all the rest, and I think writing a post like this was very much intended as a gut-check on things since the early days are hopefully the times when things can get fixed up.

ramoz · 32d ago

My speculation is that these companies have significant reason to prioritize lowering the amount of tokens produced as well as cost of tokens.

The leaked Claude Code codebase was riddled with "concise", "do not add comments", "mimic codestyle", even an explicit "You should minimize output tokens as much as possible" etc. Btw, Claude Code uses a custom system prompt, not the leaked 24k claude.ai one.

cadamsdotcom · 32d ago

It'd be great to forbid comments, e.g. using some of the techniques pioneered by Outlines, Instructor et al.

For now I'll do it with some examples in my context priming prompt, like:

Do not emit comments. Instead of this:

    # frobnicate a fyzzit
    def frobnicate_fyzzit(self):
        """
        This function frobnicates a fyzzit.
        """

        # Get the fyzzit to frobnicate.
        fyzzit = ...
        ...

Do this:

    def frobnicate_fyzzit(self):
        fyzzit = ...
        ...

neilv · 32d ago

I would seriously consider banning "vibe coding" right now, because:

1. Poor solutions.

2. Solutions not understood by the person who prompted them.

3. Development team being made dumber.

4. Legal and ethical concerns about laundering open source copyrights.

5. I'm suspicious of the name "vibe coding", like someone is intentionally marketing it to people who don't care to be good at their jobs.

6. I only want to hire people who can do holistically better work than current "AI". (Not churn code for a growth startup's Potemkin Village, nor to only nominally satisfy a client's requirements while shipping them piles of counterproductive garbage.)

7. Publicizing that you are a no-AI-slop company might scare away the majority of the bad prospective employees, while disproportionately attracting the especially good ones. (Not that everyone who uses "AI" is bad, but they've put themselves in the bucket with all the people who are bad, and that's a vastly better filter for the art of hiring than whether someone has spent months memorizing LeetCode answers solely for interviews.)

Pxtl · 32d ago

I can feel how the extreme autocomplete of AI is a drug.

Half of my job is fighting the "copy/paste/change one thing" garbage that developers generate. Keeping code DRY. The autocompletes do an amazing job of automating the repeated boilerplate. "Oh you're doing this little snippet for the first and second property? Obviously you want to do that for every property! Let me just expand that out for you!"

And I'm like "oooh, that's nice and convenient".

...

But I also should be looking at that with the stink-eye... part of that code is now duplicated a dozen times. Is there any way to reduce that duplication to the bare minimum? At least so it's only one duplicated declaration or call and all of the rest is per-thingy?

Or any way to directly/automatically wrap the thing without going property-by-property?

Normally I'd be asking myself these questions by the 3rd line. But this just made a dozen of those in an instant. And it's so tempting and addictive to just say "this is fine" and move on.

That kind of code is not fine.
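To make the two shapes concrete, here is a sketch in Python. `Settings` and both function names are made up for illustration and are not from any real codebase. The first function is the kind of per-property expansion the autocomplete happily produces; the second gets the same result from the dataclass's own field list via `dataclasses.fields`.

```python
from dataclasses import dataclass, fields


@dataclass
class Settings:  # hypothetical example type
    host: str = ""
    user: str = ""
    token: str = ""


def missing_fields_expanded(s):
    """Autocomplete style: one near-identical stanza per property."""
    missing = []
    if not s.host:
        missing.append("host")
    if not s.user:
        missing.append("user")
    if not s.token:
        missing.append("token")
    return missing


def missing_fields(s):
    """The same check written once, driven by the dataclass field list."""
    return [f.name for f in fields(s) if not getattr(s, f.name)]
```

Adding a fourth property to `Settings` updates `missing_fields` for free, while the expanded version silently skips it until someone remembers to add another stanza.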

Ancapistani · 32d ago

> That kind of code is not fine.

I agree, but I'm also challenging that position within myself.

Why isn't it OK? If your primary concern is readability, then perhaps LLMs can better understand generated code relative to clean, human-readable code. Also, if you're not directly interacting with it, who cares?

As for duplication introducing inconsistencies, that's another issue entirely :)

Suppafly · 32d ago

>That kind of code is not fine.

Depends on your definition of fine. Is it less readable because it's doing the straight forward thing several times instead of wrapping it into a loop or a method, or is it more readable because of that.

Is it not fine because it's slower, or does it all just compile down to the same thing anyway?

Or is it not fine because you actually should be doing different things for the different properties but assumed you don't because you let the AI do the thinking for you?

Pxtl · 32d ago

It is not fine because it is too verbose for me to have any confidence that there isn't something awful hiding in there.

rcarmo · 32d ago

As it happens, I wrote about the need for planning and organizing work (for greenfield or understanding existing projects) only yesterday: https://taoofmac.com/space/blog/2025/05/13/2230

replyifuagree · 31d ago

As in all things, it depends. For example if you have an iron clad exit plan after you land that promotion for "delivering" a steaming pile for someone else to debug, and you have no sense of shame, then by all means!

protocolture · 32d ago

Considering most of these LLMs are in some kind of loss leading ramp up phase, wouldn't the incentive lean towards using the least number of tokens?

Weren't they recently complaining that people thanking LLMs were costing them too much money?

stefap2 · 32d ago

I found it useful to ask it to write with minimal lines. Let it write functional code, then ask it to remove all unnecessary lines and aim for minimalistic, efficient code.

andrewstuart · 32d ago

Claude was last week.

The author should try Gemini; it’s much better.

martin-t · 32d ago

Honestly can't tell if satire or not.

jazoom · 32d ago

It's not satire. Gemini is much better for coding, at least for me.

Just to illustrate, I asked both about a browser automation script this morning. Claude used Selenium. Gemini used Playwright.

I think the main reasons Gemini is much better are:

1. It gets my whole code base as context. Claude can't take that many tokens. I also include documentation for newer versions of libraries (e.g. Svelte 5) that the LLM is not so familiar with.

2. Gemini has a more recent knowledge cutoff.

3. Gemini 2.5 Pro is a thinking model.

4. It's free to use through the web UI.

AlexCoventry · 32d ago

How do you give it your whole code base, via the web UI?

jazoom · 31d ago

I created a script to pack it into a markdown file. Later I found this, which does a better job, so I use it now.

https://github.com/yamadashy/repomix
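For anyone curious what a "pack it into a markdown file" script might look like, here is a minimal sketch. It is not the commenter's actual script and says nothing about how repomix works internally; the function name `pack_repo`, the suffix list, and the skipped directories are all assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical defaults; adjust these for your own project.
DEFAULT_SUFFIXES = {".py", ".js", ".ts", ".svelte", ".md"}
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def pack_repo(root, suffixes=DEFAULT_SUFFIXES):
    """Return one markdown string containing every matching file under root."""
    root = Path(root)
    fence = "`" * 4  # four backticks, so files that contain their own fences still nest
    parts = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories, unwanted file types, and vendored/VCS folders.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # One heading per file, followed by its contents in a fenced block.
        parts.append(f"## {rel.as_posix()}\n\n{fence}\n{text.rstrip()}\n{fence}\n")
    return "\n".join(parts)
```

Calling `pack_repo(".")` and writing the result to a single file gives you something you can paste or upload into a chat UI. A real tool would also want to respect `.gitignore` and report a token count, which this sketch does not attempt.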

AlexCoventry · 31d ago

Sweet, thanks! Glad I asked.

andrewstuart · 31d ago

In a zip file in AI Studio.

qwery · 31d ago

I am perpetually amused at the use of the word 'vibe' in 'vibe coding'.

> it’s much more worthwhile to work with a plan composed of discrete tasks that could be explained to a junior level developer

I'm sure this is a more effective way to get more usable results. But I really think anyone in this situation should be taking it as a kind of wake-up call. Mentoring/guiding a junior is work -- there's a significant cost. But it's a cost easily justified -- a lot of people find it intrinsically rewarding, you're training a colleague, etc. What you're describing here, though, is all of the cost with none of the benefits, and you're being the junior developer as well (you must be -- you're doing their work). You're alone, mentoring a chatbot that cannot learn or grow.

> I’m beginning to think the problem runs deeper, and it has to do with the economics of AI assistance.

> When charging by token count, there’s naturally less incentive to optimize for elegant, minimal solutions.

... Or maybe the tool just isn't that good / what you want. There doesn't have to be a conspiracy behind it.

That is to say, I think the main point presented here is very unconvincing. If they could build a tool that could just do what you want in an acceptable manner, they would. People would obviously throw money at that.

It produces verbose, comment-heavy, procedural code because that form reasonably effectively supports the nature of the generator. Procedural code is obviously well-suited to "what comes next?" style append-oriented editing operations. Verbosity eliminates nuance.

OK, one more thing:

> While we wait for AI companies to better align their incentives with our need for elegant code I’ve developed several strategies to counteract verbose code generation

"While we wait" is the most depressing thing I've read in a while. It's just completely at odds with the field itself.

OutOfHere · 32d ago

GPT-4.1 doesn't have the issue of doing more than it should. That is why I prefer it for agentic coding. If anything, it may do a little less than it should, which I can remedy with my handwritten code or with a follow-up prompt. Oh, and GPT-4.1 has an input context that is over twice as large as Claude's.

UncleOxidant · 32d ago

> I have probably spent over $1,000 vibe coding various projects into reality

dude, you can use Gemini Pro 2.5 with Cline - it's free and is rated at least as good as Claude Sonnet 3.7 right now.

sigmaisaletter · 32d ago

In section 4, the author writes "... cheaper than Claude 3.7 ($0.80 per token vs. $3)".

This is an obvious mistake: the price is per megatoken (one million tokens), not per token.

Source: https://www.anthropic.com/pricing
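To make the size of that unit error concrete, here is a small worked sketch. It uses only the two figures quoted above ($0.80 and $3, read as dollars per million tokens); the 200,000-token workload and the helper name `cost_usd` are made up for illustration.

```python
def cost_usd(tokens, price_per_million_usd):
    """Cost of `tokens` tokens at a price quoted per one million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


tokens = 200_000  # hypothetical workload size

# Correct reading: prices are per million tokens.
cheap = cost_usd(tokens, 0.80)
pricey = cost_usd(tokens, 3.00)
print(f"per-megatoken reading: ${cheap:.2f} vs ${pricey:.2f}")

# Mistaken reading: the same numbers treated as dollars per single token.
print(f"per-token reading:     ${tokens * 0.80:,.2f} vs ${tokens * 3.00:,.2f}")
```

The two readings differ by a factor of one million, which is why the per-token wording cannot be right.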

gregorymichael · 32d ago

Unlimited Claude Code for $100 on the Max plan is a game changer.

xwowsersx · 32d ago

It's unlimited? I thought it was "Choose 5x or 20x more usage than Pro." Could you please elaborate?

coolcase · 32d ago

Dopamine? That sort of thing triggers cortisol for me if anything!

biker142541 · 32d ago

Can we please stop using 'vibe coding' to mean 'AI-assisted coding'?? (best breakdown, imo: https://simonwillison.net/2025/Mar/19/vibe-coding/)

Is it really vibe coding if you are building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness? Sure, it's a process dependent on AI, but it's very far from "forget[ting] that the code even exists".

That all said, I do think the article captures some of the current cost/quality dilemmas. I wouldn't jump to conclusions that these incentives are actually driving most current training decisions, but it's an interesting area to highlight.

Ancapistani · 32d ago

There should be a distinction, but I don't think it's really clear where it is yet.

In my own usage, I tend to alternate between tiny, well-defined tasks and larger-scale, planned architectural changes or new features. Things in between those levels are hit and miss.

It also depends on what I'm building and why. If it's a quick-and-dirty script for my own use, I'll often write up - or speak - a prompt and let it do its thing in the background while I work on other things. I care much less about code quality in those instances.

codr7 · 32d ago

It's still gambling: you're trading learning/reinforcing for efficiency, which in the long run means losing skills.

bigfishrunning · 31d ago

"Vibe coders" say they're 'building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness', but really they're just rolling the dice and hoping everything works. The software industry has always been full of snake oil, and now there's a powerful new strain.

Animats · 32d ago

"Vibe coding" is a trend.[1]

[1] https://trends.google.com/trends/explore?geo=US&q=%22vibe%20...

parliament32 · 32d ago

This reads like "is it really gambling when I have a many-step system for predicting roulette outcomes?"

Vox_Leone · 32d ago

Noted — but honestly, that's somewhat expected. Vibe-style coding often lacks structure, patterns, and architectural discipline. That means the developer must do more heavy lifting: decide what they want, and be explicit — whether that’s 'avoid verbosity,' 'use classes,' 'encapsulate logic,' or 'handle errors properly.'

sherburt3 · 32d ago

Really makes you wonder where this is all going. What is going to be the point where we say "Maybe we took this a little too far"? I'm sure whatever bloated React apps we see today are nothing in comparison to the monstrosities we have in store for us in the future.

deadbabe · 32d ago

The future should be less bloat. We don’t need frameworks anymore; we can produce output as straight HTML pages with vanilla JavaScript. Could be good.

croes · 32d ago

If I do the same with a human developer instead of an AI, it’s called ordering, not vibe coding.

What’s the difference?

gitroom · 32d ago

[flagged]

Perverse incentives of vibe coding

Comments (228)