I don't understand the productivity that people get out of these AI tools. I've tried it and I just can't get anything remotely worthwhile unless it's something very simple or something completely new being built from the ground up.
Like sure, I can ask claude to give me the barebones of a web service that does some simple task. Or a webpage with some information on it.
But any time I've tried to get AI services to help with bugfixing/feature development on a large, complex, potentially multi-language codebase, it's useless.
And those tasks are the ones that actually take up the majority of my time. On the occasion that I'm spinning a new thing up quickly, I don't really need an AI to do it for me -- I mean, that's the easy part!
Is there something I'm missing? Am I just not using it right? I keep seeing people talk about how addictive it is, how the productivity boost is insane, how all their code is now written by AI and then audited, and I just don't see how that's possible outside of really simple rote programming.
lukan · 35m ago
Yesterday I gave cursor a try and made my first (intentionally very lazy) vibe coding approach (a simple threejs project). It accepted the task and did things, failed, did things, failed, did things ... failed for good.
I guess I could work on the magic incantations to tweak here and there a bit until it works and I guess that's the way it is done. But I wasn't hooked.
I do get value out of LLMs for isolated, broken-down subtasks, where asking an LLM is quicker than googling.
For me, AI will probably become really useful once I can scan and integrate my own complex codebase so it gives me solutions that work there and doesn't hallucinate API endpoints or jump between incompatible library versions (my main issue).
hx8 · 39m ago
Probably 80% of the time I spend coding, I'm inside a code file I haven't read in the last month. If I need to spend more than 30 seconds reading a section of code before I understand it, I'll ask AI to explain it to me. Usually, it does a good job of explaining code at a level of complexity that would take me 1-15 minutes to understand, but does a poor job of answering more complex questions or understanding more complex code.
It's a moderately useful tool for me. I suspect the people that get the most use out of it are those that would take more than 1 hour to read code I would take 10 minutes to read. Which is to say the least experienced people get the most value.
Starlevel004 · 1h ago
> Is there something I'm missing? Am I just not using it right?
The talk about it makes more sense when you remember most developers are primarily writing CRUD webapps or adware, which is essentially a solved problem already.
slurpyb · 1h ago
You are not alone! I strongly agree and I feel like I am losing my mind reading some of the comments people have about these services.
jiggawatts · 18m ago
I’ve had good experiences using it, but with the caveat that only Gemini Pro 2.5 has been at all useful, and only for “spot” tasks.
I typically use it to whip up a CLI tool or script to do something that would have been too fiddly otherwise.
While sitting in a Teams meeting I got it to use the Roslyn compiler SDK in a CLI tool that stripped a very repetitive pattern from a code base. Some OCD person had repeated the same nonsense many thousands of times. The tool cleaned up the mess in seconds.
colechristensen · 32m ago
Some people do really repetitive or really boilerplate things, others do not.
Also you have to learn to talk to it and how to ask it things.
erulabs · 1h ago
These perverse incentives run at the heart of almost all Developer Software as a Service tooling. Using someone else's hosted model incentivizes increasing token usage, but it's nothing special about AI.
Consider Database-as-a-service companies: They're not incentivized to optimize on CPU usage, they charge per cpu. They're not incentivized to improve disk compression, they charge for disk-usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.
When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.
jiggawatts · 21m ago
My favourite example of this is the recent trend towards “wide events” replacing logs and metrics… spearheaded and popularised by companies that charge by the gigabytes ingested.
exiguus · 8m ago
I understand your point. The Vibe approach is IMO only effective when you adopt a software engineering mindset. Here's how it works (at least for me with Copilot agent mode):
1. Develop a Minimum Viable Product (MVP) or prototype that functions.
2. Write tests, either before or after the initial development.
3. Implement coding guidelines, style guides, linter etc. Do code reviews.
4. Continuously adjust, add features, refactor, review and expand your test suite. Iterate and let AI run tests and linters on each change
While this process may seem lengthy, it ensures reliability and efficiency. Experienced engineers might find it as quick as working solo, but the structured approach guarantees success. It feels like pairing with an inexperienced developer.
Also, this process may run you into rate limits with Copilot and might not work with your current codebase due to a lack of tests and the absence of applied coding style guides.
Additionally, it takes time. For example, for a simple to mid-level tool/feature in Go, it might take about 1 hour to develop the MVP or prototype, but another 6 to 10 hours to refine it to a quality that you might want to show to other engineers.
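To make step 4 concrete, here is a minimal sketch of the kind of gate I mean: run every check after each change and refuse to move on until all of them pass. The function name `run_checks` and the placeholder commands are made up for illustration; swap in whatever test and lint commands your project really uses.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each check command and collect its exit code.

    Returns (all_passed, results), where results is a list of
    (command, returncode) tuples in the order the commands were given.
    """
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    all_passed = all(code == 0 for _, code in results)
    return all_passed, results


# Placeholder checks so the sketch runs anywhere. In a Go project these
# would be something like ["go", "test", "./..."] and ["go", "vet", "./..."].
CHECKS = [
    [sys.executable, "-c", "assert 1 + 1 == 2"],
    [sys.executable, "-c", "import sys; sys.exit(0)"],
]

if __name__ == "__main__":
    ok, results = run_checks(CHECKS)
    for cmd, code in results:
        print("PASS" if code == 0 else "FAIL", " ".join(cmd))
    sys.exit(0 if ok else 1)
```

The point is only that the agent (or you) gets a single pass/fail signal per change, which is what keeps the iteration loop honest.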
lubujackson · 1h ago
I feel like "vibe coding" as a "no look" sort of way to produce anything is bad and will probably remain bad for some time.
However... "vibe architecting" is likely going to be the way forward. I have had success with generating/tuning an architecture plan with AI, having it create stub files/functions then filling them out individually. I can get pretty much the whole way without typing code, but it does require a fair bit more architectural thinking than usual and a good bit of reading code (then telling the AI to "do better").
I think of it like the analogy of blind men describing an elephant when they can only feel a single part. AI is decent at high level architecture and decent at low level production but you need a human to understand the big picture and how the pieces fit (and which ones are missing).
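A rough sketch of what the stub stage can look like, assuming a tiny made-up reporting task. All three function names are hypothetical; the idea is just that the plan exists as signatures and docstrings first, and each body gets filled in and reviewed on its own.

```python
def load_rows(path: str) -> list:
    """Stub: read the input file at `path` into a list of row dicts."""
    raise NotImplementedError("not filled in yet")


def total_by_key(rows: list, key: str, value: str) -> dict:
    """Filled in: sum the `value` field for each distinct `key` field."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def render_report(totals: dict) -> str:
    """Stub: turn the totals into a human-readable report."""
    raise NotImplementedError("not filled in yet")
```

Because the unfinished stubs raise instead of returning something plausible, it stays obvious which pieces are still missing.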
nowittyusername · 3m ago
What you are talking about is the "proper" way of vibe coding. Most of the issues with vibe coding stem from users misunderstanding the capabilities of the technology they are using. They are overestimating the capabilities of current systems and are essentially asking for magic to happen. They don't give proper guidance, context or anything of value for the coding IDE to work with. They are relying on a mindset of the 2030s to work with systems from 2025. We ain't there yet folks, give as much guidance and context as you can and you will have a better time.
andy99 · 2h ago
I wish more had been written about the first assertion that using an LLM to code is like gambling and you're always hoping that just one more prompt will get you what you want.
It really captures how little control one has over the process, while simultaneously having the illusion of control.
I don't really believe that code is being made verbose to make more profits. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" were possible, it would give one model a sufficient edge over others that I suspect providers would do it.
techpineapple · 1h ago
Something I caught about Andrej Karpathy’s original tweet, was he said “give into the vibes”, and I wonder if he meant that about outcomes too.
andy99 · 56m ago
I still think the original tweet was tongue-in-cheek and not really meant to be a serious description of how to do things.
xianshou · 2h ago
Amusingly, about 90% of my rat's-nest problems with Sonnet 3.7 are solved by simply appending a few words to the end of the prompt:
"write minimum code required"
It's not even that sensitive to the wording - "be terse" or "make minimal changes" amount to the same thing - but the resulting code will often be at least 50% shorter than the un-guided version.
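If you'd rather not type it every time, a tiny helper does the job. This is only a sketch: `with_terse_suffix` is a made-up name, and it just builds the prompt string; how you send it to the model is up to whatever client you use.

```python
TERSE_SUFFIX = "write minimum code required"


def with_terse_suffix(prompt: str, suffix: str = TERSE_SUFFIX) -> str:
    """Append a brevity instruction to a prompt unless it already ends with it."""
    prompt = prompt.rstrip()
    if prompt.endswith(suffix):
        return prompt
    return f"{prompt}\n\n{suffix}"
```

Passing `suffix="be terse"` or `suffix="make minimal changes"` works the same way.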
panstromek · 2h ago
Well, the article mentions that this reduces accuracy. Do you hit that problem often then?
chaboud · 2h ago
1. Yes. I've spent several late nights nudging Cline and Claude (and other systems) to the right answers. And being able to use AWS Bedrock to do this has been great (note: I work at Amazon).
2. I've had good fortune keeping the agents to constrained areas, working on functions, or objects, with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... Treat these agents like hyper-energetic interns. Nudge frequently.
3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.
And, basically, I've described that we're becoming Software Development Managers with teams of on-demand low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.
(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)
There are two sets of perverse incentives at play. The main one the author focuses on is that LLM companies are incentivized to produce verbose answers, so that when you task an LLM on extending an already verbose project, the tokens used and therefore cost increases.
The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.
vanschelven · 3h ago
> Its “almost there” quality — the feeling we’re just one prompt away from the perfect solution — is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern (“the code works! it’s brilliant! it just broke! wtf!”), triggers stronger dopamine responses in our brain’s reward pathways, similar to gambling behaviors.
Though I'm not a "vibe coder" myself I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.
Suppafly · 1h ago
>Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.
Especially when you try to get them to generate something they explicitly tell you they won't, like nudity. It feels akin to hacking.
dingnuts · 2h ago
it's not like gambling, it is gambling. you exchange dollars for chips (tokens -- some casinos even call the chips tokens) and insert it into the machine in exchange for the chance of a prize.
if it doesn't work the first time you pull the lever, it might the second time, and it might not. Either way, the house wins.
It should be regulated as gambling, because it is. There's no metaphor, the only difference from a slot machine is that AI will never output cash directly, only the possibility of an output that could make money. So if you're lucky with your first gamble, it'll give you a second one to try.
Gambling all the way down.
NathanKP · 2h ago
This only makes sense if you have an all or nothing concept of the value of output from AI.
Every prompt and answer is contributing value toward your progress toward the final solution, even if that value is just narrowing the latent space of potential outputs by keeping track of failed paths in the context window, so that it can avoid that path in a future answer after you provide followup feedback.
The vast majority of slot machine pulls produce no value to the player. Every single prompt into an LLM tool produces some form of value. I have never once had an entirely wasted prompt unless you count the AI service literally crashing and returning a "Service Unavailable" type error.
One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.
NegativeLatency · 2h ago
> Every prompt and answer is contributing value toward your progress toward the final solution
This has not been my experience, maybe sometimes, but certainly not always.
As an example: asking chatgpt/gemini about how to accomplish some sql data transformation set me back in finding the right answer because the answer it did give me was so plausible but also super duper not correct in the end. Would've been better off not using it in that case.
Brings to mind "You can't build a ladder to the moon"
secabeen · 2h ago
> One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.
That assumes that the value of a solution is linear with the amount completed. If the Pareto Principle holds (80% of effects come from 20% of causes), then not getting that critical 10+% likely has an outsized effect on the value of the solution. If I have to do the 20% of the work that's hard and important after taking what the LLM did for the remainder, I haven't gained as much because I still have to build the state machine in my head to understand the problem-space well enough to do that coding.
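A back-of-the-envelope version of that argument, as a sketch. The model is entirely an assumption for illustration: the easy 80% of the work costs 20% of the effort, the hard 20% costs the other 80%, and the easy part gets done first.

```python
def effort_covered(fraction_done: float,
                   easy_work: float = 0.80,
                   easy_effort: float = 0.20) -> float:
    """Share of total effort covered once `fraction_done` of the work exists.

    Assumes a Pareto-style split: the first `easy_work` of the work costs
    only `easy_effort` of the effort, and the remainder costs the rest.
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    if fraction_done <= easy_work:
        return easy_effort * fraction_done / easy_work
    hard_done = (fraction_done - easy_work) / (1.0 - easy_work)
    return easy_effort + (1.0 - easy_effort) * hard_done


for done in (0.5, 0.8, 0.9, 1.0):
    print(f"{done:.0%} of the work done -> {effort_covered(done):.0%} of the effort covered")
```

Under these made-up numbers, a response that is "90% of the way there" covers roughly 60% of the effort, and one that stops at the easy 80% covers only about 20%.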
PaulDavisThe1st · 1h ago
This assumes you can easily and reliably identify the 10% you need to fix.
rapind · 1h ago
By this logic:
- I buy stock that doesn't perform how I expected.
- I hire someone to produce art.
- I pay a lawyer to represent me in court.
- I pay a registration fee to play a sport expecting to win.
- I buy a gift for someone expecting friendship.
Are all gambas.
You aren't paying for the result (the win), you are paying for the service that may produce the desired result, and in some cases one of many possibly desirable results.
rjbwork · 1h ago
>I buy stock that doesn't perform how I expected.
Hence the adage "sir, this is a casino"
nkrisc · 1h ago
None of those are games of chance, except the first.
Suppafly · 1h ago
>None of those are games of chance, except the first.
Neither is GenAI, the grandparent comment is dumb.
princealiiiii · 2h ago
> It should be regulated as gambling, because it is.
That's wild. Anything with non-deterministic output will have this.
kagevf · 2h ago
> Anything with non-deterministic output will have this.
Anything with non-deterministic output that charges money ...
Edit
Added words to clarify what I meant.
GuinansEyebrows · 2h ago
i think at least a lot of things (if not most things) that i pay for have an agreed-upon result in exchange for payment, and a mitigation system that'll help me get what i paid for in the event that something else prevents that from happening. if you pay for something and you don't know what you're going to get, and you have to keep paying for it in the hopes that you get what you want out of it... that sounds a lot like gambling. not exactly, but like.
0cf8612b2e1e · 2h ago
If I ask an artist to draw a picture, I still have to pay for the service, even if I am unhappy with the result.
cogman10 · 1h ago
In the US? No, you actually do not need to pay for the service if you deem the quality of the output to be substandard. In particular with art, it's pretty standard to put in a non-refundable downpayment with the final payment due on delivery.
You only lose those rights in the contracts you sign (which, in terms of GPT, you've likely clicked through a T&C which waives all rights to dispute or reclaim payment).
If you ask an artist to draw a picture and decide it's crap, you can refuse to take it and to pay for it. They won't be too happy about it, but they'll own the picture and can sell it on the market.
0cf8612b2e1e · 53m ago
There must be artists working on an hourly contract rate.
Maybe art is special, but there are other professions where someone can invest heaps of time and effort without delivering the expected result. A trial attorney, treasure hunter, oil prospector, app developer. All require payment for hours of service, regardless of outcome.
cogman10 · 35m ago
It'll mostly depend on the contract you sign with these services and the state you live in.
When it comes to work that requires craftsmanship, it's pretty common to be able to not pay them if they do a poor job. It may cost you more than you paid them to fix their mistake, but you can generally reclaim the money you paid them if the work they did was egregiously poor.
nkrisc · 1h ago
Sounds like you should negotiate a better contract next time, such as one that allows for revisions.
martin-t · 2h ago
That's incorrect, gambling is about waiting.
Brain scans have revealed that waiting for a potential win stimulates the same areas as the win itself. That's the "appeal" of gambling. Your brain literally feels like it's winning while waiting because it _might_ win.
GuinansEyebrows · 2h ago
maybe more accurately anything with non-deterministic output that you have to pay-per-use instead of paying by outcome.
Suppafly · 1h ago
>that you have to pay-per-use instead of paying by outcome.
That's still not gambling and it's silly to pretend it is. It feels like gambling but that's it.
csallen · 1h ago
Books are not like gambling, they are gambling. you exchange dollars for chips (money — some libraries even give you digital credits for "tokens") and spend them on a book in exchange for the chance of getting something good out of it.
If you don't get something good the first time you buy a book, you might with the next book, or you might not. Either way, the house wins.
It should be regulated as gambling, because it is. There's no metaphor — the only difference from a slot machine is that books will never output cash directly, only the possibility of an insight or idea that could make money. So if you're lucky with your first gamble, you'll want to try another.
Gambling all the way down.
squeaky-clean · 2h ago
So how exactly does that work for the $25/mo flat fee that I pay OpenAI for ChatGPT? They want me to keep getting the wrong output and burning money on their backend without any additional payment from me?
dwringer · 2h ago
Something of an aside, but this is sort of equivalent to asking "how does that work for the $50 dollars the casino gave me to gamble with for free"? I once made 50 dollars exactly in that way by taking the casino's free tokens and putting them all on black in a single roulette spin. People like that are not the ones companies like that make money off of.
kimixa · 2h ago
For the amount of money OpenAI burns that $25/mo is functionally the same as zero - they're still in the "first one is free" phase.
Though you could say the same thing about pretty much any VC funded sector in the "Growth" phase. And I probably will.
AlexCoventry · 1h ago
Is it really gambling, if the house always loses? :-)
abletonlive · 1h ago
Yikes. The reactionary reach for more regulation from a certain group is just so tiresome. This is the real mind virus that I wish would be contained in Europe.
I almost can't believe this idea is being seriously considered by anybody. By that logic buying any CPU is gambling because it's not deterministic how far you can overclock it.
Just so you know, not every llm use case requires paying for tokens. You can even run a local LLM and use cline w/ it for all your coding needs. Pull that slot machine lever as many times as you like without spending a dollar.
slurpyb · 48m ago
Do you understand what electricity is?
mystified5016 · 2h ago
I run genAI models on my own hardware for free. How does that fit into your argument?
codr7 · 2h ago
The fact that you can get your drugs for free doesn't exactly make you less of an addict.
latentsea · 2h ago
I used to run GenAI image generators on my own hardware, and I 200% agree with your stance. Literally wound up selling my RTX 4090 to get the dealer to move out of the house. I'm better off now, but can't ever really own a GPU again without opening myself back up to that. Sigh...
squeaky-clean · 2h ago
It does literally make it not gambling though, which is what's being discussed.
It also kind of breaks the whole argument that they're designed to be addictive in order to make you spend more on tokens.
codr7 · 47m ago
As long as that argument makes you happy, go for it :)
yewW0tm8 · 2h ago
Same with anything though? Startups, marriages, kids.
All those laid off coders gambled on a career that didn’t pan out.
Want more certainty in life, gonna have to get political.
And even then there is no guarantee the future gives a crap. Society may well collapse in 30 years, or 100…
This is all just role play to satisfy the prior generations' story-driven illusions.
comex · 3h ago
> There was no standardization of parts in the probe. Two widgets intended to do almost the same job could be subtly different or wildly different. Braces and mountings seemed hand carved. The probe was as much a sculpture as a machine.
> Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.
> “Yes, I wrote that," she said. "It seems to be true. Every nut and bolt in that probe was designed separately. It's less surprising if you think of the probe as having a religious purpose. But that's not all. You know how redundancy works?"
> “In machines? Two gilkickies to do one job. In case one fails."
> “Well, it seems that the Moties work it both ways."
> “Moties?"
> She shrugged. "We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don't they?"
> “For a complicated job, of course they do."
> “The Moties don't. It's all one piece, everything working on everything else. Rod, there's a fair chance the Moties are brighter than we are."
- The Mote in God's Eye, Larry Niven and Jerry Pournelle (1974)
[…too bad that today's LLMs are not brighter than we are, at least when it comes to writing correct code…]
mnky9800n · 3h ago
That book is very much fun and also I never understood why Larry Niven is so obsessed with techno feudalism and gender roles. I think this is my favourite book but I think his best book is maybe Ringworld.
AlexCoventry · 1h ago
The zero-sum mentality which leads people to think that way is already clear in The Mote In God's Eye. I think the point of the book is that despite being superior to humans in every way imaginable, the Moties are condemned to repeated violent conflict by Malthusian pressures, because they have nowhere to expand. One way I interpret the "mote" in God's eye is the authors' belief that no matter how good we get, we'll always be in potentially violent conflict with each other for limited resources. (The "beam" in our own eye is then that we're still fighting each other over less pressing concerns. :-)
Loughla · 2h ago
Ringworld is a great book. The later books have great concepts, but could do without so much. . . rishing. Niven plainly inserted his furry porn fetish into those books, for reasons unclear to any human alive.
Suppafly · 1h ago
>for reasons unclear to any human alive
Given how prevalent furries seem to be, especially in nerd adjacent culture, I'd say he was ahead of his time.
Suppafly · 1h ago
>I think this is my favourite book but I think his best book is maybe Ringworld.
Ringworld is pretty good, the multiple sequels get kind of out there.
mnky9800n · 1h ago
I never read any of the sequels, just a couple of the short story collections and some of the Man-Kzin Wars. What’s wild about them?
jerf · 3h ago
Yeah, I've had that thought too.
I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.
If Motie engineering is a good idea, it's not a smooth gradient. The Motie-est code I've seen is also the worst. It is definitely not the case that getting a bit more Motie-esque, all else being equal, produces better results. Is there some crossover point where it gets better and maybe passes our modular designs? If AIs do get better than us at coding, and it turns out they do settle on Motie-esque coding, no human will ever be able to penetrate it ever again. We'd have to instruct our AI coders to deliberately cripple themselves to stay comprehensible, and that is... economically a tricky proposition.
After all, anyone can write anything into a novel they want to and make anything work. It's why I've generally stopped reading fiction that is explicitly meant to make ideological or political points to the exclusion of all else; anything can work on a page. Does Motie engineering correspond to anything that could be manifested practically in reality?
Will the AIs be better at modularization than any human? Will they actually manifest the Great OO Promise of vast piles of amazingly well-crafted, re-usable code once they mature? Or will the optimal solution turn out to be bespoke, locally-optimized versions of everything everywhere, and the solution to combining two systems is to do whatever locally-sensible customizations are called for?
(I speak of the final, mature version, however long that may be. Today LLMs are kind of the worst of both worlds. That turns out to be a big step up from "couldn't play in this space at all", so I'm not trying to fashionably slag on AIs here. I'm more saying that the one point we have is not yet enough to draw so much as a line through, let alone an entire multi-dimensional design methodology utility landscape.)
I didn't expect to live to see the answers, but maybe I will.
rcxdude · 19m ago
>I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.
It's the kind of thing you commonly get if you let an unconstrained optimization process run for long enough. It will generally be better, according to whatever function you're optimizing for. The main disadvantage, apart from being hard to understand or modify the design, is manufacturing and repair (needing to make many different parts), but if you have sufficiently good manufacturing technology (e.g. atomic level printers), then that may be a non-issue. And in software it's already feasible: you can see very small scale versions of this in extremely resource-constrained environments where it's worthwhile really trying to optimize things (see some demoscene entries), but it's pretty rare (some tricks that optimizing compilers pull off are similar, but they are generally very local).
> it might be difficult for AI companies to prioritize code conciseness when their revenue depends on token count.
Would open source, local models keep pressure on AI companies to prioritize the usable code, as code quality and engineering time saved are critical to build vs buy discussions?
jsheard · 3h ago
Depends if open source models can remain relevant once the status quo of "company burns a bunch of VC money to train a model, open sources it, and generates little if any revenue" runs out of steam. That's obviously not sustainable long term.
Larrikin · 3h ago
Maybe we will get some university backed SETI like projects to replace all those personal mining rigs now that that hype is finally fading.
samtp · 2h ago
I've pretty clearly seen the critical thinking ability of coworkers who depend on AI too much sharply decline over the past year. Instead of taking 30 seconds to break down the problem and work through assumptions, they immediately copy/paste into an LLM and spit back what it tells them.
This has led to their abilities stalling while their output seemingly goes up. But when you look at the quality of their output, and their ability to get projects over the last 10% or make adjustments to an already completed project without breaking things, it's pretty horrendous.
Etheryte · 2h ago
My observations align with this pretty closely. I have a number of colleagues who I wager are largely using LLM-s, both by changes in coding style and how much they suddenly add comments, and I can't help but feel a noticeable drop in the quality of the output. Issues that should clearly have no business making it to code review are now regularly left for others to catch, it often feels like they don't even look at their own diffs. What to make of it, I'm not entirely sure. I do think there are ways LLM-s can help us work in better ways, but they can also lead to considerably worse outcomes.
jimbokun · 2h ago
Just replace your colleagues with the LLMs they are using. You will reduce costs with no decrease in the quality of work.
jobs_throwaway · 1h ago
As someone who vibe codes at times (and is a professional programmer), I'm curious how yall go about resisting this? Just avoid LLMs entirely and do everything by hand? Very rigorously go over any LLM-generated code before committing?
It certainly is hard, when I'm, say, writing unit tests, to avoid the temptation to throw it into Cursor and prompt until it works.
breckenedge · 1h ago
Set a budget. Get rate limited. Let the experience remind you how much time you’re actually wasting letting the model write good looking but buggy code, versus just writing code responsibly.
andy99 · 2h ago
I think lack of critical thinking is the root cause, not a symptom. I think pretty much everyone uses LLMs these days, but you can tell who sees the output and considers it "done" vs who uses LLM output as an input to their own process.
mystified5016 · 2h ago
I mean, I can tell that I'm having this problem and my critical thinking skills are otherwise typically quite sharp.
At work I've inherited a Kotlin project and I've never touched Kotlin or android before, though I'm an experienced programmer in other domains. ChatGPT has been guiding me through what needs to be done. The problem I'm having is that it's just too damn easy to follow its advice without checking. I might save a few minutes over reading the docs myself, but I don't get the context the docs would have given me.
I'm a 'Real Programmer' and I can tell that the code is logically sound and self-consistent. The code works and it's usually rewritten so much as to be distinctly my code and style. But still it's largely magical. If I'm doing things the less-correct way, I wouldn't really know because this whole process has led me to some pretty lazy thinking.
On the other hand, I very much do not care about this project. I'm very sure that it will be used just a few times and never see the light of day again. I don't expect to ever do android development again after this, either. I think lazy thinking and farming the involved thinking out to ChatGPT is acceptable here, but it's clear how easily this could become a very bad habit.
I am making a modest effort to understand what I'm doing. I'm also completely rewriting or ignoring the code the AI gives me, it's more of an API reference and example. I can definitely see how a less-seasoned programmer might get suckered into blindly accepting AI code and iterating prompts until the code works. It's pretty scary to think about how the coming generations of programmers are going to experience and conceptualize programming.
Workaccount2 · 3h ago
Is using the APIs worth the extra cost vs using the web tools? I haven't used any API tools, I am not a programmer, but I have generated many millions of tokens in the web canvas, something that would cost way more than the $20 I spend on them.
thimabi · 29m ago
I think the idea that LLMs are incentivized to write verbose code fails when one considers non-API usage.
Like you, I’ve accumulated tons of LLM usage via apps and web apps. I can actually see how the models are much more succinct there compared to the API interface.
My uneducated guess is that LLM models try to fit their responses into the “output tokens” limit, which is surely much lower in UIs than what can be set in pay-as-you-go interfaces.
jfim · 2h ago
If you're using Claude code or cursor, for example, they can read files automatically instead of needing the user to copy paste back and forth.
Both can generate code though, I've generated code using the web interface and it works, it's just a bit tedious to copy back and forth.
tippytippytango · 3h ago
This article captures a lot of the problem. It’s often frustrating how it tries to work around really simple issues with complex workarounds that don’t work at all. I tell it the secret simple thing it’s missing and it gets it. It always makes me think, god help the vibe coders that can’t read code. I actually feel bad for them.
grufkork · 2h ago
Working as an instructor for a project course for first-year university students, I have run into this a couple of times. The code required for the project is pretty simple, but there are a couple of subtle details that can go wrong. Had one group today with bit shifts and other "advanced" operators everywhere, but the code was not working as expected. I asked them to just `Serial.println()` so they could check what was going on, and they were stumped. LLMs are already great tools, but if you don't know basic troubleshooting/debugging you're in for a bad time when the brick wall arrives.
On the other hand, it shows how much coding is just repetition. You don't need to be a good coder to perform serviceable work, but you won't create anything new and amazing either, if you don't learn to think and reason - but that might for some purposes be fine. (Worrying for the ability of the general population however)
You could ask whether these students would have gotten anything done without generated code? Probably, it's just a momentarily easier alternative to actual understanding. They did however realise the problem and decided by themselves to write their own code in a simpler, more repetitive and "stupid" style, but one that they could reason about. So hopefully a good lesson and all well in the end!
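As a rough illustration of the print-debugging advice above, here is a minimal Python sketch (the course apparently used Arduino, where the call is `Serial.println()`). The function `pack_rgb` and its inputs are hypothetical and are not the students' code; the point is only that printing each intermediate value makes a chain of bit shifts easy to check.

```python
def pack_rgb(r: int, g: int, b: int, debug: bool = False) -> int:
    """Pack three 8-bit channels into one 24-bit integer."""
    value = 0
    for name, channel, shift in (("r", r, 16), ("g", g, 8), ("b", b, 0)):
        # Mask to 8 bits so an out-of-range channel cannot spill into its neighbour
        value |= (channel & 0xFF) << shift
        if debug:
            # Show the running result after every step, in hex
            print(f"after {name}: {value:#08x}")
    return value


packed = pack_rgb(0x12, 0x34, 0x56, debug=True)
```

Written in this simple, repetitive style, each step can be verified against what you expect before moving on to the next one.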
tippytippytango · 22m ago
Sounds like you found a good problem for the students. Having the experience of failing to get the right answer out of the tool and then succeeding on your wits creates an opportunity to learn that these tools benefit from disciplined usage.
iotku · 3h ago
There's a pretty big gap between "make it work" and "make it good".
I've found with LLMs I can usually convince them to get me at least something that mostly works, but each step compounds with excessive amounts of extra code, extraneous comments ("This loop goes through each..."), and redundant functions.
In the short term it feels good to achieve something 'quickly', but there's a lot of debt associated with running a random number generator on your codebase.
didgetmaster · 2h ago
In my opinion, the difference between good code and code that simply works (sometimes barely) is that good code will still work (or error out gracefully) when the state and the inputs are not as expected.
Good programs are written by people who anticipate what might go wrong. If the document says 'don't do X', they know a tester is likely to try X because a user will eventually do it.
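A small sketch of what "work or error out gracefully" can look like in practice. The function `parse_port` is a hypothetical example, not something from the thread; it assumes the input arrives as a string and treats anything outside 1..65535 as invalid.

```python
def parse_port(raw: str) -> int:
    """Parse a TCP port number, failing with a clear message on bad input."""
    try:
        port = int(raw.strip())
    except ValueError:
        # The tester (and eventually a user) will type something that is not a number
        raise ValueError(f"port must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        # A valid integer can still be outside the range the program can use
        raise ValueError(f"port must be in 1..65535, got {port}")
    return port
```

The happy path is one line; the rest exists only for the inputs the documentation says nobody should provide.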
r053bud · 3h ago
I fear that’s going to end up being a significant portion of engineers in the future.
babyent · 3h ago
I think we are in the Flash era again lol.
You remember those days right? All those Flash sites.
martin-t · 2h ago
> I tell it the secret simple thing it’s missing and it gets it.
Anthropomorphizing LLMs is not helpful. It doesn't get anything, you just gave it new tokens, ones which are more closely correlated with the correct answer. It also generates responses similar to what a human would say in the same situation.
Note: I first wrote "it also mimics what a human would say", then I realized I was anthropomorphizing a statistical algorithm and had to correct myself. It's hard sometimes, but language shapes how we think (which is ironically why LLMs are a thing at all), and using terms which better describe how it really works is important.
ben_w · 2h ago
Given that LLMs are trained on humans, who don't respond well to being dehumanised, I expect anthropomorphising them to be better than the opposite of that.
It's a feature of language to describe things in those terms even if they aren't accurate.
>using terms which better describe how it really works is important
Sometimes, especially if you're doing something where that matters, but abstracting those details away is also useful when trying to communicate clearly in other contexts.
tippytippytango · 54m ago
Patronizing much?
ramoz · 1h ago
I disagree with the idea that LLM providers are deliberately designing solutions to consume more tokens. We're in the early days of agentic coding, and the landscape is intensely competitive. Providers are focused on building highly capable systems to drive adoption, especially with open-source alternatives just a git clone away.
Yes, Claude Code can be token-heavy, but that's often a trade-off for their current level of capability compared to other options. Additionally, Claude Code has built-in levers for cost (I prefer they continue to focus on advanced capability, let pricing accessibility catch up).
"early days" means:
- Prompt engineering is still very much a required skill for better code and lower pricing
- Same with still needing to be an engineer for the same reasons, and:
- Devs need to actively guide these agents. This includes detailed planning, progress tracking, and careful context management – which, as the author notes, is more involved than many realize. I've personally found success using Gemini to create structured plans for Claude Code to execute, which helps manage its verbosity and keep it focused on "thoughtful" execution (as guided by Gemini). I drop entire codebases into Gemini (for free).
slurpyb · 49m ago
It’s so cool that we’re all actively participating in the handover of all our work to these massive companies so we can be forever reliant on their blackbox subscriptions. Don’t fret; there will be a day where those profit numbers will have to go up and they will consciously make the product worse, just to trigger more queries, and thus extract more money from you. Gross.
mecredis · 1h ago
Hi! Author here. I don't actually think they're deliberately doing this, hence my choice of "perverse incentives" vs. something more accusatory. The issue is that they don't have a ton of incentive to fix it.
Agree with you on all the rest, and I think writing a post like this was very much intended as a gut-check on things since the early days are hopefully the times when things can get fixed up.
ramoz · 46m ago
My speculation is that these companies have significant reason to prioritize lowering the amount of tokens produced as well as cost of tokens.
The leaked Claude Code codebase was riddled with "concise", "do not add comments", "mimic codestyle", even an explicit "You should minimize output tokens as much as possible" etc. Btw, Claude Code uses a custom system prompt, not the leaked 24k claude.ai one.
coolcase · 1h ago
Dopamine? That sort of thing triggers cortisol for me if anything!
neilv · 2h ago
I would seriously consider banning "vibe coding" right now, because:
1. Poor solutions.
2. Solutions not understood by the person who prompted them.
3. Development team being made dumber.
4. Legal and ethical concerns about laundering open source copyrights.
5. I'm suspicious of the name "vibe coding", like someone is intentionally marketing it to people who don't care to be good at their jobs.
6. I only want to hire people who can do holistically better work than current "AI". (Not churn code for a growth startup's Potemkin Village, nor to only nominally satisfy a client's requirements while shipping them piles of counterproductive garbage.)
7. Publicizing that you are a no-AI-slop company might scare away the majority of the bad prospective employees, while disproportionately attracting the especially good ones. (Not that everyone who uses "AI" is bad, but they've put themselves in the bucket with all the people who are bad, and that's a vastly better filter for the art of hiring than whether someone has spent months memorizing LeetCode answers solely for interviews.)
gitroom · 3h ago
man, pricing everywhere is getting nuts. makes me wonder if most stuff just gets harder to use over time or im just old now - you ever hit a point where you stop caring about new tools because it feels like too much work?
Vox_Leone · 2h ago
Noted — but honestly, that's somewhat expected. Vibe-style coding often lacks structure, patterns, and architectural discipline. That means the developer must do more heavy lifting: decide what they want, and be explicit — whether that’s 'avoid verbosity,' 'use classes,' 'encapsulate logic,' or 'handle errors properly.'
UncleOxidant · 49m ago
> I have probably spent over $1,000 vibe coding various projects into reality
dude, you can use Gemini Pro 2.5 with Cline - it's free and is rated at least as good as Claude Sonnet 3.7 right now.
sigmaisaletter · 3h ago
In section 4, the author writes "... cheaper than Claude 3.7 ($0.80 per token vs. $3)".
This is an obvious mistake: the price is per million tokens, not per token.
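To show how much the unit matters, here is a minimal arithmetic sketch using the two prices quoted above, read as dollars per million tokens. The usage figure of two million tokens is a made-up number for illustration only.

```python
TOKENS_PER_MILLION = 1_000_000


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` tokens at a price quoted per million tokens."""
    return tokens / TOKENS_PER_MILLION * usd_per_million


usage = 2_000_000  # hypothetical token count, not a figure from the article

print(cost_usd(usage, 0.80))  # the cheaper price quoted above
print(cost_usd(usage, 3.00))  # the Claude 3.7 price quoted above

# Misreading $0.80 as a per-token price inflates the bill a millionfold
print(usage * 0.80)
```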
Really makes you wonder where this is all going. What is going to be the thing where we say "Maybe we took this a little too far." I'm sure whatever bloated react apps we see today are nothing in comparison to the monstrosities we have in store for us in the future.
deadbabe · 1h ago
The future should be less bloat. We don’t need frameworks anymore, we can produce output to straight html pages with vanilla JavaScript. Could be good.
andrewstuart · 2h ago
Claude was last week.
The author should try Gemini; it’s much better.
martin-t · 2h ago
Honestly can't tell if satire or not.
jazoom · 2h ago
It's not satire. Gemini is much better for coding, at least for me.
Just to illustrate, I asked both about a browser automation script this morning. Claude used Selenium. Gemini used Playwright.
I think the main reasons Gemini is much better are:
1. It gets my whole code base as context. Claude can't take that many tokens. I also include documentation for newer versions of libraries (e.g. Svelte 5) that the LLM is not so familiar with.
2. Gemini has a more recent knowledge cutoff.
3. Gemini 2.5 Pro is a thinking model.
4. It's free to use through the web UI.
croes · 1h ago
If I do the same with a human developer instead of an AI it’s called ordering not vibe coding.
What’s the difference?
charcircuit · 2h ago
This article ignores the enormous demand of AI coding paired with competition between providers. Reducing the price of tokens means that people can afford to generate more tokens. A code provider being cheaper on average to operate than another is a competitive advantage.
johnea · 1h ago
I generally agree with the concerns of this article, and wonder about the theory of the LLM having an innate inclination to generate bloated code.
Even in this article though, I feel like there is a lot of anthropomorphization of LLMs.
> LLMs and their limitations when reasoning about abstract logic problems
As I understand them, LLMs don't "reason" about anything. It's purely a statistical sequencing of words (or other tokens) as determined by the training set and the prompt. Please correct me if I'm wrong.
Also, regarding this theory that the models may be biased to produce bloated code: I've reposted this once already, and no one has replied yet, and I still wonder:
----------
To me, this represents one of the most serious issues with LLM tools: the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on.
There is no way (that I've read of) for identifying biases, or intentional manipulations of the model that would cause the tool to yield certain intended results.
There are examples of DeepSeek generating results that refuse to acknowledge Tiananmen Square, etc. These serve as examples of how the generated output can intentionally be biased, without the ability to readily predict this general class of bias by analyzing the model data.
----------
I'm still looking for confirmation or denial on both of these questions...
Is it really vibe coding if you are building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness? Sure, it's a process dependent on AI, but it's very far from nearly "forget[ting] that the code even exists".
That all said, I do think the article captures some of the current cost/quality dilemmas. I wouldn't jump to conclusions that these incentives are actually driving most current training decisions, but it's an interesting area to highlight.
Ancapistani · 2h ago
There should be a distinction, but I don't think it's really clear where it is yet.
In my own usage, I tend to alternate between tiny, well-defined tasks and larger-scale, planned architectural changes or new features. Things in between those levels are hit and miss.
It also depends on what I'm building and why. If it's a quick-and-dirty script for my own use, I'll often write up - or speak - a prompt and let it do its thing in the background while I work on other things. I care much less about code quality in those instances.
codr7 · 2h ago
It's still gambling, you're trading learning/reinforcing for efficiency, which in the long run means losing skills.
This reads like "is it really gambling when I have a many-step system for predicting roulette outcomes?"
Pxtl · 2h ago
I can feel how the extreme autocomplete of AI is a drug.
Half of my job is fighting the "copy/paste/change one thing" garbage that developers generate. Keeping code DRY. The autocompletes do an amazing job of automating the repeated boilerplate. "Oh you're doing this little snippet for the first and second property? Obviously you want to do that for every property! Let me just expand that out for you!"
And I'm like "oooh, that's nice and convenient".
...
But I also should be looking at that with the stink-eye... part of that code is now duplicated a dozen times. Is there any way to reduce that duplication to the bare minimum? At least so it's only one duplicated declaration or call and all of the rest is per-thingy?
Or any way to directly/automatically wrap the thing without going property-by-property?
Normally I'd be asking myself these questions by the 3rd line. But this just made a dozen of those in an instant. And it's so tempting and addictive to just say "this is fine" and move on.
That kind of code is not fine.
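A minimal sketch of the trade-off described above. The `Settings` class, its fields, and the `APP_` prefix are all hypothetical; the first function is the "expand it out for every property" shape an autocomplete tends to produce, and the second states the rule once and derives the per-property part from the class itself.

```python
from dataclasses import dataclass, fields


@dataclass
class Settings:
    host: str
    port: int
    timeout: float


def to_env_duplicated(s: Settings) -> dict[str, str]:
    # One near-identical line per property: easy to generate, easy to let drift
    return {
        "APP_HOST": str(s.host),
        "APP_PORT": str(s.port),
        "APP_TIMEOUT": str(s.timeout),
    }


def to_env_dry(s: Settings) -> dict[str, str]:
    # The naming rule is declared once; new fields are picked up automatically
    return {f"APP_{f.name.upper()}": str(getattr(s, f.name)) for f in fields(s)}
```

Both return the same dictionary today. The difference shows up when a fourth field is added, or when one property really does need different handling and the generic version has to grow an exception.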
Suppafly · 1h ago
>That kind of code is not fine.
Depends on your definition of fine. Is it less readable because it's doing the straightforward thing several times instead of wrapping it into a loop or a method, or is it more readable because of that?
Is it not fine because it's slower, or does it all just compile down to the same thing anyway?
Or is it not fine because you actually should be doing different things for the different properties but assumed you don't because you let the AI do the thinking for you?
Ancapistani · 2h ago
> That kind of code is not fine.
I agree, but I'm also challenging that position within myself.
Why isn't it OK? If your primary concern is readability, then perhaps LLMs can better understand generated code relative to clean, human-readable code. Also, if you're not directly interacting with it, who cares?
As for duplication introducing inconsistencies, that's another issue entirely :)
It's a moderately useful tool for me. I suspect the people that get the most use out of it are those that would take more than 1 hour to read code I would take 10 minutes to read. Which is to say, the least experienced people get the most value.
The talk about it makes more sense when you remember most developers are primarily writing CRUD webapps or adware, which is essentially a solved problem already.
I typically use it to whip up a CLI tool or script to do something that would have been too fiddly otherwise.
While sitting in a Teams meeting I got it to use the Roslyn compiler SDK in a CLI tool that stripped a very repetitive pattern from a code base. Some OCD person had repeated the same nonsense many thousands of times. The tool cleaned up the mess in seconds.
Also you have to learn to talk to it and how to ask it things.
Consider Database-as-a-service companies: They're not incentivized to optimize on CPU usage, they charge per cpu. They're not incentivized to improve disk compression, they charge for disk-usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.
When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.
1. Develop a Minimum Viable Product (MVP) or prototype that functions.
2. Write tests, either before or after the initial development.
3. Implement coding guidelines, style guides, linter etc. Do code reviews.
4. Continuously adjust, add features, refactor, review and expand your test suite. Iterate and let AI run tests and linters on each change
While this process may seem lengthy, it ensures reliability and efficiency. Experienced engineers might find it as quick as working solo, but the structured approach guarantees success. It feels like pairing with an inexperienced developer.
Also, this process may run you into rate limits with Copilot and might not work with your current codebase due to a lack of tests and the absence of applied coding style guides.
Additionally, it takes time. For example, for a simple to mid-level tool/feature in Go, it might take about 1 hour to develop the MVP or prototype, but another 6 to 10 hours to refine it to a quality that you might want to show to other engineers.
However... "vibe architecting" is likely going to be the way forward. I have had success with generating/tuning an architecture plan with AI, having it create stub files/functions then filling them out individually. I can get pretty much the whole way without typing code, but it does require a fair bit more architectural thinking than usual and a good bit of reading code (then telling the AI to "do better").
I think of it like the analogy of blind men describing an elephant when they can only feel a single part. AI is decent at high level architecture and decent at low level production but you need a human to understand the big picture and how the pieces fit (and which ones are missing).
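The "stubs first, then fill them in one at a time" workflow mentioned above might look roughly like this. The two functions and their signatures are hypothetical; the idea is only that the signatures and docstrings pin down how the pieces fit before any body is written.

```python
def parse_line(line: str) -> tuple[str, int]:
    """Split 'name,count' into a typed pair. (Already filled in.)"""
    name, count = line.split(",")
    return name.strip(), int(count)


def summarize(pairs: list[tuple[str, int]]) -> dict[str, int]:
    """Total the counts per name. (Still a stub, to be filled in next.)"""
    raise NotImplementedError("summarize is not written yet")
```

An unfilled stub fails loudly when called, so it is obvious which pieces are still missing.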
It really captures how little control one has over the process, while simultaneously having the illusion of control.
I don't really believe that code is being made verbose to make more profits. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" were possible, it would give one model a sufficient edge over others that I suspect providers would do it.
"write minimum code required"
It's not even that sensitive to the wording - "be terse" or "make minimal changes" amount to the same thing - but the resulting code will often be at least 50% shorter than the un-guided version.
2. I've had good fortune keeping the agents to constrained areas, working on functions, or objects, with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... Treat these agents like hyper-energetic interns. Nudge frequently.
3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.
And, basically, I've described that we're becoming Software Development Managers with teams of on-demand low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.
(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)
The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.
Though I'm not a "vibe coder" myself I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.
Especially when you try to get them to generate something they explicitly tell you they won't, like nudity. It feels akin to hacking.
If it doesn't work the first time you pull the lever, it might the second time, and it might not. Either way, the house wins.
It should be regulated as gambling, because it is. There's no metaphor, the only difference from a slot machine is that AI will never output cash directly, only the possibility of an output that could make money. So if you're lucky with your first gamble, it'll give you a second one to try.
Gambling all the way down.
Every prompt and answer is contributing value toward your progress toward the final solution, even if that value is just narrowing the latent space of potential outputs by keeping track of failed paths in the context window, so that it can avoid that path in a future answer after you provide followup feedback.
The vast majority of slot machine pulls produce no value to the player. Every single prompt into an LLM tool produces some form of value. I have never once had an entirely wasted prompt unless you count the AI service literally crashing and returning a "Service Unavailable" type error.
One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.
This has not been my experience, maybe sometimes, but certainly not always.
As an example: asking chatgpt/gemini about how to accomplish some sql data transformation set me back in finding the right answer because the answer it did give me was so plausible but also super duper not correct in the end. Would've been better off not using it in that case.
Brings to mind "You can't build a ladder to the moon"
That assumes that the value of a solution is linear with the amount completed. If the Pareto Principle holds (80% of effects come from 20% of causes), then not getting that critical 10+% likely has an outsized effect on the value of the solution. If I have to do the 20% of the work that's hard and important after taking what the LLM did for the remainder, I haven't gained as much because I still have to build the state machine in my head to understand the problem-space well enough to do that coding.
- I buy stock that doesn't perform how I expected.
- I hire someone to produce art.
- I pay a lawyer to represent me in court.
- I pay a registration fee to play a sport expecting to win.
- I buy a gift for someone expecting friendship.
Are all gambas.
You aren't paying for the result (the win), you are paying for the service that may produce the desired result, and in some cases one of many possibly desirable results.
Hence the adage "sir, this is a casino"
Neither is GenAI, the grandparent comment is dumb.
That's wild. Anything with non-deterministic output will have this.
Anything with non-deterministic output that charges money ...
Edit: Added words to clarify what I meant.
You only lose those rights in the contracts you sign (which, in terms of GPT, you've likely clicked through a T&C which waives all rights to dispute or reclaim payment).
If you ask an artist to draw a picture and decide it's crap, you can refuse to take it and to pay for it. They won't be too happy about it, but they'll own the picture and can sell it on the market.
Maybe art is special, but there are other professions where someone can invest heaps of time and effort without delivering the expected result. A trial attorney, treasure hunter, oil prospector, app developer. All require payment for hours of service, regardless of outcome.
When it comes to work that requires craftsmanship, it's pretty common to be able to not pay them if they do a poor job. It may cost you more than you paid them to fix their mistake, but you can generally reclaim the money you paid them if the work they did was egregiously poor.
Brain scans have revealed that waiting for a potential win stimulates the same areas as the win itself. That's the "appeal" of gambling. Your brain literally feels like it's winning while waiting because it _might_ win.
That's still not gambling and it's silly to pretend it is. It feels like gambling but that's it.
If you don't get something good the first time you buy a book, you might with the next book, or you might not. Either way, the house wins.
It should be regulated as gambling, because it is. There's no metaphor — the only difference from a slot machine is that books will never output cash directly, only the possibility of an insight or idea that could make money. So if you're lucky with your first gamble, you'll want to try another.
Gambling all the way down.
Though you could say the same thing about pretty much any VC funded sector in the "Growth" phase. And I probably will.
I almost can't believe this idea is being seriously considered by anybody. By that logic buying any CPU is gambling because it's not deterministic how far you can overclock it.
Just so you know, not every llm use case requires paying for tokens. You can even run a local LLM and use cline w/ it for all your coding needs. Pull that slot machine lever as many times as you like without spending a dollar.
It also kind of breaks the whole argument that they're designed to be addictive in order to make you spend more on tokens.
All those laid off coders gambled on a career that didn’t pan out.
Want more certainty in life, gonna have to get political.
And even then there is no guarantee the future gives a crap. Society may well collapse in 30 years, or 100…
This is all just role play to satisfy the prior generation's story-driven illusions.
> Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.
> “Yes, I wrote that," she said. "It seems to be true. Every nut and bolt in that probe was designed separately. It's less surprising if you think of the probe as having a religious purpose. But that's not all. You know how redundancy works?"
> “In machines? Two gilkickies to do one job. In case one fails."
> “Well, it seems that the Moties work it both ways."
> “Moties?"
> She shrugged. "We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don't they?"
> “For a complicated job, of course they do."
> “The Moties don't. It's all one piece, everything working on everything else. Rod, there's a fair chance the Moties are brighter than we are."
- The Mote in God's Eye, Larry Niven and Jerry Pournelle (1974)
[…too bad that today's LLMs are not brighter than we are, at least when it comes to writing correct code…]
Given how prevalent furries seem to be, especially in nerd adjacent culture, I'd say he was ahead of his time.
Ringworld is pretty good, the multiple sequels get kind of out there.
I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.
If Motie engineering is a good idea, it's not a smooth gradient. The Motie-est code I've seen is also the worst. It is definitely not the case that getting a bit more Motie-esque, all else being equal, produces better results. Is there some crossover point where it gets better and maybe passes our modular designs? If AIs do get better than us at coding, and it turns out they do settle on Motie-esque coding, no human will ever be able to penetrate it ever again. We'd have to instruct our AI coders to deliberately cripple themselves to stay comprehensible, and that is... economically a tricky proposition.
After all, anyone can write anything into a novel they want to and make anything work. It's why I've generally stopped reading fiction that is explicitly meant to make ideological or political points to the exclusion of all else; anything can work on a page. Does Motie engineering correspond to anything that could be manifested practically in reality?
Will the AIs be better at modularization than any human? Will they actually manifest the Great OO Promise of vast piles of amazingly well-crafted, re-usable code once they mature? Or will the optimal solution turn out to be bespoke, locally-optimized versions of everything everywhere, and the solution to combining two systems is to do whatever locally-sensible customizations are called for?
(I speak of the final, mature version, however long that may be. Today LLMs are kind of the worst of both worlds. That turns out to be a big step up from "couldn't play in this space at all", so I'm not trying to fashionably slag on AIs here. I'm more saying that the one point we have is not yet enough to draw so much as a line through, let alone an entire multi-dimensional design methodology utility landscape.)
I didn't expect to live to see the answers, but maybe I will.
It's the kind of thing you commonly get if you let an unconstrained optimization process run for long enough. It will generally be better, according to whatever function you're optimizing for. The main disadvantage, apart from the design being hard to understand or modify, is manufacturing and repair (needing to make many different parts), but if you have sufficiently good manufacturing technology (e.g. atomic level printers), then that may be a non-issue. And in software it's already feasible: you can see very small scale versions of this in extremely resource-constrained environments where it's worthwhile really trying to optimize things (see some demoscene entries), but it's pretty rare (some tricks that optimizing compilers pull off are similar, but they are generally very local).
Would open source, local models keep pressure on AI companies to prioritize the usable code, as code quality and engineering time saved are critical to build vs buy discussions?
This has led to their abilities stalling while their output seemingly goes up. But when you look at the quality of their output, and their ability to get projects over the last 10% or make adjustments to an already completed project without breaking things, it's pretty horrendous.
It certainly is hard, when I'm, say, writing unit tests, to avoid the temptation to throw it into Cursor and prompt until it works.
At work I've inherited a Kotlin project and I've never touched Kotlin or Android before, though I'm an experienced programmer in other domains. ChatGPT has been guiding me through what needs to be done. The problem I'm having is that it's just too damn easy to follow its advice without checking. I might save a few minutes over reading the docs myself, but I don't get the context the docs would have given me.
I'm a 'Real Programmer' and I can tell that the code is logically sound and self-consistent. The code works and it's usually rewritten so much as to be distinctly my code and style. But still it's largely magical. If I'm doing things the less-correct way, I wouldn't really know because this whole process has led me to some pretty lazy thinking.
On the other hand, I very much do not care about this project. I'm very sure that it will be used just a few times and never see the light of day again. I don't expect to ever do Android development again after this, either. I think lazy thinking and farming the involved thinking out to ChatGPT is acceptable here, but it's clear how easily this could become a very bad habit.
I am making a modest effort to understand what I'm doing. I'm also completely rewriting or ignoring the code the AI gives me; it's more of an API reference and example. I can definitely see how a less-seasoned programmer might get suckered into blindly accepting AI code and iterating prompts until the code works. It's pretty scary to think about how the coming generations of programmers are going to experience and conceptualize programming.
Like you, I’ve accumulated tons of LLM usage via apps and web apps. I can actually see how the models are much more succinct there compared to the API interface.
My uneducated guess is that LLMs try to fit their responses into the “output tokens” limit, which is surely much lower in UIs than what can be set in pay-as-you-go interfaces.
Both can generate code, though. I've generated code using the web interface and it works; it's just a bit tedious to copy back and forth.
On the other hand, it shows how much coding is just repetition. You don't need to be a good coder to perform serviceable work, but you won't create anything new and amazing either, if you don't learn to think and reason - but that might for some purposes be fine. (Worrying for the ability of the general population however)
You could ask whether these students would have gotten anything done without generated code? Probably, it's just a momentarily easier alternative to actual understanding. They did however realise the problem and decided by themselves to write their own code in a simpler, more repetitive and "stupid" style, but one that they could reason about. So hopefully a good lesson and all well in the end!
I've found with LLMs I can usually convince them to get me at least something that mostly works, but each step compounds with excessive amounts of extra code, extraneous comments ("This loop goes through each..."), and redundant functions.
In the short term it feels good to achieve something 'quickly', but there's a lot of debt associated with running a random number generator on your codebase.
Good programs are written by people who anticipate what might go wrong. If the document says 'don't do X', they know a tester is likely to try X because a user will eventually do it.
You remember those days right? All those Flash sites.
Anthropomorphizing LLMs is not helpful. It doesn't get anything, you just gave it new tokens, ones which are more closely correlated with the correct answer. It also generates responses similar to what a human would say in the same situation.
Note: I first wrote "it also mimics what a human would say", then I realized I was anthropomorphizing a statistical algorithm and had to correct myself. It's hard sometimes, but language shapes how we think (which is ironically why LLMs are a thing at all), and using terms which better describe how it really works is important.
https://www.microsoft.com/en-us/worklab/why-using-a-polite-t...
It's a feature of language to describe things in those terms even if they aren't accurate.
>using terms which better describe how it really works is important
Sometimes, especially if you're doing something where that matters, but abstracting those details away is also useful when trying to communicate clearly in other contexts.
Yes, Claude Code can be token-heavy, but that's often a trade-off for their current level of capability compared to other options. Additionally, Claude Code has built-in levers for cost (I prefer they continue to focus on advanced capability, let pricing accessibility catch up).
"early days" means:
- Prompt engineering is still very much a required skill for better code and lower pricing
- You still need to be an engineer, for the same reasons, and:
- Devs need to actively guide these agents. This includes detailed planning, progress tracking, and careful context management – which, as the author notes, is more involved than many realize. I've personally found success using Gemini to create structured plans for Claude Code to execute, which helps manage its verbosity and keeps it focused on "thoughtful" execution (as guided by Gemini). I drop entire codebases into Gemini (for free).
Agree with you on all the rest, and I think writing a post like this was very much intended as a gut-check on things since the early days are hopefully the times when things can get fixed up.
The leaked Claude Code codebase was riddled with "concise", "do not add comments", "mimic codestyle", even an explicit "You should minimize output tokens as much as possible" etc. Btw, Claude Code uses a custom system prompt, not the leaked 24k claude.ai one.
1. Poor solutions.
2. Solutions not understood by the person who prompted them.
3. Development team being made dumber.
4. Legal and ethical concerns about laundering open source copyrights.
5. I'm suspicious of the name "vibe coding", like someone is intentionally marketing it to people who don't care to be good at their jobs.
6. I only want to hire people who can do holistically better work than current "AI". (Not churn code for a growth startup's Potemkin Village, nor to only nominally satisfy a client's requirements while shipping them piles of counterproductive garbage.)
7. Publicizing that you are a no-AI-slop company might scare away the majority of the bad prospective employees, while disproportionately attracting the especially good ones. (Not that everyone who uses "AI" is bad, but they've put themselves in the bucket with all the people who are bad, and that's a vastly better filter for the art of hiring than whether someone has spent months memorizing LeetCode answers solely for interviews.)
dude, you can use Gemini Pro 2.5 with Cline - it's free and is rated at least as good as Claude Sonnet 3.7 right now.
This is an obvious mistake: the price is per megatoken (one million tokens), not per token.
Source: https://www.anthropic.com/pricing
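To make the unit concrete, here is a minimal sketch of the arithmetic. The $3 figure and the `cost_usd` helper are hypothetical, chosen only for illustration and not taken from the pricing page:

```python
def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD when the price is quoted per million tokens (megatoken)."""
    return tokens / 1_000_000 * price_per_mtok

# Hypothetical numbers, for illustration only.
PRICE_PER_MTOK = 3.00   # assumed USD per 1,000,000 tokens
TOKENS_USED = 250_000

correct = cost_usd(TOKENS_USED, PRICE_PER_MTOK)
# Misreading the same figure as a per-token price inflates the result a millionfold.
misread = TOKENS_USED * PRICE_PER_MTOK

print(f"per-megatoken reading: ${correct:.2f}")
print(f"per-token misreading:  ${misread:,.2f}")
```

Whatever the real price is, the two readings differ by a factor of one million, which is why the per-token reading looks so obviously wrong.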
The author should try Gemini; it’s much better.
Just to illustrate, I asked both about a browser automation script this morning. Claude used Selenium. Gemini used Playwright.
I think the main reasons Gemini is much better are:
1. It gets my whole code base as context. Claude can't take that many tokens. I also include documentation for newer versions of libraries (e.g. Svelte 5) that the LLM is not so familiar with.
2. Gemini has a more recent knowledge cutoff.
3. Gemini 2.5 Pro is a thinking model.
4. It's free to use through the web UI.
What’s the difference?
Even in this article though, I feel like there is a lot of anthropomorphization of LLMs.
> LLMs and their limitations when reasoning about abstract logic problems
As I understand them, LLMs don't "reason" about anything. It's purely a statistical sequencing of words (or other tokens) as determined by the training set and the prompt. Please correct me if I'm wrong.
Also, regarding this theory that the models may be biased to produce bloated code: I've reposted this once already, and no one has replied yet, and I still wonder:
----------
To me, this represents one of the most serious issues with LLM tools: the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on.
There is no way (that I've read of) for identifying biases, or intentional manipulations of the model that would cause the tool to yield certain intended results.
There are examples of DeepSeek generating results that refuse to acknowledge Tiananmen Square, etc. These serve as examples of how the generated output can intentionally be biased, without the ability to readily predict this general class of bias by analyzing the model data.
----------
I'm still looking for confirmation or denial on both of these questions...
Is it really vibe coding if you are building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness? Sure, it's a process dependent on AI, but it's very far from nearly "forget[ting] that the code even exists".
That all said, I do think the article captures some of the current cost/quality dilemmas. I wouldn't jump to conclusions that these incentives are actually driving most current training decisions, but it's an interesting area to highlight.
In my own usage, I tend to alternate between tiny, well-defined tasks and larger-scale, planned architectural changes or new features. Things in between those levels are hit and miss.
It also depends on what I'm building and why. If it's a quick-and-dirty script for my own use, I'll often write up - or speak - a prompt and let it do its thing in the background while I work on other things. I care much less about code quality in those instances.
[1] https://trends.google.com/trends/explore?geo=US&q=%22vibe%20...
Half of my job is fighting the "copy/paste/change one thing" garbage that developers generate. Keeping code DRY. The autocompletes do an amazing job of automating the repeated boilerplate. "Oh you're doing this little snippet for the first and second property? Obviously you want to do that for every property! Let me just expand that out for you!"
And I'm like "oooh, that's nice and convenient".
...
But I also should be looking at that with the stink-eye... part of that code is now duplicated a dozen times. Is there any way to reduce that duplication to the bare minimum? At least so it's only one duplicated declaration or call and all of the rest is per-thingy?
Or any way to directly/automatically wrap the thing without going property-by-property?
Normally I'd be asking myself these questions by the 3rd line. But this just made a dozen of those in an instant. And it's so tempting and addictive to just say "this is fine" and move on.
That kind of code is not fine.
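As a rough illustration of the duplication being described, here is a minimal sketch in Python. The `Settings` class and its fields are hypothetical; the point is only the contrast between the property-by-property expansion an autocomplete tends to produce and a version that wraps every field in one place:

```python
from dataclasses import dataclass, fields

@dataclass
class Settings:  # hypothetical example type
    host: str = "localhost"
    port: int = 8080
    debug: bool = False

def to_dict_expanded(s: Settings) -> dict:
    # The "copy/paste/change one thing" shape: one line per property.
    # Adding a field to Settings means remembering to add a line here too.
    return {
        "host": s.host,
        "port": s.port,
        "debug": s.debug,
    }

def to_dict_generic(s: Settings) -> dict:
    # One declaration, no per-property repetition: iterate the dataclass fields.
    return {f.name: getattr(s, f.name) for f in fields(s)}

s = Settings(port=9000)
print(to_dict_expanded(s) == to_dict_generic(s))
```

Both functions return the same dictionary today; the difference only shows up when a fourth field is added and the expanded version silently misses it. (For this particular case the standard library's `dataclasses.asdict` would also do the job.)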
Depends on your definition of fine. Is it less readable because it's doing the straightforward thing several times instead of wrapping it into a loop or a method, or is it more readable because of that?
Is it not fine because it's slower, or does it all just compile down to the same thing anyway?
Or is it not fine because you actually should be doing different things for the different properties but assumed you don't because you let the AI do the thinking for you?
I agree, but I'm also challenging that position within myself.
Why isn't it OK? If your primary concern is readability, then perhaps LLMs can better understand generated code relative to clean, human-readable code. Also, if you're not directly interacting with it, who cares?
As for duplication introducing inconsistencies, that's another issue entirely :)