> Because no one would write an HTTP fetching implementation covering all edge cases when we have a data fetching library in the project that already does that.
> No one would implement a bunch of utility functions that we already have in a different module.
> No one would change a global configuration when there’s a mechanism to do it on a module level.
> No one would write a class when we’re using a functional approach everywhere.
Boy I'd like to work on whatever teams this guy's worked on. People absolutely do all those things.
elashri · 11h ago
To be honest, most of these things can happen in a poorly documented large codebase. I work on an academic research project whose docs basically tell you the code is self-documented. They give one or two pages about configuring CMake and building the project, and another page on how to benchmark the throughput. But the internal quirks and the expected conventions you will need to figure out on your own.
New people contributing usually reinvent many things and change global configuration because they don't know they can use something already there.
Ironically, indexing the codebase and asking an LLM questions about specific things is the best thing you can do, because the only three people you could ask have either left the project or are busy and will reply in a week.
godelski · 10h ago
> docs that basically tell you the code is self-documented
Anytime someone tells me the code is self-documented I hear "there's no documentation."
The most common programmer's footgun:

    "I don't have time to document"
       |                         ^
       v                         |
    Spends lots of time trying to understand code
We constantly say we don't have time to document the code. So instead we spend all our time reading code and trying to figure out what it does, stopping at the minimal understanding needed to implement whatever thing we need to implement.
This, of course, itself is naive because you can't know what the minimal necessary information is without knowing something about the whole codebase. Which is also why institutional knowledge is so important and why it is also weird that we'd rather have pay raises through switching companies than through internal raises. That's like trying to fix the damage from the footgun with a footgun.
fsloth · 6h ago
In some cases, like cleanly written geometry algorithms, the code _is_ the best technical documentation, and attempts at verbal description would sound awkward and plausibly become dated. In this case the purpose of the written docs is to offer enough context (possibly quite a lot) to understand the _why_, but the how is easiest to understand by reading the code.
I'm not arguing with your personal experience, but these things are not absolutes.
The key thing is whether a new developer can jump in and understand the thing. Add docs until they facilitate this understanding as well as possible. Then stop documenting and point the reader to the code.
godelski · 1h ago
I disagree (a bit). It really depends on the person. I've known a good number of scientists who are great mathematicians but get confused as soon as they see code.
My point is that everyone is different. Documentation isn't just for developers, and you never know who's going to contribute. It is also beneficial to have multiple formats, because even for a single person, different presentations make more sense on one day than the next. Different vantage points are good to have. It is also good to exercise your own mental flexibility[0]
I think the pytorch docs are a good example here. Check out the linalg module[1] (maybe skip the matrix properties section).
[0] This will also help you in the workplace to communicate better with others, and it makes you a better problem solver. It helps you generalize ideas.
For me it is difficult to write good code comments right when the code is written. The problem is solved, and the tricky parts, if any, are internalized. I don't mind reading code, so comments that just describe what the code is doing seldom bring value. The important thing is to document why the code does things in a non-obvious way, along with unintuitive scenarios, edge cases, etc.
Revisiting code is the best time to add comments, because that's when you find out what is tricky and what is obvious.
Code reviews are also good for adding code comments. If the reviewers are doing their job and are actually trying to understand the code, then it's a good time to get feedback on where to add comments.
godelski · 31m ago
Here's my method, which is a bit similar to the siblings.
Your first "docs" are your initial sketch: the pieces of paper, whiteboard, or whatever you used to formulate your design. I then usually write code "3" times. The first is the hack pass. If you're in a scripting language like Python, test your functions in the interpreter, in isolation. "Write" 2 is bringing them into the codebase, and it's a good idea to add comments here; you'll usually catch some small things at this point. Write the docstrings now, which are your 2nd docs and your first "official" ones. While writing those I usually realize some ways I can make my code better. If I'm in a rush, I write these down inside the docstring with a "TODO". When not rushing I'll do my 3rd "write" and make those improvements (realistically this is usually doing some and leaving TODOs).
This isn't full documentation, but at least what I'd call "developer docs". The reason I do things this way is that it helps me stay in the flow state, but allows me to move relatively fast while minimizing tech debt. It is always best to write docs while everything is fresh in your mind. What's obvious today isn't always obvious tomorrow. Hell, it isn't always obvious after lunch! This method also helps remind me to keep my code flexible and containerize functions.
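To make that "2nd write" concrete, here's a minimal sketch of a docstring carrying a deferred improvement as a TODO; the function and the TODO are made up for illustration, not from any real codebase:

    import json
    import pathlib


    def load_runs(path, limit=None):
        """Load experiment runs from a results directory.

        Each run is one JSON file; returns a list of dicts in filename order.
        `limit` keeps only the first few runs, which is handy for quick tests.

        TODO: this loads every run into memory at once; turn it into a
        generator on the third pass if run counts get large.
        """
        files = sorted(pathlib.Path(path).glob("*.json"))
        if limit is not None:
            files = files[:limit]
        return [json.loads(f.read_text()) for f in files]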
Then code reviews help you see other viewpoints and things you possibly missed. You can build a culture here where during review TODOs and other similar things can be added to internal docs so even if triaged the knowledge isn't completely lost.
Method isn't immutable though. You have to adapt to the situation at the time, but I think this is a good guideline. It probably sounds more cumbersome than it is, but I promise that second and third write are very cheap[0]. It just sounds like a lot because I'm mentioning every step[1]
[0] Even though I use vim, you can run code that's in the working file, like cells. So "write 2" kinda disappears, but you still have to do the cleanup here so that's "write 2"
[1] Flossing your teeth also sounds like a lot of work if you break it down into all subtasks 1) find floss, 2) reach for floss, 3) open floss container, ...
jononor · 8h ago
There is a hack for that: write the comments before and/or as you write the code, while things are still unclear and weird.
Of course, do a final pass on them to ensure that they are correct and useful in the end.
This is one example of documenting as you go instead of doing it after "the work" is done. I find it generally leads to better outcomes. Documenting only at the end is in many ways the worst way to do it.
avhception · 8h ago
Sometimes I feel like I'm watching people dig a hole until it starts filling with groundwater. They then start bailing the water out with buckets. They're very busy doing that, so the actual digging work slowly grinds to almost zero.
I stand at the edge of the pit, trying to talk to them about electrical pumps and drainage solutions and get yelled at: "I don't have time for your nonsense, can't you see I'm busy bailing water here!?"
godelski · 14m ago
It's a good visual story to describe something we all frequently do. We often create our own anxiety.
It's easy to hear "let's slow down a little" as "don't move fast", but that's the wrong interpretation, because "slow down" is relative. There is such a thing as "too fast". You want to hear "slow down" just as much as you want to hear calls to speed up. When you hear both, you should be riding that line of fast but not too fast. It's also good to make sure you have a clear direction. No use in getting nowhere faster.
I'll use another visual analogy. Let's say you and I have a race around the world. I take off running; you move to the drawing board. I laugh as I'm miles ahead, and you go to your workshop, which is in the other direction. The news picks up our little race and laughs at you, since I have such a tremendous lead. I'm halfway done when you come out of your workshop having built a jet. I only get a few more miles before you win the race. The news then laughs at my stupidity and your cleverness, as if it was so obvious all along.
Sometimes to move fast you need to slow down. It takes lots of planning and strategizing to move at extraordinary speeds.
aspenmayer · 9h ago
> why it is also weird that we'd rather have pay raises through switching companies than through internal raises
How does the saying go, something like “show me the incentives and I’ll show you the outcome?”
> That's like trying to fix the damage from the footgun with a footgun.
If you value your money/time/etc., wouldn't the best way to fix the damage from footguns be to prevent the damage in the first place, by not being there if/when the gun goes off?
I think your point is well put, I’m just trying to follow your reasoning to a conclusion logical to me, though I don't know if mine is the most helpful framing. I didn’t pick the footgun metaphor, but it is a somewhat useful model here for explaining why people may act the way they do.
ffsm8 · 9h ago
The other thing about documentation is that it inevitably goes stale.
So the question becomes: is no documentation better, or documentation that can potentially be entirely out of date, misleading, or subtly wrong, e.g. because it documents the desired behavior rather than the actual behavior (or vice versa)?
I'm generally pro documentation; I'm just fully aware that internal documentation the devs need to write themselves, for themselves, very rarely gets treated with enough respect to be trustworthy.
So what it comes down to is one person spearheading the docs effort while the rest of the team constantly "forgets" it, until they decide it's not worth the effort once the driving force either changes teams or gives up.
MoreQARespect · 8h ago
>The other thing about documentation is that it inevitably goes stale.
Not if you generate reference docs from code and how-to docs from tests.
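As one minimal sketch of that idea (the function and names here are invented, and the tooling is just one option): keep the reference text in the docstring, and let the usage example double as a test via Python's doctest, so it breaks loudly when it drifts from the code.

    def slugify(title):
        """Convert a title into a URL-safe slug.

        The example below is both the how-to and a regression test:

        >>> slugify("Hello, World!")
        'hello-world'
        """
        cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
        return "-".join(cleaned.split())

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # run the docstring examples as tests

A docs generator that reads docstrings (Sphinx autodoc, pdoc, etc.) can then render the same text as reference documentation, so the reference is regenerated from the code rather than maintained by hand.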
aspenmayer · 8h ago
I think there’s a natural tendency to want to document the process for those who have the work assigned to them, and some folks will self-assign it because they see the value.
Knowledge transfer through technical writing doesn’t always manifest itself if it isn’t part of the work process at the time you have that in your mental context. It’s hard to have that context to write the docs if you’re context switching from working on something else or not involved at that level, so it’s hard to just drop in to add docs if there isn’t some framework for writing ad hoc docs for someone to fix up later.
I don’t have experience at traditional employers though so I can’t speak authoritatively here. I’m used to contracts and individual folks and small business. Having human readable documents is important to me because I’m used to folks having things explained on their level, which requires teaching only what they need and want to know to get their work done. Some folks don’t even know what they need when they ask me for help, so that’s its own process of discovery and of documentation. I’m used to having to go to them where they are and where the issue is, so there was no typical day at the office or out of it. Whatever couldn’t fit through the door, I had to go to myself. I’ve had to preserve evidence of potential criminal wrongdoing and document our process. It taught me to keep notes and to write as I work.
I think most places do have some kind of process for doing this, and I suspect the friction in doing the thing is part of the issue, and the fact that it’s difficult thankless work that doesn’t show up on most tracked metrics is part of the issue.
If docs were mandated they would get done. If someone’s job was to make sure they were done well, that would help. I guess folks could step up and try to make that happen and that might be what it takes to make that happen.
nosianu · 11h ago
> most of these things can happen in a poorly documented large codebase.
Documentation does not help beyond a point. Nobody reads the documentation repeatedly, which would be needed.
When you keep working on a project, and you need a new function, you would need to check or remember every single time that such a function already exists or might exist somewhere. You may have found it when you read the docs months ago, but since you had no need for that function at the time your brain just dismissed it and tossed that knowledge out.
For example, I had a well-documented utils/ folder with just a few useful modules, but they kept getting reimplemented by various programmers. I did not fault them, they would have had to remember every single time they needed some utility to first check that folder. All while keeping up that diligence forever, and while working on a number of projects. It is just too hard. Most of the time you would not find what you need, so most of the time that extra check would be a waste. Even the most diligent person would at some point reimplement something that already exists, no matter how well-documented it is. It's about that extra search step itself.
The closer you get to 100% perfection, the more the effort grows, roughly exponentially. So we have some duplication; not a big deal. Overall architectural quality is more important than squeezing out those last, not really important, few percent of perfection.
globnomulous · 1h ago
In my experience, the usefulness of documentation in code declines as familiarity with a codebase increases. The result: people ignore it; it becomes outdated; now it's debt. Similarly, non-intralinear documentation (documentation that isn't in the code) tends to grow with a codebase. Meanwhile, the codebase changes, personnel change, and more and more of the documentation becomes noise, a historical artifact of solving problems that either no longer exist or can no longer be solved the same way.
That being said, good documentation is worth its weight in gold and supports the overall health and quality of a codebase/project. Open-source projects that succeed often seem to have unusually strong, disciplined documentation practices. Maybe that's just a by-product of engineering discipline, but I don't think it is -- at least not entirely.
JoRyGu · 10h ago
I'm sorry, but this is selling good engineers very short. If you didn't nest your utils folder 8 folders deep, it seems pretty obvious that one should check the utils folder before writing another utility function. This stuff should also be caught in code reviews. Maybe the new guy didn't know that util function existed, but surely you did when you reviewed their MR? Obviously mistakes like that can happen, but I've found that to be the exception rather than the rule, even in some of the gnarlier codebases I've worked in.
staunton · 10h ago
> should also be caught in code reviews
Assuming they even have code reviews - in your experience, in a situation where the person writing the code didn't check if it already exists, the reviewer will check that and then tell them to delete their already finished implementation and use that existing thing?
wheybags · 10h ago
I wouldn't say you should explicitly check, necessarily. More like, you go to implement the widget and when you open the appropriate file to get started, it's already there.
michaelsalim · 10h ago
I for one think that this discipline is what separates a good developer from a good engineer. This kind of rigorous process is the kind of thing I'd expect from most devs but is sadly missing most of the time.
yunohn · 10h ago
I agree with you completely, but also posit that this is exactly what agentic LLMs should solve?
Claude Code's Plan mode kind of does this research before coding - but tbf the Search tool seemingly fails half the time with 0 results, and it gets confused and then reimplements too…
perrygeo · 4h ago
I've seen developers add a second ORM library as a dependency, not because the first didn't do the job but because they just "forgot" about the first one and wanted to use the new hotness. Developers, just like LLMs, have biases that taint the solution space.
The key is that we all have an intuitive sense that this behavior is wrong - building a project means working within the established patterns of that project, or at least being aware of them! Going off half-cocked and building a solution without considering the context is extremely bad form.
In the case of human developers, this can be fixed on the code review level, encouraging a culture of reading not just writing code. Without proper guardrails, they can create code that's dissonant with the existing project.
In the case of LLMs, the only recourse is context engineering. You need to make everything explicit. You need to teach the LLM all the patterns that matter. Their responses will always be probabilistic token salad, by definition. Without proper guardrails, it will create code that's dissonant with the existing project.
Either way, it's a question of subjective values. The patterns that are important need to be articulated, otherwise you get token salad randomly sampling the solution space.
eddd-ddde · 4h ago
Heavy agree on your first paragraph. You spend one evening removing one unnecessary dependency only to see 5 more were added by next Monday.
I think soon enough we'll have a decent LLM that's capable of reviewing ALL changes to ensure they follow the "culture" we expect to see.
0x_rs · 12h ago
This. Reinventing the wheel at every opportunity, forgetting about or ignoring the expected way to do something, mixing patterns, you name it. The author may call it "vibe coding", that's fine but it has little to do with LLMs. The tool has the same amount of care as anyone rushing to get something done, or who hasn't built the project themselves, or maybe doesn't have enough experience. I can only assume it's a not-very-subtle complaint about a specific person on their team; "written in a way no developer on the team would" is telling.
I'd be extremely careful about applying this thinking anywhere else. There's enough baseless finger-pointing in academia and arts already.
noodletheworld · 5h ago
> The author may call it "vibe coding", that's fine but it has little to do with LLMs.
Humm.
Maybe if we say that this is not an issue with vibe coding, it won't be?
Maybe if we pretend that a naive junior would make these mistakes (true), we should be happy to accept them from senior developers (false)?
LLMs are extraordinarily prone to exactly these mistakes.
I’ve seen it.
You've seen it.
The OP has seen it.
You’re in a rush so you wrote some classes in a code base in a language which supports classes but has no classes in it?
Really? Did that get past code review before? Did you deliberately put up a code review that you knew would be rejected and take longer to merge as a result because you were in a hurry?
Of course not.
You did the bare minimum that still met the basic quality standards expected of you.
I get it. We all get it. When you're in a rush you cut corners to move faster.
…but that's not what the OP is talking about, and it's not what I see either:
It's people putting up AI slop and not caring at all what the content is.
Just a quick check that it compiles and the tests pass, if you're lucky.
Too lazy to even put a "don't use classes" in their cursor rules file.
Come on. The OP isn't saying don't use AI.
They're saying: care, just a little bit, about your craft, ffs.
godelski · 10h ago
I think you're missing the author's thesis
> Is speed the greatest virtue?
If speed is the greatest virtue then yeah, all that stuff will happen. But if it isn't, then that stuff will happen at a much lower frequency. Because, all the stuff mentioned is just tech debt. Debt doesn't go away, it accrues interest.
If speed is all that matters then you need exponential output, as your output needs to offset the debt. If speed is a factor but isn't the only factor, then you need to weigh it against the other things. Take on debt wisely and pay it off when you can. But it does seem like there's a trend to just take on as much debt and hope for the best. Last I checked, most people aren't really good at handling debt.
lbriner · 9h ago
I think you are misusing the phrase "tech debt" like many people do.
Not everything that is not perfect is Tech Debt, some of it is just pragmatism. If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing, might never need any maintenance attention and will never be paid down before the codebase is replaced in 10 years time.
Same with people writing code in a different style to others. If it is unreadable, that isn't tech debt either, it's just a lack of process or lack of someone following the process. Shouldn't be merged = no tech debt.
Adding some code to check edge cases that are already handled elsewhere. Again, who cares? If it makes the code unreadable, delete it if you know it isn't needed; it only took 10 seconds to generate. If it stays in place and is understandable, it's not tech debt. Again, not going to pay it down, it doesn't cost anything, and the worst case is you change one validation and not the other and a test fails, which shouldn't take long to track down.
Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
godelski · 2h ago
I'm a bit confused because you start by disagreeing with me but then end up agreeing with me.
> If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing
To be clear, tech debt isn't "code that doesn't run". It's, as you later say, "borrowing against the right way to do something in order to speed up delivery", which is what I said the author's thesis was.
No need for perfection. Perfection doesn't exist in code. The environment is constantly moving, so all code needs to eventually be maintained.
But I also want to be very very clear here. Just because two functions have the same output doesn't mean that they're the same and no one should care. I'll reference Knuth's premature optimization here. You grab a profiler and find the bottleneck in the code and it's written with a function that's O(n^3) but can be written in O(n log n). Who cares? The customer cares. Or maybe your manager who's budgeting that AWS bill does. You're right that they're both logically "correct" but it's not what you want in your code.
Similarly, code that is held together with spaghetti and duct tape is tech debt. It runs. It gives the correct output. But it is brittle, hard to figure out what it does (in context), and will likely rot. "There's nothing more permanent than a temporary fix that works ", as the saying goes. I guess I'll also include the saying "why is there never time to do things right but there's always time to do things twice?"
Code can be broken in many ways. Both of those situations have real costs. Costs in terms of both time and money. It's naïve to think that the only way code can be broken is by not passing tests. It's naïve to think you've tested everything that needs to be tested. Idk about you, but when I code I learn more about the problem, often with the design changing. Most people I know code this way. Which is why it is always good to write flexible code, because the only thing you can rely on with high confidence is that it's going to change
darkwater · 8h ago
> Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
Something many people do without even realizing they are incurring tech debt. These are the kind of developers who will just generate even more tech debt with an LLM in their hands (at least for now).
whilenot-dev · 5h ago
Code that demands to have its debt paid off isn't non-working code; it's rather code that exceeds one's ability to maintain it properly (your mention of "unreadability" included). Whether a PR introduces debt isn't always known and oftentimes has to be discovered later on, depending on how fast its maintainers fluctuate and how fast the ecosystem advances.
That said, tech debt isn't paid by developers individually; it's paid by organizations, in developers' time. Only in rare cases can you make a deliberate decision for it, as it grows organically within any project. For example, most python2 code that used niche libraries whose outdated docs have since been taken offline has to be considered expensive tech debt nowadays.
Msurrow · 12h ago
I have worked on such teams. Mostly, even. I would not accept any PRs with code doing any of those things (human or machine made). Small(er) teams on small to medium sized projects.
Critical solutions, but small(er) projects with 2-4 devs, that's where it's at. I feel like that's because it's actually possible there to build a dev team culture and consensus with the desired balance of quality and delivery speed.
hakunin · 5h ago
My entire (decades-long) career I worked primarily in small startup teams, and even when people didn't see eye to eye, they always maintained these kinds of basic practices. I think a lot of disagreement on these expectations is rooted in the size and "tight-knit"ness of your team.
camdenreslink · 2h ago
I think size of the code base matters a lot too. In a multi-100k line of code application, you might not know a utility function is already defined somewhere and reinvent the wheel.
bux93 · 5h ago
Not only do people do these things, they put their work on GitHub. Which is where the LLM learns to do it!
shepherdjerred · 7h ago
You can prevent quite a lot of these issues if you write rules for Cursor or your preferred IDE
Linters can also help quite a bit. In the end, you either have your rules enforced programmatically or by a human in review.
I think it’s a very different (and so far, for me, uncomfortable) way of working, but I think there can be benefits especially as tooling improves
sshine · 5h ago
It seems like people who use AI for coding need to reinvent a lot of the same basic principles of software engineering before they gradually propagate into the mainstream agentic frameworks.
Coding agents come with a lot of good behavior built in.
Like "planning mode" where they create a strong picture of what's to be made before touching files. This has honestly improved my workflow at programming from wanting to jump into prototyping before I even have a clear idea, to being very spec-oriented: Of course there needs to be a plan, especially when it will be drafted for me in seconds.
But the amount of preventable dumb things coding agents will do that need to be explicitly stated and meticulously repeated in their contexts reveals how simply training on the world's knowledge does not capture senior software engineer workflows entirely, and captures a lot of human averageness that is frowned upon.
cardanome · 4h ago
Do those rules really work? I have added a rule to not add comments, and I still have to constantly remind the model not to add comments despite it.
ewoodrich · 3h ago
I have a .roorules file with only about four instructions, one of which is an (unintentional) binary canary of very simple rule following at the end of a task. And another rule that’s a fuzzier canary as it is not always applicable but usually occurs a few times in a task so helps me confirm the rules are being parsed at all in case Roo has a bug.
All the models I’ve used (yes, including all the biggest, newest, smartest ones) follow the binary rule about 75% of the time at the very most. Usually closer to 50% on average, with odds significantly decreasing the longer the context increases as it occurs at the end of a task but other than that seems to have no predictable pattern.
The fuzzier rule is slightly better, I'm guessing because it applies earlier in the context window, at around 80% compliance, and it uses lots of caps and emphasis. This one has a more predictable failure mode, tied to the ratio of reading code vs. thinking/troubleshooting/time the model spends "in its own head". When it's mostly reading code or my instructions, compliance is very high; when it's doing extended troubleshooting or anything that starts to veer away from the project itself toward training data, it's much lower.
So it's hit and miss and does help, but definitely not something I'd rely on as a hard guardrail for things like not executing commands, which Roo has a non-LLM tool config to control. So over time I hope agentic runners add more deterministic config outside the model itself, because instructions still aren't as reliable as they should be and don't seem to be getting substantially better in real use.
mysterydip · 7h ago
I once inherited a project that had three separate classes for storing time, each with its own methods to convert to the other two.
hakunin · 5h ago
I can think of reasons for this (e.g. storing date/time as it was originally represented 100, 1000 years ago in a historical context vs storing live time for present-day time calculations vs storing non-timezone time for, say, operating hours relative to subject's location), so this statement alone doesn't show fault.
Cthulhu_ · 11h ago
They do, but if they've been on the team for a while they should already know how to find and use the established patterns; if they then submit something that they normally wouldn't, it's a problem.
lintfordpickle · 11h ago
>> No one would implement a bunch of utility functions that we already have in a different module.
To be fair on this one, and while I don't flat-out disagree, lots of people reinvent utility functions simply because they don't know they exist elsewhere, especially in huge codebases. This mostly gets rectified in the PRs, when a senior dev comments on it - the problem then is, you've only increased the number of people who know by 1.
Too · 5h ago
Identifying such almost-duplicate code is an open opportunity for LLMs.
merelysounds · 11h ago
I guess these things occur in workplaces that prioritize speed over maintainability anyway.
shaky-carrousel · 11h ago
Competent people don't. And if your code looks competent but your behaviour doesn't, it's a clue.
buserror · 12h ago
I personally treat the LLM as a very junior programmer. It's willing to work and will take instructions, but its knowledge of the codebase and the patterns we use is sorely lacking. So it needs a LOT of handholding: very clear instructions, descriptions of potential pitfalls, smaller scoped tasks, and careful review to catch any straying off pattern.
Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
Oh, another thing: one of my "golden rules" is that it needs to keep a block comment at the top of the file describing what's going on in that file. It acts as a second "prompt" when I restart a session.
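For instance, a header of the sort meant here might look like this (the file and conventions are invented for illustration):

    # cache_warmup.py
    #
    # What this file does:
    #   Pre-populates the cache with the most-requested product pages at
    #   deploy time, so the first real users don't hit cold caches.
    #
    # Conventions to keep:
    #   - all cache access goes through get_cache_client(); never create a
    #     raw client here
    #   - keys follow the "product:{id}:v2" naming scheme
    #
    # Known gotcha: warmup must run after migrations, never before.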
It works pretty well. It isn't as "magic" as the "make it so!" approach people think they can get away with, but it works for me.
But yes, I still spend maybe 30% of the time cleaning up, renaming stuff, and doing more general rework of the code before it becomes "presentable". It still lets me work pretty quickly, though, a lot quicker than if I were to do it all by hand.
otikik · 8h ago
I think "junior programmer" (or "copilot") oversells the AI in some cases and undersells it in others. It forgets things that a regular person wouldn't, and it sometimes makes very basic coding mistakes. At the same time it's better than me at some things (avoiding off-by-one errors when dealing with algorithms that work on arrays). It also has encyclopedic knowledge about basically anything out there on the internet. Red-black trees? Sure thing. ECS systems for game programming? No problemo, here are the most used libraries.
I have ended up thinking about it as a "hunting dog". It can do some things better than me. It can get into tiny crevices and bushes. It doesn't mind getting wet or dirty. It will smell the prey better than me.
But I should make the kill. And I should be leading the hunt, not the other way around.
jeanloolz · 7h ago
That hunting dog analogy is epic and perfectly matches my experience.
pulse7 · 11h ago
The difference between an LLM and a very junior programmer: the junior programmer will learn and change; the LLM won't. The more instructions you put in the prompt, the more will be forgotten and the more it will bounce back to the "general world-wide average". And on the next prompt you must start all over again... Not so with junior programmers ...
irb · 9h ago
This is the only thing that makes junior programmers worthwhile. Any task will take longer and probably be more work for me if I give it to a junior programmer vs just doing it myself. The reason I give tasks to junior programmers is so that they eventually become less junior, and can actually be useful.
Having a junior programmer assistant who never gets better sounds like hell.
presentation · 6h ago
The tech might get better eventually; it has gotten better rapidly up to this point, and everyone working on the models is aware of these problems. Strong incentive to figure something out.
Or maybe this is it. Who knows.
buserror · 11h ago
Ahaha, you likely haven't seen as many junior programmers as I have then! </jk>
But I agree completely, some juniors are a pleasure to see bloom. It's nice when one day you see their eyes shine and "wow this is so cool, never realized you made that like THAT for THAT reason" :-)
n4r9 · 10h ago
The other big difference is that you can spin up an LLM instantly. You can scale up your use of LLMs far more quickly and conveniently than you can hire junior devs. What used to be an occasional annoyance risks becoming a widespread rot.
muzani · 11h ago
They're automations. You have to program them like every other script.
freilanzer · 8h ago
The learning is in the model versions.
relistan · 9h ago
My guess is that you're letting the context get polluted with all the stuff it's reading in your repo. Try using subagents to keep the top level context clean. It only starts to forget rules (mostly) when the context is too full of other stuff and the amount taken up by the rules is small.
animal531 · 9h ago
Definitely. To be honest I don't think LLMs are any different from googling and copying code off the internet. It's still up to the developer to take the code, go over it, make sure it's doing what it's supposed to be doing (and only that), etc.
As for the last part, I've recently been getting close to 50 and my eyes aren't what they used to be. In order to fight off eye-strain I now have to tightly ration whatever I do into 20 minute blocks, before having to take appropriate breaks etc.
As a result, time has become one of the biggest factors for me. An LLM can output code 1000x faster than a human, so if I can wrangle it somehow to do the basics for me, it's a huge bonus. At the moment I'm busy generating appropriate structs of arrays for SIMD from input AoS structs, and I'm using Unity C# with LINQ to output the text (I need it to be editable by anyone, so I didn't want to go down the Roslyn or T4 route).
The queries are relatively simple, take the list of data elements and select the correct entries, then take whatever fields and construct strings with them. Even so, copying/editing them takes a lot longer than me telling GPT to select this, exclude that and make the string look like ABC.
I think there was a post yesterday about AIs as HUDs, and that makes a lot of sense to me. We don't need an all-powerful model that can write the whole program; what we need is a super-powered assistant that can write and refactor on a very small and local scale.
uyzstvqs · 9h ago
I personally see the LLM as a (considerably better) alternative to StackOverflow. I ask it questions, and it immediately has answers for my exact questions. Most often I then write my own code based on the answer. Sometimes I have the LLM generate functions that I can use in my code, but I always make sure to fully understand how it works before copy-pasting it into my codebase.
But sometimes I wonder if pushing a 400,000-line PR to an open-source project in a programming language that I don't understand is more beneficial to my career than being honest and quality-driven. In the same way that YoE takes precedence over actual skill in hiring at most companies.
presentation · 6h ago
Unlike Stack Overflow, if it doesn't know the answer it'll just confidently spit out some nonsense, and you might fall for it or waste a lot of time figuring out that it's clueless.
You might get the same on Stack Overflow too, but more likely, I've found, you either get no response or someone pretty competent actually comes out of the woodwork.
presentation · 6h ago
I find success basically limiting it to the literal coding but not the thinking - chop tasks down to specific, constrained changes; write detailed specs including what files should be changed, how I want it to write the code, specific examples of other places to emulate, and so on. Doesn’t have to be insanely granular but the more breadcrumbs the higher chance it’ll work, you find a balance. And whatever it produces, I git add -p one by one to make sure each chunk makes sense.
More work up front and some work after, but still saves time and brain power vs doing it all myself or letting it vibe out some garbage.
neurostimulant · 8h ago
> Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
But then it's not vibe coding anymore :)
presentation · 6h ago
Is that a bad thing? What do we call this?
_def · 12h ago
I agree. Brings me to the question, though: how do you deal with team members who are less experienced and use LLMs? Code review then needs much more work to teach these principles. And most of the time people won't bother to do that and just rubber-stamp the working solution.
buserror · 11h ago
In my experience, this is a problem even without LLMs; many times you cannot just tell coworkers (junior or not) to completely trash their patch and do it again (even using nicer words).
Very often it comes down to HR issues in the end, so you end up having to take that code anyway, and either sneakily revert it or secretly rework it...
x3n0ph3n3 · 3h ago
I have never seen a junior engineer make up API calls or arguments.
relistan · 11h ago
To a certain extent you are probably still not using it optimally if you are still doing that much work to clean it up. We, for example, asked the LLM to analyze the codebase for the common patterns we use and to write a document for AI agents to do better work on the codebase. I edited it and had it take a couple of passes. We then provide that doc as part of the requirements we feed to it. That made a big difference. We wrote specific instructions on how to structure tests, where to find common utilities, etc. We wrote pre-commit hooks to help double check its work. Every time we see something it’s doing that it shouldn’t, it goes in the instructions. Now it mostly does 85-90% quality work. Yes it requires human review and some small changes. Not sure how the thing works that it built? Before reviewing the code, have it draw a Mermaid sequence diagram.
We found it mostly starts to abandon instructions when the context gets too polluted. Subagents really help address that by not loading the top context with the content of all your files.
Another tip: give it feedback as PR comments and have it read them with the gh CLI. This is faster than hand editing the code yourself a lot of times. While it cleans up its own work you can be doing something else.
buserror · 11h ago
Interesting, I actually do have a coding-guidelines.md file for that purpose, but I hadn't thought of having the LLM either generate it, or maintain it; good idea! :-)
relistan · 9h ago
I actually had Claude and Gemini both do it and revise each other's work to get to the final doc. Worked surprisingly well.
palata · 12h ago
A risk with vibe coding is that it may make a good developer slightly faster, but it will make bad developers waaaay faster. Resulting in more bad code being produced.
The question then is: do the bad developers improve by vibe coding, or are they stuck in a local optimum?
woolion · 10h ago
So, I was wondering when I would see that... From my experience, I would say it also turns mediocre developers into bad ones very fast. Part of the reason is a false sense of confidence, but mostly it's the sheer volume that gets produced.
If we want to be more precise, I think the main issue is that the AI-generated code lacks a clear architecture. It has no (or very little) respect for overall information flow, and single-responsibility principle.
Since the AI wants you to have "safe" code, it will catch things and return non-results instead. In practice, that means the calling code has to inspect the result to see whether it's a placeholder or not, instead of being confident because you'd get an exception otherwise.
Similarly, to avoid problems the AI might tweak some parameter. If, for example, you were to design a program to process something with AI, you might go gather_parameters -> call -> process_results. Call should not try to do funky things with parameters, because that should be fixed at the gathering step. But locally the AI is always going to suggest having a bunch of "if this parameter is not good, swap it silently so that it can go through anyway".
Then tests are such a problem it would require an even longer explanation...
palata · 8h ago
To echo the article, I don't want to know it was written with an AI. Just like I don't want to see that it was obviously copy-pasted from StackOverflow.
The developer can do whatever they want, but at the end, what I review is their code. If that code is bad, it is the developer's responsibility. No amount of "the agent did it" matters to me. If the code written by the agent requires heavy refactoring, then the developer has to do it, period.
woolion · 6h ago
100% agree.
However, you'll probably get an angry answer that it's management's fault, or something of the sort (because there isn't enough time). Responsibility would have to be taken earlier, by pushing back if some objectives truly are not reasonable.
oc1 · 7h ago
A whole new generation will discover the term net-negative programmer again ;)
bubblyworld · 9h ago
Personally I think caring is the resource in extremely short supply here and I don't think vibe coding has much to do with it causally. AIs are just tools - basically all of the issues the author has raised are present with human juniors too, and can be resolved quite easily with a little more guidance/interaction in both cases. I don't think AIs are universally causing people to care less about quality output, although that may be true for some people who didn't care much in the first place.
The common counter-argument here is that you miss out on training juniors, which is true, but it's not always an option (we are really struggling to hire at my startup, for instance, so I'm experimenting with AI to work on tasks I would otherwise give to a junior as a stop-gap).
Another aspect to consider is that what we used to consider important for software quality may change a lot in light of AI tooling. These things aren't absolutes. I think this is already happening, but it's early days, so I'm not sure what will play out here.
Zanfa · 13h ago
LLMs would also need to use historic commits as context, rather than just the current state of the codebase in isolation. Most codebases I've worked with go through migrations from a legacy pattern A to a newer and better pattern B, used across different parts of the codebase. Rarely can these migrations be done in a single go, so both patterns tend to stick around for a while as old code is revisited. Like the HTTP example, even if LLMs pick up a pattern to follow (which they often don't), it's a coin flip whether they pick the right one or not.
dwd · 12h ago
This...
I once worked on a massive codebase that had survived multiple acquisitions, renames and mergers over a 20 year period. By the time I left it had finally passed into the hands of a Fortune 500 global company.
You would often find code that matched an API call you required that was last updated in the mid-2000s, but there was a good chance that it was not the most recent code for that task, but still existed as it was needed for some bespoke function a single client used.
There could also be similar API calls with no documentation, and you had to pick the one that returned the data fields that you wanted.
antihero · 12h ago
You can craft a nice CLAUDE.md saying write code like this bit, avoid writing code like this legacy bit etc.
manmal · 12h ago
Better to tell them exactly how this and that is done, with some examples.
croes · 10h ago
But that kind of awareness is exactly what vibe coders often lack.
Many didn’t code (much) before.
anshumankmr · 12h ago
That assumes commit messages are written properly and aren't just "Updated this file" or "Bugfix".
wldlyinaccurate · 12h ago
I think the parent comment means "commits" in the sense of the actual changeset; not just the message.
anshumankmr · 12h ago
That is also problematic, because git diffs will probably require an exponential gain in context length AND the ability for the LLM to use said context effectively.
That being said, the context length problem could potentially be solved, though it will take a bit of time. I think Llama 4 had a 10M context length (not sure if anyone tried prompting it with that much data to see how effective it really is).
tayo42 · 11h ago
Do all of the diffs need to be included? Can't you include like a summarized version of a few changes?
Like I don't memorize the last 20 commits, but I know generally the direction things are going by reading those commits at some point
anshumankmr · 8h ago
If the relevant commit was made a year or so back, then 20 commits would probably prove insufficient. And if, say, a team member is supposed to use some existing helper method already present in the codebase, it's easier to tell a person to use it than to stop an LLM from writing another function that performs the same operation, which is inefficient.
And even if you juiced up an LLM's context length to astronomical numbers AND made it somehow better at parsing and understanding its context, it will not always repeat said capabilities in other codebases (see, for example, o3 supposedly being at the top of most benchmarks but still fumbling a simple variation of the mother-is-a-surgeon puzzle).
I am not saying its impossible for a company to figure this out, but it will be incredibly hard.
subarctic · 13h ago
Setting up a linter, a formatter and a lot of strict type checking is really helpful when using an LLM to generate code, just like it is when you start receiving contributions from people who don't know or don't agree with the style rules that you implicitly follow. As are tests. Basically anything automated that helps ensure the right style and behaviour of your code becomes very useful with coding agents, as they can just run whatever command you tell them to check for issues and/or fix them if possible.
arduanika · 12h ago
I believe you that these tools help a lot, but they would not prevent ~any of the examples listed in the article (under "The smell of vibe coding").
helloplanets · 9h ago
Most of those look like context issues to me. A repo map (using Tree-sitter, etc.) and documentation would already do wonders. Feeding 32-64kTok of context directly into a model like Gemini 2.5 Pro is something more people should try out in situations like this. Or even 128kTok+.
Cthulhu_ · 11h ago
Probably not, but that's where code review comes in. Which can also partially be done by AI, ironically.
dkdbejwi383 · 11h ago
With Cursor, at least, it recognises linter errors and failing tests and attempts to correct its own problems, with varying levels of success.
antihero · 12h ago
This just masks the problem.
philipp-gayret · 10h ago
All major AI assistants already come with ways to not have any of these issues.
Claude Code has /init, Cursor comes with /Generate Cursor Rules, and so on. It's not even context engineering: there are out-of-the-box tools you can use to keep this from happening. And even if it does happen, you can make it never happen again, with these same tools, for your entire organization - if you have invested the time to know how to use them.
It is interesting how these tools split up the development community.
ceuk · 9h ago
CC very regularly ignores very explicit stuff in CLAUDE.md for me, and I know I'm not the only one. The cycle of compacting/starting new conversations feels like a Sisyphean spiral of the same undesirable behaviour, and I've yet to find a satisfactory solution despite a lot of effort to that end.
I don't think it's fair to dismiss this article as a superficial anti-AI knee-jerk. The solutions you describe are far from perfect.
philipp-gayret · 8h ago
Fair enough. For me, compacting the conversation always feels a bit weird; I have no way to tell what it effectively deleted from the context. But I (very) regularly have it re-read and update CLAUDE.md as part of the process, or after "discussions" with the LLM, so I would guess that might be why it follows the patterns in it a bit more strictly for me than for most. It would be nice if the tool took care of that automatically.
woolion · 10h ago
Serious question: I'm currently re-evaluating if Cursor can speed up my daily work. Currently it is not really the case because of the many subtle errors (like switching a ":" for a ","). But mostly the problem I face is that the code base is big, with entirely outdated parts and poorly coded ones. So the AI favors the most common patterns, which are the bad ones. Even with basic instructions like "take inspiration from <part of the code that is very similar and well-written>" it still mostly takes from the overall codebase (which, by the way, was worsened by a big chunk of vibe-coded output that was hastily merged). My understanding is that a rule should essentially do the same as if it is put in the prompt directly. Is there a solution to that?
philipp-gayret · 9h ago
I recently switched to Claude Code, I much prefer it (I end up less in cycles of Cursor getting stuck on problems). Before I used Cursor for some months.
> My understanding is that a rule should essentially do the same as if it is put in the prompt directly. Is there a solution to that?
Yes, from my understanding Cursor rule files are essentially an invisible prefix to every prompt. I had some issues in the past with Cursor not picking up rule files until I restarted it (some glitch, probably gone by now). So put something simple like a "version" in your rules file and ask what version of the rules you're following for this conversation, just to validate that the process is working.
For Cursor with larger projects I use a set of larger rule files that always apply. Recently I worked with Spotify's Backstage, for example, and I had it index online documentation on architecture, build instructions, design, development of certain components, and project layout. Easily 500+ lines worth of markdown. I tell Cursor where to look, i.e. online documentation of the libraries you use, reference implementations if you have any, good code examples and why they are good, and then it writes its own rule files - I don't write them manually anymore. That has been working really well for me. If you have a common technology stack or way of working you can also try throwing in some examples from https://github.com/PatrickJS/awesome-cursorrules
For a codebase containing both good and bad code, maybe you can point it to a past change where code was refactored from bad to good, so it can write out why you prefer which style and how to manage the migration from bad to good. That said, the tools are not perfect. Even with rules the bad output can still happen, but larger rule files describing what you'd like to do and what to avoid make the chance significantly smaller and the tool more pleasant to work with. I recently switched to Claude Code because Cursor tended to get "stuck" on the same problem, which I don't really experience with Claude Code, but YMMV.
croes · 10h ago
The issue isn't in the tool but in the vibe "coder".
They care like they code: not.
mns · 7h ago
> But people want a good cup of coffee, even if they have to wait a little bit for it.
I think the author is vastly underestimating what the majority of people actually want. It took me a long time to get this, but for many people, quick/cheap will always trump quality.
0_____0 · 7h ago
The ubiquity of Keurig machines really does speak to this.
nojs · 12h ago
In my experience pretty much all of these issues stem from a combination of short context windows and suboptimal “context engineering”.
If the agent has a clean, relevant context explaining what global functions are available it tends to use them properly.
The biggest challenge is how to construct the right context for each request, and keep it clean until the feature is finished. I expect we will see a lot of improvements in this area in the coming months (sub-agents being an obvious example).
frizlab · 12h ago
> If the agent has a clean, relevant context explaining what global functions are available it tends to use them properly.
STOP! The agent does not exist. There are no agents; only mathematical functions that have an input and produce an output.
Stop anthropomorphizing LLMs, they are not human, they don’t do anything.
It might seem like it does not matter; my take is that it's fundamental. Humans are not machines, and vice versa.
johnisgood · 5h ago
We have used the term "agent" in AI for some time.
> The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions.
This is from Artificial Intelligence: A Modern Approach by Stuart J. Russell and Peter Norvig.
Sebalf · 9h ago
Frankly, this take is so reductionistic that it's useless. You can substitute "mathematical functions" with "biochemistry" and apply the exact same argument to human beings.
What I'd like is for people to stop pretending we have any idea what the hidden layer of an LLM is actually doing. We do not know at all. Yes, words like "statistics" and "mathematical functions" can accurately describe the underlying architecture of LLMs, but the actual mechanism of knowledge processing is not understood at all. It is exactly analogous to how we understand quite a lot about how neurons function at the cellular level (but far from everything, seeing as how complicated and opaque nature tends to be), but that we have no idea whatsoever what exactly is happening when a human being is doing a cognitive task.
It is a fallacy to confuse a surface-level understanding of how a transformer functions with the unknown mechanisms that LLMs employ.
atleastoptimal · 13h ago
This is all true. The best way to treat LLMs as they are now is as one step above the abstraction offered by compiled languages over assembly. You can describe something in plain English, note its explicit requirements, inputs and outputs, and an LLM can effectively write the code as a translation of the logic you are specifying. Using LLMs, you are best served by minimizing the entropy they have to deal with. The transformer is essentially a translation engine, so use it as a translator, not as a generator.
That being said, every few months a new model comes out that is a little less encumbered by the typical flaws of LLMs, a little more "intuitively" smart and less in need of hand-holding, a little more reliable. I feel that this is simply a natural course of evolution: as more money is put into LLMs they get better, because they're essentially a giant association machine, and those associations give rise to larger abstractions, more robust conceptions of how to wield the tools of understanding the world, etc. Over time it seems inevitable that an LLM given any task will be able to perform it better than any human programmer given the same task, and the same will go for the rest of what humans do.
nickm12 · 12h ago
This is a false analogy. LLMs do not "compile" natural language to high level code in the same way that a compiler or interpreter implements a high-level programming language in terms of machine instructions (or, for that matter, how a CPU implements machine instructions in hardware).
Programming and machine languages aim for a precise and unambiguous semantics, such that it's meaningful to talk about things like whether the semantics are actually precise or whether the compiler has a bug in failing to implement the spec.
Natural language is not just a higher level of abstraction on our existing stack. If a new model comes out, or you even run an existing model with a new seed, you can get different code out that behaves differently. This is not how compilers work.
Consider code calling an API like:
    search_engine.get_search_results(query, length, order)
It doesn't "care" about the algorithm that produced that list of results, only that it fits the approximation of how the algorithm works as defined by the schema. There are thousands of ways the engine could have been implemented to produce the schema that returns relevance-based results from a web-crawler-sourced database.
In the same way, if I prompt an LLM "design a schema with [list of requirements] that works in [code context and API calls]", there are thousands of ways it could produce that code, but within a margin of error a high quality LLM should be able to produce the code that fits those requirements.
Of course the difference is that there is a stochastic element to LLM-generated code. However it is useful to think of LLMs this way, because it lets you leverage their probability of being correct, even if they aren't as precise as calling APIs, where you are explicit about how those abstractions are used.
devnullbrain · 11h ago
This is a false interpretation: you've put "compile" in quotes when it doesn't appear in the parent comment, and the actual phrasing used is more correct.
voxl · 13h ago
No, LLMs are not an "abstraction" like a compiler is. This is bullshit. LLMs are stochastic token generators. I have NEVER met someone in real life that has produced something I wouldn't throw in the trash using LLMs, and I have had the displeasure of eating cookies baked from an LLM recipe.
No, LLMs will not get better. The singularity bullshit has been active since 2010s. LLMs have consumed the entire fucking Internet and are still useless. Where the fuck is the rest of the data going to come from? All these emails from people wanting high quality data from PhDs only for them to be scammy. People only want to train these things on easily stolen garbage, not quality input, because quality is expensive. Go figure!
This optimistic horseshit hype is embarrassing.
atleastoptimal · 13h ago
>No, LLMs will not get better.
What makes you so sure of this? They've been getting better like clockwork every few months for the past 5 years.
bigstrat2003 · 12h ago
I don't claim that they won't get better, but they certainly haven't gotten better. From the original release of ChatGPT to now, they still suck in the same exact ways.
johnisgood · 5h ago
I don't think they have gotten better either (at least in the past 1 year), because I remember how much better ChatGPT or even Claude used to be before. Perhaps they are nerfed now for commercial use, who knows.
otabdeveloper4 · 12h ago
No they haven't.
They hallucinate exactly as much as they did five years ago.
atleastoptimal · 12h ago
Absolutely untrue. Claiming GPT-3 hallucinates as much as o3 over the same token horizon on the same prompts is a silly notion and easily disproven by the dozens of benchmarks. You can code a complete web-app with models now, something far beyond the means of models so long ago.
otabdeveloper4 · 11h ago
> caveats and weasel words
> "benchmarks"
Stop drinking the Kool-Aid and making excuses for LLM limitations, and learn to use the tools properly given their limits instead.
antihero · 12h ago
They really don’t though.
otabdeveloper4 · 11h ago
Larger context lengths are awesome, but they don't fundamentally change the failure modes of LLMs.
anshumankmr · 12h ago
> LLMs have consumed the entire fucking Internet and are still useless.
They aren't useless. Otherwise, ChatGPT would have died a long time back
> Where the fuck is the rest of the data going to come from?
Good question. Personally, I think companies will start paying more for high quality data or what is at least perceived as high quality data.
I think Reddit and some other social media companies like it are poised to reap the rewards of this.
Whether this will be effective in the long run remains to be seen.
misnome · 10h ago
> They aren't useless. Otherwise, ChatGPT would have died a long time back
Isn’t the entire industry being fuelled by orders of magnitude more VC funding than revenue?
anshumankmr · 10h ago
>Isn’t the entire industry being fuelled by orders of magnitude more VC funding than revenue?
Because people want to use it, right? And it is a matter of time before they start limiting the ChatGPT "free" or "logged out" accounts, I feel. In the consumer AI chat apps, it is still the dominant brand, at least in my anecdotal experience, and they will basically make the Plus version the one version of the app to definitely use.
Plus they are planning on selling it to enterprises, and at least a couple of them are signing up for sure.
johnisgood · 5h ago
I think they are already limiting / nerfing "free" vs "logged out" vs "paid" vs "non-commercial".
arthens · 10h ago
Isn't that an argument against the sustainability of the LLM business model rather than their usefulness?
People use them because they are useful, not because they are VC funded.
skydhash · 5h ago
When the product is free, that puts the barrier at ground level. I have more confidence in Kagi's userbase than in OpenAI's.
Z37K · 3h ago
I know human coders who do all of these things, with or without AI. Everyone's output has increased but tendencies still depend on the operator.
agile-gift0262 · 11h ago
I don't think it's that people don't care. I think many (most?) of us are biased to accept what we already have that's working. I noticed that in myself when I tried programming with an LLM "agent". After all the futzing around and the many novel-length prompts I had to write, once the LLM produced something that worked, I had to fight my instinct to just push that for review.
I also noticed that the time I had to spend on reviews from some of my colleagues increased by 9 times (time tracked). So I don't know how much faster they are being at producing that code, but I think it's taking longer overall to get that ticket closed.
CompoundEyes · 7h ago
I've been working with Claude Code subagents for the first time this week, and I have one that purposefully goes through code analyzing it for human maintainability, over-architecting, too many clever one-liner tricks, meaningful variable names, ease of debugging, cognitive load, class sizes and so on. It had some interesting analysis about a few of the classes being heavy on required domain knowledge and suggested documentation / onboarding for that particular area. I'm interested to see where this goes. The code will be in production, so this aspect is important to me. Gold plating is within reach with LLMs, why not?
ashutosh-mishra · 4h ago
You can generate code without understanding it, but you can't generate judgment, taste, or the kind of long-term thinking that prevents technical debt from metastasizing.
Those come from giving a shit about the work itself, not just shipping it.
ok123456 · 5h ago
> No one would write a class when we’re using a functional approach everywhere.
They're not orthogonal. Closures and classes are dual forms of the same thing. There are cases where one is better than the other for a given problem.
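A toy sketch of that duality (Python just for illustration, nothing from the article): the same stateful counter written as a closure capturing its state and as a class holding it in an attribute.

    # A closure and a class carrying the same state and behavior.
    def make_counter(start=0):
        count = start
        def increment(step=1):
            nonlocal count
            count += step
            return count
        return increment

    class Counter:
        def __init__(self, start=0):
            self.count = start
        def increment(self, step=1):
            self.count += step
            return self.count

    bump = make_counter()
    counter = Counter()
    assert bump() == counter.increment() == 1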
Kim_Bruning · 6h ago
You can take my "EngliSH"[1] when you pry it from my cold dead hands.
But just because it's a powerful way to work, doesn't mean you get to be irresponsible with it! (Quite the opposite: Think table-saw)
[1] English SHell (Claude Code in my case), who says I need to be Bourne Again?
amelius · 8h ago
From keyboard sound I can tell the difference between my colleague typing code and him typing prose.
Since AI he is typing more prose.
sreekanth850 · 7h ago
You can provide proper context: what is required, what is not required, patterns, structure, and ask it to add comments too. So it's not about vibe coding, it's about common sense and involvement.
thenoblesunfish · 12h ago
The author is undermining their own point by calling the vibe code "maintainable", when the whole point of the post is that it's not.
manmal · 12h ago
The main gripe seems to be duplication of semantics across the codebase, and loss of centralized configurability. Makes sense, since LLMs can’t fit a whole codebase into their context and are not aware of shared behavior unless you tell them it exists.
raggi · 13h ago
This is a story of a bad LLM user (in this context), which is perhaps implied by “vibe coding” but folks should be aiming higher. Making people review slop is lazy, rude and disrespectful.
its-kostya · 7h ago
You just claimed reviewing slop is not pleasant (and I agree). So the LLM user has to review the slop, and then the team has to review the iterated-until-not-slop final commit(s). Where in this scenario does anyone _write_ code, the most enjoyable part of the job? Job satisfaction is going to plummet (mine has) and the next generation of software engs is going to be incompetent. Just look at the generation that grew up with iPads/phones. Many of them do not know what a "file" is in college (source: colleagues who are professors).
raduan · 9h ago
I think people should care more about the adherence of code to a specific spec.
And by definition of this, you should care about the spec.
How the code looks doesn't matter that much, as long as it adheres to the spec.
nesk_ · 11h ago
> we still need to build software, not productionize prototypes.
Can't agree more.
kazinator · 12h ago
> I don’t care how the code got in your IDE.
The legal department may have a different idea there.
occz · 12h ago
I've noticed this as well.
I've also noticed that the effort to de-slop the shit-code is quite significant, and many times eats the productivity gains of having the LLM generate the code.
These problems read like they are a product of outdated ai assisted practices.
I can imagine stuff like this happening when copy pasting from/to ai online chat interfaces, but not in a properly initialized project.
The agent will read all the crappy, partly outdated documentation all over the project and also take the reality of the project into consideration.
It's probably a good idea to also let it rewrite the programmer facing docs. Who else is going to maintain that realistically?
scotty79 · 10h ago
> I know the code was generated because it was written in a way no developer on the team would.
> It works, it’s clear, it’s tested, and it’s maintainable.
It would be super funny if he ended his blogpost there.
clauderoux · 10h ago
I love how so many people are eager to criticize LLM code, when in fact, according to my experience, it is pretty superior to anything I have seen produced by human programmers, most of the time. It is documented, the code is explained at each step of its creation, and it is pretty readable when you dig into it. I have 30 years of experience in coding, and I have been playing with these LLMs for 3 years. Yeah!!! Of course, sometimes they produce very bad code. But on average, the code they produce is largely on par with my fellow humans'. And since they produce the whole explanation of it, it takes a couple of minutes to understand it. And if you don't understand the main points of the code, the LLM will tell you all about it. When did you have a colleague who was eager to explain his/her code to you??? When did you have a colleague who produced code you could understand in a few minutes??? I really think these tools are quite useful; no need to wrap yourself in the mantle of expertise and look down on these LLMs, because sometimes they produce code you don't like.
lvturner · 9h ago
I want to care, but I also want money to eat, and the guy that isn't caring and doing a fifth as much work as me, is making more than me and getting promoted ahead of me.
Welcome to enterprise, it's not shit because people don't care. People don't care because they are incentivised not to, and those that do care, burn out.
chilldsgn · 12h ago
heck, I'm sitting in a team with a code base so full of human-written slop that the AI can't even fix it, and I'm burned out from trying to make it better. And I get told to be quiet about code standards because the team is still learning.
scrapheap · 10h ago
I hope you responded with "The team is still learning, which is exactly why I shouldn't be quiet about code standards."
globular-toast · 12h ago
This matches my experience. And I do mind the unnecessary comments. I always tell juniors that every line is important and the presence of a comment is a powerful tool for communicating with future developers. Think of it like labelling circuits on a fusebox. Now these garbage code generators are putting comments on everything because they don't understand anything and can't tell what's redundant.
But this is only scratching the surface of what's wrong, as the article elaborates.
The thing is people claim these things are making them faster. I don't believe it. What I believe is they are faster at generating shit. I know that because a baby can coax an LLM into producing shit too.
I do not believe you can spend that much time writing the correct prompt - use this exact function, follow this pattern, add a comment here, don't add one there, no, not like that - and still be quicker than just writing it yourself directly in the language.
It's like if I speak French fluently but only communicate through a translator that I instruct in English but constantly have to correct when they miss the nuance in my speech. I'd just speak French!
So, no, I don't believe it.
What I believe is that many, many software developers have been manually writing boilerplate, repetitive and boring code over and over again up until this point. I believe it because I've seen it. LLMs will obviously speed this up. But some of us already learnt how to use the computer to do that for us.
What I also believe is developers exist who don't understand, or care to understand, what they are doing. They will code using a trial and error approach and find solutions based purely on perceived behaviour of the software. I believe it because I've seen it. Of course LLMs will speed up this process. But some of us actually think about what we're writing, just like we don't just randomly string together words in a restaurant and then just keep trying until we get the dish we want.
lofaszvanitt · 6h ago
The article has the unique linkedin smell to it.
imiric · 12h ago
> I want people to care about quality, I want them to care about consistency, I want them to care about the long-term effects of their work.
Yeah, that's not happening.
LLMs enable masses of non-technical people to create and publish software. They enable scammers and grifters who previously would've used a web site builder to also publish native and mobile apps, all in a fraction of the time and effort previously required. They enable experienced software developers to cut corners and automate parts of the job they never enjoyed to begin with. It's remarkable to me that many people who have been working in this industry for years don't enjoy the process of creating software, and find many tasks to be a "chore".
A small part of this group can't identify quality even if they cared about it. The rest simply doesn't care, and never will. Their priorities are to produce something that works on the surface with the least amount of effort, or, in the case of scammers, to produce whatever can bring them the most revenue as quickly and cheaply as possible. LLMs are a perfect fit for both use cases.
Software developers who care about quality are now even a smaller minority than before. They can also find LLMs to be useful, but not the magical productivity booster that everyone else is so excited about. If anything their work has become more difficult, since they now need to review the mountains of code thrown at them. Producing thousands of lines of code is easy. Ensuring it's high quality is much more difficult.
dankobgd · 9h ago
nobody cares about anything in this industry
jongjong · 9h ago
I like vibe coding. I've always hated bloated code bases with too many files, too many unnecessary abstractions, too many unnecessary utility functions... The AI is now dishing out punishment; giving people a taste of their own medicine. Complex codebases lead to complex buggy LLM code completions.
If I come across a fugly code base, I don't bother reading it, I just ask Claude what it's doing and I ask Claude to fix it. To me this is a huge advantage because my OCD prevented me from producing fugly code by hand but now I wield Claude like an automatic complexity gun.
Producing that kind of complexity when you know there exists a simpler way is demoralizing, but it's not demoralizing when an LLM does it because it's so low effort.
I just hated thinking about all this mind-numbing nonsense.
parpfish · 13h ago
> No one would write a class when we’re using a functional approach everywhere.
do people really think functional coding shouldn't involve writing classes?
i can't imagine writing what i think of as code in a "functional programming style" without tons of dataclasses to describe different immutable records that get passed around. and if you're feeling fancy, add some custom constructors to those dataclasses for easy type conversions.
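for example, a minimal Python sketch of that style (the record and field names are made up): a frozen dataclass as the immutable record, a classmethod as the "custom constructor" doing the type conversion, and updates producing new values instead of mutating.

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Order:
        customer: str
        total_cents: int

        @classmethod
        def from_row(cls, row: dict) -> "Order":
            # "custom constructor" doing the type conversions at the boundary
            return cls(customer=str(row["customer"]), total_cents=int(row["total_cents"]))

    order = Order.from_row({"customer": "ada", "total_cents": "1250"})
    discounted = replace(order, total_cents=order.total_cents - 250)  # new record, original untouched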
williamstein · 13h ago
In the context of react codebases, people typically use either functional components everywhere or class components. Mixing the two approaches is frustrating and painful, eg you can’t use any of the hooks in your codebase in a class component. Perhaps the OP was talking about this, but just being really vague.
bublyboi · 13h ago
I think he’s referring to classes vs functions in React
Revisiting code is the best time to add comments, because then you will find out what is tricky and what is obvious.
Code reviews are also good for adding code comments. If the people reviewing are doing their job and are actually trying to understand the code then it is a good time to get feedback where to add comments.
Your first "docs" are your initial sketch: the pieces of paper, whiteboard, or whatever you used to formulate your design. I then usually write code "3" times. The first is the hack time. If in a scripting language like Python, test your functions in the interpreter, isolated. "Write" 2 is bringing it into the code, and it is a good idea to add comments here. You'll usually catch some small things here. Write the docstrings now, which are your 2nd docs and your first "official" ones. While writing those I usually realize some ways I can make my code better. If in a rush, I write these down inside the docstring with a "TODO". When not rushing I'll do my 3rd "write" and make those improvements (realistically this is usually doing some and leaving TODOs).
This isn't full documentation, but at least what I'd call "developer docs". The reason I do things this way is that it helps me stay in the flow state, but allows me to move relatively fast while minimizing tech debt. It is always best to write docs while everything is fresh in your mind. What's obvious today isn't always obvious tomorrow. Hell, it isn't always obvious after lunch! This method also helps remind me to keep my code flexible and containerize functions.
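A made-up example of what that looks like after the second "write", with the improvement parked as a TODO in the docstring (the function is purely illustrative):

    def dedupe(items):
        """Return items with duplicates removed, keeping the first occurrence.

        TODO: this materializes the whole result in memory; turn it into a
        generator if the inputs ever get large.
        """
        seen = set()
        out = []
        for item in items:
            if item not in seen:
                seen.add(item)
                out.append(item)
        return out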
Then code reviews help you see other viewpoints and things you possibly missed. You can build a culture here where during review TODOs and other similar things can be added to internal docs so even if triaged the knowledge isn't completely lost.
The method isn't immutable, though. You have to adapt to the situation at the time, but I think this is a good guideline. It probably sounds more cumbersome than it is, but I promise the second and third writes are very cheap[0]. It just sounds like a lot because I'm mentioning every step[1]
[0] Even though I use vim, you can run code that's in the working file, like cells. So "write 2" kinda disappears, but you still have to do the cleanup here so that's "write 2"
[1] Flossing your teeth also sounds like a lot of work if you break it down into all subtasks 1) find floss, 2) reach for floss, 3) open floss container, ...
It's easy to hear "let's slow down a little" as "don't move fast" but it's wrong to interpret that because "slow down" is relative. There is such a thing as "too fast". You want to hear "slow down" just as much as you want you great calls to speed up. When you hear both you should be riding that line of fast but not too fast. It's also good to make sure you have a clear direction. No use in getting nowhere faster.
I'll use another visual analogy. Let's say you and I have a race around the world. I take off running, you move to the drawing board. I laugh as I'm miles ahead, and you go to your workshop, which is in the other direction. The news picks up our little race, and laughs at you as I have such a tremendous lead. I'm half way done, but you come out of your workshop having built a jet. I only get a few more miles before you win the race. The news then laughs at my stupidity and your cleverness, as if it was so obvious all along.
Sometimes to move fast you need to slow down. It takes lots of planning and strategizing to move at extraordinary speeds.
How does the saying go, something like “show me the incentives and I’ll show you the outcome?”
> That's like trying to fix the damage from the footgun with a footgun.
If you value your money/time/etc, wouldn't the best way to fix the damage from footguns be by preventing the damage to you in the first place by not being there if/when it goes off?
I think your point is well put, I’m just trying to follow your reasoning to a conclusion logical to me, though I don't know if mine is the most helpful framing. I didn’t pick the footgun metaphor, but it is a somewhat useful model here for explaining why people may act the way they do.
So the question becomes: is no documentation better or documentation that can be - potentially - entirely out of date, misleading or subtly wrong, because eg they documented the desired behavior vs actual behavior (or vice versa).
I'm generally pro documentation, I'm just fully aware that internal documentation the devs need to write themselves and for themselves... Very rarely gets treated with enough respect to be trustworthy.
So what it comes down to is one person spearheading the efforts for docs while the rest of the team constantly "forgets", until they decide it's not worth the effort as soon as the driving force either changes teams or gives up themselves.
Not if you generate reference docs from code and how-to docs from tests.
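A minimal Python illustration of the idea (a hypothetical function, not from the thread): the doctest is the how-to and stays executable, and docstring-driven tooling can turn the same text into reference docs.

    import re

    def slugify(title: str) -> str:
        """Turn a title into a URL slug.

        The example below is the how-to, and `python -m doctest this_file.py`
        keeps it honest:

        >>> slugify("Hello, World!")
        'hello-world'
        """
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")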
Knowledge transfer through technical writing doesn’t always manifest itself if it isn’t part of the work process at the time you have that in your mental context. It’s hard to have that context to write the docs if you’re context switching from working on something else or not involved at that level, so it’s hard to just drop in to add docs if there isn’t some framework for writing ad hoc docs for someone to fix up later.
I don’t have experience at traditional employers though so I can’t speak authoritatively here. I’m used to contracts and individual folks and small business. Having human readable documents is important to me because I’m used to folks having things explained on their level, which requires teaching only what they need and want to know to get their work done. Some folks don’t even know what they need when they ask me for help, so that’s its own process of discovery and of documentation. I’m used to having to go to them where they are and where the issue is, so there was no typical day at the office or out of it. Whatever couldn’t fit through the door, I had to go to myself. I’ve had to preserve evidence of potential criminal wrongdoing and document our process. It taught me to keep notes and to write as I work.
I think most places do have some kind of process for doing this, and I suspect the friction in doing the thing is part of the issue, and the fact that it’s difficult thankless work that doesn’t show up on most tracked metrics is part of the issue.
If docs were mandated they would get done. If someone’s job was to make sure they were done well, that would help. I guess folks could step up and try to make that happen and that might be what it takes to make that happen.
Documentation does not help beyond a point. Nobody reads the documentation repeatedly, which would be needed.
When you keep working on a project, and you need a new function, you would need to check or remember every single time that such a function already exists or might exist somewhere. You may have found it when you read the docs months ago, but since you had no need for that function at the time your brain just dismissed it and tossed that knowledge out.
For example, I had a well-documented utils/ folder with just a few useful modules, but they kept getting reimplemented by various programmers. I did not fault them, they would have had to remember every single time they needed some utility to first check that folder. All while keeping up that diligence forever, and while working on a number of projects. It is just too hard. Most of the time you would not find what you need, so most of the time that extra check would be a waste. Even the most diligent person would at some point reimplement something that already exists, no matter how well-documented it is. It's about that extra search step itself.
The closer you want to get to 100% perfection, the more the effort grows, exponentially. So we have some duplication, not a big deal. Overall architectural quality is more important than squeezing out those last few, not really important, percent of perfection.
That being said, good documentation is worth its weight in gold and supports the overall health and quality of a codebase/project. Open-source projects that succeed often seem to have unusually strong, disciplined documentation practices. Maybe that's just a by-product of engineering discipline, but I don't think it is -- at least not entirely.
Assuming they even have code reviews - in your experience, in a situation where the person writing the code didn't check if it already exists, the reviewer will check that and then tell them to delete their already finished implementation and use that existing thing?
Claude code’s Plan mode kind of does this research before coding - but tbf the Search tool seemingly fails half the time with 0 results and it gets confused and then reimplements too…
The key is that we all have an intuitive sense that this behavior is wrong - building a project means working within the established patterns of that project, or at least being aware of them! Going off half-cocked and building a solution without considering the context is extremely bad form.
In the case of human developers, this can be fixed on the code review level, encouraging a culture of reading not just writing code. Without proper guardrails, they can create code that's dissonant with the existing project.
In the case of LLMs, the only recourse is context engineering. You need to make everything explicit. You need to teach the LLM all the patterns that matter. Their responses will always be probabilistic token salad, by definition. Without proper guardrails, it will create code that's dissonant with the existing project.
Either way, it's a question of subjective values. The patterns that are important need to be articulated, otherwise you get token salad randomly sampling the solution space.
I think soon enough we'll have a decent LLM that's capable of reviewing ALL changes to ensure they follow the "culture" we expect to see.
I'd be extremely careful about applying this thinking anywhere else. There's enough baseless finger-pointing in academia and arts already.
Humm.
Maybe if we say that this is not an issue from vibe coding it won't be?
Maybe if we pretend that maybe a naive junior would make these mistakes (true) we should be happy to accept them from senior developers (false)?
LLMs are extraordinarily bad at doing these things.
I’ve seen it.
You've seen it.
The OP has seen it.
You’re in a rush so you wrote some classes in a code base in a language which supports classes but has no classes in it?
Really? Did that get past code review before? Did you deliberately put up a code review that you knew would be rejected and take longer to merge as a result because you were in a hurry?
Of course not.
You did the bare minimum that still met the basic quality standards expected of you.
I get it. We all get it. When you're in a rush you cut corners to move faster.
…but that's not what the OP is talking about, and it's not what I see either:
It's people putting up AI slop and not caring at all what the content was.
Just a quick check that it compiled and the tests pass, if you're lucky.
Too lazy to even put a "don't use classes" in their cursor rules file.
Come on. The OP isn't saying don't use AI.
They're saying care, just a little bit, about your craft ffs.
If speed is all that matters then you need exponential output, as your output needs to offset the debt. If speed is a factor but isn't the only factor, then you need to weigh it against the other things. Take on debt wisely and pay it off when you can. But it does seem like there's a trend to just take on as much debt and hope for the best. Last I checked, most people aren't really good at handling debt.
Not everything that is not perfect is Tech Debt, some of it is just pragmatism. If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing, might never need any maintenance attention and will never be paid down before the codebase is replaced in 10 years time.
Same with people writing code in a different style to others. If it is unreadable, that isn't tech debt either, it's just a lack of process or lack of someone following the process. Shouldn't be merged = no tech debt.
Adding some code to check edge cases that are already handled elsewhere. Again, who cares? If the code make it unreadable, delete it if you know it isn't needed, it only took 10 seconds to generate. If it stays in place and is understandable, it's not tech debt. Again, not going to pay it down, it doesn't cost anything and worse case is you change one validation and not the other and a test fails, shouldn't take long to find the problem.
Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
No need for perfection. Perfection doesn't exist in code. The environment is constantly moving, so all code needs to eventually be maintained.
But I also want to be very very clear here. Just because two functions have the same output doesn't mean that they're the same and no one should care. I'll reference Knuth's premature optimization here. You grab a profiler and find the bottleneck in the code and it's written with a function that's O(n^3) but can be written in O(n log n). Who cares? The customer cares. Or maybe your manager who's budgeting that AWS bill does. You're right that they're both logically "correct" but it's not what you want in your code.
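A toy example of that kind of gap (invented for illustration, not from the thread): the same question answered in O(n^3) and in O(n log n), both "correct".

    # Same question, "does any value appear at least three times?", two ways.
    def has_triplicate_cubic(values):
        n = len(values)
        for i in range(n):                 # O(n^3): three nested scans
            for j in range(i + 1, n):
                for k in range(j + 1, n):
                    if values[i] == values[j] == values[k]:
                        return True
        return False

    def has_triplicate_fast(values):
        ordered = sorted(values)           # O(n log n): sort, then one linear pass
        return any(a == c for a, c in zip(ordered, ordered[2:]))

    data = [4, 1, 4, 2, 4]
    assert has_triplicate_cubic(data) and has_triplicate_fast(data)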
Similarly, code that is held together with spaghetti and duct tape is tech debt. It runs. It gives the correct output. But it is brittle, hard to figure out what it does (in context), and will likely rot. "There's nothing more permanent than a temporary fix that works ", as the saying goes. I guess I'll also include the saying "why is there never time to do things right but there's always time to do things twice?"
Code can be broken in many ways. Both of those situations have real costs. Costs in terms of both time and money. It's naïve to think that the only way code can be broken is by not passing tests. It's naïve to think you've tested everything that needs to be tested. Idk about you, but when I code I learn more about the problem, often with the design changing. Most people I know code this way. Which is why it is always good to write flexible code, because the only thing you can rely on with high confidence is that it's going to change
Something that many people do without even realizing they are incurring tech debt. These are the kind of developers who will just generate more tech debt with an LLM in their hands (at least for now).
That said, tech debt isn't paid by developers individually, it's paid by organizations in developers time. Only in rare cases can you make a deliberate decision for it, as it grows organically within any project. For example, most python2 code today that used niche libraries with outdated docs that have been taken offline in the meantime has to be considered expensive tech debt nowadays.
Critical solutions, but small(er) projects with 2-4 devs, that's where it's at. I feel like it's because then it's actually possible to build a dev team culture and consensus that has the wanted balance of quality and delivery speed.
Linters can also help quite a bit. In the end, you either have your rules enforced programmatically or by a human in review.
I think it’s a very different (and so far, for me, uncomfortable) way of working, but I think there can be benefits especially as tooling improves
Coding agents come with a lot of good behavior built in.
Like "planning mode" where they create a strong picture of what's to be made before touching files. This has honestly improved my workflow at programming from wanting to jump into prototyping before I even have a clear idea, to being very spec-oriented: Of course there needs to be a plan, especially when it will be drafted for me in seconds.
But the amount of preventable dumb things coding agents will do that need to be explicitly stated and meticulously repeated in their contexts reveals how simply training on the world's knowledge does not capture senior software engineer workflows entirely, and captures a lot of human averageness that is frowned upon.
All the models I’ve used (yes, including all the biggest, newest, smartest ones) follow the binary rule about 75% of the time at the very most. Usually closer to 50% on average, with odds significantly decreasing the longer the context increases as it occurs at the end of a task but other than that seems to have no predictable pattern.
The fuzzier rule is slightly better, I’m guessing because it applies earlier in the context window, at around 80% compliance and uses lots of caps and emphasis. This one has a more predictable failure mode of the ratio of reading code vs thinking/troubleshooting/time the model is “in its own head”. When mostly reading code or my instructions compliance is very high, when doing extended troubleshooting or anything that starts to veer away from the project itself into training data it is much lower.
So it's hit and miss and does help, but definitely not something I'd rely on as a hard guardrail, like not executing commands, which Roo has a non-LLM tool config to control. So over time I hope agentic runners add more deterministic config outside the model itself, because instructions still aren't as reliable as they should be and don't seem to be getting substantially better in real use.
to be fair on this one, and while I don't flat out disagree, lots of people reinvent utility functions simply because they don't know they exist elsewhere, especially on huge code bases. This seems to get mostly rectified within the PRs, when a senior dev comments on it - the problem then is, you've only increased the number of people who now know by 1.
Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
Oh, another thing: one of my "golden rules" is that it needs to keep a block comment at the top of the file to describe what's going on in that file. It acts as a second "prompt" when I restart a session.
It works pretty well, it doesn't appear as "magic" as the "make it so!" approach people think they can get away with, but it works for me.
But yes, I still also spend maybe 30% of the time cleaning up, renaming stuff and doing more general rework of the code before it becomes "presentable", but it still allows me to work pretty quickly, a lot quicker than if I were to do it all by hand.
I have ended up thinking about it as a "hunting dog". It can do some things better than me. It can get into tiny crevasses and bushes. It doesn't mind getting wet or dirty. It will smell the prey better than me.
But I should make the kill. And I should be leading the hunt, not the other way around.
Having a junior programmer assistant who never gets better sounds like hell.
Or maybe this is it. Who knows.
But I agree completely, some juniors are a pleasure to see bloom; it's nice when one day you see their eyes shine and hear "wow this is so cool, never realized you made that like THAT for THAT reason" :-)
As for the last part, I've recently been getting close to 50 and my eyes aren't what they used to be. In order to fight off eye-strain I now have to tightly ration whatever I do into 20 minute blocks, before having to take appropriate breaks etc.
As a result of that, time has become one of the biggest factors for me. An LLM can output code 1000x faster than a human, so if I can wrangle it somehow to do whatever basics for me then it's a huge bonus. At the moment I'm busy generating appropriate struct-of-arrays for SIMD from input AoS structs, and I'm using Unity C# with LINQ to output the text (I need it to be editable by anyone, so I didn't want to go down the Roslyn or T4 route).
The queries are relatively simple, take the list of data elements and select the correct entries, then take whatever fields and construct strings with them. Even so, copying/editing them takes a lot longer than me telling GPT to select this, exclude that and make the string look like ABC.
I think there was a post yesterday about AI's as HUDs, and that makes a lot of sense to me. We don't need an all-powerful model that can write the whole program, what we need is a super-powered assistant that can write and refactor on a very small and local scale.
But sometimes I wonder if pushing a 400,000+ line PR to an open-source project in a programming language that I don't understand is more beneficial to my career than being honest and quality-driven. In the same way that YoE takes precedence over actual skill in hiring at most companies.
You might get the same on Stack Overflow too, but more likely, I've found, either no response, or someone pretty competent actually does come out of the woodwork.
More work up front and some work after, but still saves time and brain power vs doing it all myself or letting it vibe out some garbage.
But then it's not vibe coding anymore :)
Very often it comes down to HR issues in the end, so you end up having to take that code anyway, and either sneakily revert it or secretly rework it...
We found it mostly starts to abandon instructions when the context gets too polluted. Subagents really help address that by not loading the top context with the content of all your files.
Another tip: give it feedback as PR comments and have it read them with the gh CLI. This is faster than hand editing the code yourself a lot of times. While it cleans up its own work you can be doing something else.
The question then is: do the bad developers improve by vibe coding, or are they stuck in a local optimum?
If we want to be more precise, I think the main issue is that the AI-generated code lacks a clear architecture. It has no (or very little) respect for overall information flow, and single-responsibility principle.
The AI wants you to have "safe" code, so it will catch things and return non-results instead. In practice, that means the calling code has to inspect the result to see if it's a placeholder or not, instead of being confident because you'd get an exception otherwise.
Similarly, to avoid problems the AI might tweak some parameter. If, for example, you were to design a program to process something with AI, you might go gather_parameters -> call -> process_results. Call should not try to do funky things with parameters, because that should be fixed at the gathering step. But locally the AI is always going to suggest having a bunch of "if this parameter is not good, swap it silently so that it can go through anyway".
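A tiny hypothetical sketch of the two styles (all names invented for illustration):

    USERS = {1: "ada"}

    def load_user_defensive(user_id):
        # The "safe" style: swallow the problem and hand back a placeholder,
        # so every caller has to inspect the result.
        return USERS.get(user_id, {})

    def load_user_strict(user_id):
        # Fail loudly instead: a bad id raises KeyError, so it gets fixed where
        # the parameters are gathered rather than papered over at the call site.
        return USERS[user_id]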
Then tests are such a problem it would require an even longer explanation...
The developer can do whatever they want, but at the end, what I review is their code. If that code is bad, it is the developer's responsibility. No amount of "the agent did it" matters to me. If the code written by the agent requires heavy refactoring, then the developer has to do it, period.
However, you'll probably get an angry answer that it's management, or something of the sort, that is to blame (because there isn't enough time). Responsibility would have to be taken up earlier, by pushing back if some objectives truly are not reasonable.
The common counter-argument here is that you miss out on training juniors, which is true, but it's not always an option (we are really struggling to hire at my startup, for instance, so I'm experimenting with AI to work on tasks I would otherwise give to a junior as a stop-gap).
Another aspect to consider is that what we used to consider important for software quality may change a lot in light of AI tooling. These things aren't absolutes. I think this is already happening, but it's early days, so I'm not sure what will play out here.
I once worked on a massive codebase that had survived multiple acquisitions, renames and mergers over a 20 year period. By the time I left it had finally passed into the hands of a Fortune 500 global company.
You would often find code that matched an API call you required that was last updated in the mid-2000s, but there was a good chance that it was not the most recent code for that task, but still existed as it was needed for some bespoke function a single client used.
There could also be similar API calls with no documentation, and you had to pick the one that returned the data fields that you wanted.
Many didn’t code (much) before.
That being said, the context length problem could potentially be solved, but it will take a bit of time. I think Llama 4 had a 10M context length (not sure if anyone tried prompting it with that much data to see how effective it really is).
Like I don't memorize the last 20 commits, but I know generally the direction things are going by reading those commits at some point
And even if you juiced up the context length of an LLM to astronomical numbers AND made it somehow better at parsing and understanding its context, it will not always repeat said capabilities in other codebases (see for example o3 supposedly being at the top of most benchmarks but still fumbling a simple variation of the mother-is-a-surgeon puzzle).
I am not saying its impossible for a company to figure this out, but it will be incredibly hard.
Claude Code has /init, Cursor comes with /Generate Cursor Rules, and so on. It's not even context engineering: There are out of the box tools you can use not to have this happen. And even if they do happen: you can make them never happen again, with these same tools, for your entire organization - if you had invested the time to know how to use them.
It is interesting how these tools split up the development community.
I don't think it's fair to dismiss this article as a superficial anti-ai knee jerk. The solutions you describe are far from perfect
> My understanding is that a rule should essentially do the same as if it is put in the prompt directly. Is there a solution to that?
Yes, from my understanding Cursor rule files are essentially an invisible prefix to every prompt. I had some issues in the past with Cursor not picking up rule files until I restarted it (some glitch, probably gone by now). So put something simple like a "version" in your rules file and ask it what version of the rules we are following for this conversation, just to validate that the process is working.
For Cursor with larger projects I use a set of larger rule files that always apply. Recently I worked with Spotify's Backstage for example and I had it index online documentation on architecture, build instructions, design, development of certain components, project layout. Easily 500+ lines worth of markdown. I tell Cursor where to look, i.e. online documentation of the libraries you use, reference implementations if you have any, good code examples and why they are good, and then it writes its own rule files - I don't write them manually anymore. That has been working really well for me. If you have a common technology stack you or way of working you can also try throwing in some examples from https://github.com/PatrickJS/awesome-cursorrules
For a codebase containing both good and bad code: maybe you can point it to a past change where code was refactored from bad to good, so it can write out why you prefer which style and how to manage the migration from bad to good. That said, the tools are not perfect. Even with rules the bad output can still happen, but larger rule files describing what you'd like to do and what to avoid make the chance significantly smaller and the tool more pleasant to work with. I recently switched to Claude Code because Cursor tended to get "stuck" on the same problem, which I don't really experience with Claude Code, but YMMV.
They care like they code: not.
I think the author is vastly underestimating what the majority of people actually want. It took me a lot to get this, but for many people, quick/cheap will always trump quality.