Am I spending too much time on HN or is every post/comment section filled with this same narrative? Basically, LLMs are exciting but they produce messy code for which the dev feels no ownership. Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself. They're okay for one-off scripts or projects you do not intend to maintain.
This is the blog post/comment section summary I encounter many times per day.
The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
jm4 · 8h ago
I think you described it much more succinctly than most people do. It's been my exact experience as well. The LLM can develop much faster than I can build a mental model. It's very easy to get to a point where you don't know what's going on, a bunch of bugs have been introduced and you can't easily fix them or refactor because you're essentially the new guy on your own project. I find myself adjusting by committing code very frequently and periodically asking the LLM to explain it to me. I often ask the LLM to confirm things are working the way it says they are and it tends to find its own bugs that way.
I use an LLM primarily for smaller, focused data analysis tasks so it's possible to move fast and still stay reasonably on top of things if I'm even a little bit careful. I think it would be really easy to trash a large code base in a hurry without some discipline and skill in using LLMs. I'm finding that developing prompts, managing context, controlling pace, staying organized and being able to effectively review the LLM's work are required skills for LLM-assisted coding. Nobody teaches this stuff yet so you have to learn it the hard way.
Now that I have a taste, I wouldn't give it up. There's so much tedious stuff I just don't want to have to do myself that I can offload to the LLM. After more than 20 years doing this, I don't have the same level of patience anymore. There are also situations where I know conceptually what I want to accomplish but may not know exactly how to implement it and I love the LLM for that. I can definitely accomplish more in less time than I ever did before.
mekoka · 7h ago
> because you're essentially the new guy on your own project.
Wow! Lol. One sentence to rule them all.
guluarte · 5h ago
and the problem is the LLM is also the new guy on the next prompt, lmao; the guy (the LLM) that wrote the original code is long gone
One of my favorite ways to use AI is to get me started on things. I tend to drag my feet when starting something new, but LLMs can whip up something quick. Then I look at what it did and usually hate it. Maybe it structured the code in a way that doesn't mesh with the way I think, or it completely failed to use some new/esoteric library I rely on.
That hate fuels me to just do the work myself. It's like the same trick as those engagement-bait math problems that pop up on social media with the wrong answer.
skydhash · 4h ago
The same. It’s mostly an example generator, where you know what to do, but can’t take the time to build a model of the language/framework/library. Then you look at the code and retain only the procedure and the symbols used.
antonvs · 5h ago
I do the same thing, except if I hate something, I just ask the LLM to fix it. I can usually get to a starting point I'm pretty happy with, then I take over.
After that, I may ask an LLM to write particular functions, giving it data types and signatures to guide it.
jvergeldedios · 7h ago
> you're essentially the new guy on your own project
Holy shit that's the best description of this phenomenon I've heard so far. The most stark version of this I've experienced is working on a side project with someone who isn't a software engineer who vibe coded a bunch of features without my input. The code looked like 6-8 different people had worked on it with no one driving architecture and I had to untangle how it all got put together.
The sweet spot for me is using it in places where I know the exact pattern I want to use to solve a problem and I can describe it in very small discrete steps. That will often take something that would have taken me an hour or two to hand code something tedious down to 5-10 minutes. I agree that there's no going back, even if all progress stopped now that's too huge of a gain to ignore it as a tool.
meistertigran · 2h ago
I have found that you can't let the LLM do the thinking part. It's really fast at writing, but it only writes acceptable code if the thinking has been done for it.
In some cases, this approach might even be slower than writing the code.
reachableceo · 6h ago
> I'm finding that developing prompts, managing context, controlling pace, staying organized and being able to effectively review the LLM's work are required skills for LLM-assisted coding
Did you not need all these skills / approaches / frameworks for yourself / coding with a team?
This is, I think, the key difference between those (such as myself) who find LLMs massively increase velocity / quality / quantity of output and those who don't.
I was already highly effective at being a leader / communicator / delegating / working in teams, ranging from small, intimate ones where we shared a mental model / context up to some of the largest teams on the planet.
If someone wasn’t already a highly effective IC/manager/leader pre LLM, an LLM will simply accelerate how fast they crash into the dirt.
It takes substantial work to be a highly effective contributor / knowledge worker at any level. Put effort into that, and LLMs become absolutely indispensable, especially as a solo founder.
JSR_FDED · 27m ago
Needs/more/slashes.
antonvs · 5h ago
> If someone wasn’t already a highly effective IC/manager/leader pre LLM, an LLM will simply accelerate how fast they crash into the dirt.
Quite the selling point.
ghurtado · 1h ago
> I was already highly..
I stopped reading right here. Presumably other people did too. I don't think you're aware of the caliber of your hubris. About howitzer sized.
I don't mind when other programmers use AI, and I use it myself. What I mind is the abdication of responsibility for the code or result. I don't think we should be issuing a disclaimer when we use AI any more than when I use grep to do a log search. If we use it, we own the result of it as a tool and need to treat it as such. Extra important for generated code.
skeeter2020 · 9h ago
Isn't this what Brooks describes, stating more than 50 years ago that there's a fundamental shift when a system can no longer be held in a single mind, and that a communication & coordination load results from adding people? It seems that with a single person offloading the work to an LLM right at the start, they give up this efficiency before even beginning, so unless you're getting AI to do all the work it will eventually bite you...
Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
— Melvin E. Conway, How Do Committees Invent?
> there's a fundamental shift when a system can no longer be held in a single mind
Should LLM users invest in both biological (e.g. memory palace) and silicon memory caches?
JohnMakin · 7h ago
This law took me several years to understand. Early on, I'd come into an infra situation and kind of incredulously say stuff like "Why not do (incredibly obvious thing)?" and be frustrated by it quite often.
Usually it's not because people think it can't be done, or shouldn't be done; it's because of this law. Like, yes, in an ideal world we'd do xyz, but the department head of product A is a complete anti-productive bozo that no one wants to talk to or deal with, so we'll engineer around him, kind of a thing. It's incredibly common; once you see it play out, you'll see it everywhere.
transpute · 7h ago
> no one wants to talk to or deal with, so we'll engineer around him
Sometimes, forking product lines, departments or budgets.
sunrunner · 7h ago
I'm compelled to make sure any comment mentioning Conway's Law has a reply linking the following video (and perhaps I should create a bot to do this):
This analysis of the real-world effects of Conway's Law seems deeply horrifying, because the implication seems to be that there's nothing you can do to keep communication efficiency and design quality high while also growing an organisation.
exceptione · 6h ago
I think you'd better link to a good article instead. Good grief, what a horror. A talking head rambling on for 60 minutes.
---
disclaimer: if low information density is your thing, then your mileage may vary. Videos are for documentaries, not for reading an article out to the camera.
sunrunner · 5h ago
Okay, so you didn't even bother to take a few seconds to step through the video to see if there was anything other than the talking head (I'll help you out a bit, there is).
Either way, it's a step-by-step walk through of the ideas of the original article that introduced Conway's Law and a deeper inspection into ideas about _why_ it might be that way.
If that's not enough then my apologies but I haven't yet found an equivalent article that goes through the ideas in the same way but in the kind of information-dense format that I assume would help you hit your daily macros.
Edit: Accidentally a word
brulard · 2h ago
Man, I don't know what kind of world you live in, but an hour-long video is a little too much to swallow when reading HN comments. I even gave it a chance, but had to close the tab after the guy just prepared you for something and then steered away to explain "what is a law". That's absurd.
zahlman · 3h ago
After opening the "transcript" on these kinds of videos (from a link in the description, which may need to be expanded), a few lines of JavaScript can extract the actual transcript text without needing to wrestle with browser copy-and-paste. Presumably the entire process could be automated, without even visiting the link in a browser.
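One way the fully automated version might look, e.g. with yt-dlp's Python API (this is my assumption of a workable route, not something tested here; option names are from memory, so double-check the yt-dlp docs):

    # Sketch: pull YouTube's auto-generated captions without a browser (requires yt-dlp).
    import yt_dlp

    opts = {
        "skip_download": True,       # we only want the transcript, not the video
        "writeautomaticsub": True,   # grab the auto-generated captions
        "subtitleslangs": ["en"],
        "outtmpl": "transcript",     # writes e.g. transcript.en.vtt
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])  # VIDEO_ID is a placeholder

The resulting .vtt file can then be cleaned up by a script or an LLM.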
photonthug · 5h ago
> A talking head rambling on for 60 minutes.
And probably a few minutes of commercials too. I get the impression this is an emerging generational thing, but unless it's a recorded university course or a very interesting and reputable person.. no thanks. What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload.. yet this noble impulse is clearly self-defeating in this circumstance. Why wait and/or click through ads for something that's low-density in the first place, that you can't search, etc.?
Especially now that you can transcribe the video and quickly get AI to clean it up into a post, creating/linking a video potentially telegraphs stuff like: nothing much to say but a strong desire to be in the spotlight / narcissism / an acquisitiveness for clicks / engagement. Patiently enduring infinite ads while you're pursuing educational goals and assuming others are willing to, or assuming other people are paying for ad-free just because you do, all telegraphs a lack of respect for the audience, maybe also a lack of self-respect. Nothing against OP or this video in particular. More like a PSA about how this might come across to other people, because I can't be the only person that feels this way.
sunrunner · 4h ago
> a very interesting and reputable person
Always and entirely subjective of course, but I find Casey Muratori to be both interesting and reputable.
> What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload...
What's even weirder is the instinct to not actually engage with the content of the linked video and a discussion on Conway's Law and organisational efficiency and instead head straight into a monologue about some kind of emerging generational phenomenon of laziness highlighted by a supposed preference for long video content, which seems somewhat ironic itself as ignoring the original subject matter to just post your preferences as 'PSA' is its own kind of laziness. To each their own I guess.
Although I do think the six-hour YouTube 'essays' really could do with some serious editing, so perhaps there's something there after all...
transpute · 6h ago
> horrifying
Self-regulating.
sunrunner · 6h ago
Self-regulating in a way that is designed to favour smaller independent groups with a more complete understanding and ownership of whatever <thing> that team does?
zeta0134 · 9h ago
Even putting aside the ethical issues, it's rare that I want to copy/paste code that I find into my own project without doing a thorough review of it. Typically if I'm working off some example I've found, I will hand-type it in my project's established coding style and add comments to clarify things that are not obvious to me in that moment. With an LLM's output, I think I would have to adopt a similar workflow, and right now that feels slower than just solving the problem myself. I already have the project's domain in my mental map, and explaining it to the agent is tedious and a time waste.
I think this is often overlooked, because on the one hand it's really impressive what the predictive model can sometimes do. Maybe it's super handy as an autocomplete, or an exploration, or for rapidly building a prototype? But for real codebases, the code itself isn't the important part. What matters is documenting the business logic and setting it up for efficient maintenance by all stakeholders in the project. That's the actual task, right there. I spend more time writing documentation and unit tests to validate that business logic than I do actually writing the code that will pass those tests, and a lot of that time is specifically spent coordinating with my peers to make sure I understand those requirements, that they were specified correctly, that the customer will be satisfied with the solution... all stuff an LLM isn't really able to replace.
bwfan123 · 9h ago
Thanks for sharing this beautiful essay which I have never come across. The essay and its citations are thought-provoking reading.
IMO, LLMs of today are not capable of building theories (https://news.ycombinator.com/item?id=44427757#44435126). And, if we view programming as theory building, then LLMs are really not capable of coding. They will remain useful tools.
swat535 · 8h ago
LLMs are great at generating scaffolding and boilerplate code which I can then iterate upon. I'm not going to write
describe User do ...
it ".."
for the thousandth time..
or write the controller files with CRUD actions..
LLMs can do these. I can then review the code, improve it and go from there.
They are also very useful for brainstorming ideas; I treat it as a better Google search. If I'm stuck trying to model my data, I can ask it questions and it gives me recommendations. I can then think about it and come up with an approach that makes sense.
I also noticed that LLMs really lack basic comprehension. For example, no matter how many times you provide the Schema file for it (or a part of it), it still doesn't understand that a column doesn't exist on a model and will try to shove it in the suggested code.. very annoying.
All that being said, I have an issue with "vibe coding".. this is where the chaos happens as you blindly copy and paste everything and git push goodbye
lenkite · 5h ago
We need to invent better languages and frameworks. Boilerplate code should be extremely minimal in the first place, but it appears to have exploded in the last decade.
exe34 · 7h ago
If you need to do something a thousand times, why don't you write a template?
satvikpendem · 6h ago
Templates aren't as flexible as LLMs, especially when seeing and utilizing the context of certain files.
pferde · 4h ago
Yes, but at least they're reliable and predictable. Two things you desperately want in software development.
senko · 5h ago
> Am I spending too much time on HN
Likely (as am I).
> LLMs are exciting but they produce messy code for which the dev feels no ownership. [...] The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project
Yup, can confirm, there are indeed people with differing opinions and experience/anecdotes on HN.
> want to tell you how awesome their workflow is without actually showing any code.
You might be having some AI-news-fatigue (I can relate) and missed a few, but there are also people who seem to have gotten it and do want to show code:
Here's one of my non-trivial open source projects where a large portion is AI built: https://github.com/senko/cijene-api (didn't keep stats, I'd eyeball it at conservatively 50% - 80%)
generalizations · 9h ago
It feels like a bell curve:
- one big set of users who don't like it because it generates a lot of code and uses its own style of algorithms, and it's a whole lot of unfamiliar code that the user has to load up in their mind - as you said. Too much to comprehend, and quickly overwhelming.
And then to either side
- it unblocks users who simply couldn't have written the code on their own, who aren't even trying to load it into their head. They are now able to make working programs!
- it accelerates users who could have written it on their own, given enough time, but have figured out how to treat it as an army of junior coders, and learned to only maintain the high level algorithm in their head. They are now able to build far larger projects, fast!
eikenberry · 5h ago
That last bracket is basically the same as the tech-based start-up story. You build the projects fast, but you build a ton of tech debt into it that you'll be forced to deal with unless it is a short-lived project. Not that this is 100% bad, but something to know going in.
generalizations · 5h ago
Depends. I think that becomes a question of the quality of the programmer - if they were doing it all themselves, the code quality of the (necessarily much smaller) projects would still vary between programmers. Now that variation is magnified, but if you're very good at what you do, I suspect it is still possible to create those projects without the tech debt. Though at the lower end of that bracket, I'd agree you tend to end up with giant balls of mud.
eikenberry · 3h ago
When you play architect and delegate all the work to junior developers it won't matter how good you are, you will incur a lot of tech debt. You simply cannot teach/guide every junior into writing good code as that would take more time than writing it yourself. This fact is baked into the juniors analogy.
generalizations · 2h ago
IMHO it depends on how good you are at being a senior programmer / architect. Put the juniors where they can't do harm, and orchestrate them appropriately. The whole point of employing juniors is that you don't assign a senior to rewrite everything they do.
danielbln · 9h ago
I'm in that last bracket. I don't really have LLMs do tasks that, given enough time and scouring docs, I couldn't have implemented myself. I set hard rules around architecture, components, general design patterns and then let the LLM go at it, after which I review the result in multiple passes, like I would a junior's code. I could not care less about the minutiae of the actual implementation, as long as it conforms to my conventions and style guides and instructions.
generalizations · 8h ago
Yeah. I think the trick is, you have to have been capable of doing it yourself, given time. Same as a senior engineer, they have to be capable of doing the tasks they assign to juniors.
leptons · 8h ago
More often than not the "AI" generates a large block of code that doesn't work, that I still have to read and understand - and it's more difficult to understand because it doesn't work, which is a huge waste of my time. Then I just end up writing the damn code myself, which I should have done in the first place - but my boss wants me to try using the AI.
The only thing the "AI" is marginally good at is as a fancy auto-complete that writes log statements based on the variable I just wrote into the code above it. And even in this simple use case it gets it wrong a fair amount of the time.
Overall the "AI" is a net negative for me, but maybe close to break-even thanks to the autocomplete.
generalizations · 5h ago
What "AI" are you using
GenerocUsername · 9h ago
How is that different from working in a large codebase with 25+ other devs?
My org has 160 engineers working on our e-commerce frontend and middle tiers. I constantly dive into repos and code I have no ownership of. The git blame frequently shows a contractor who worked here 3 years ago.
Seems the LLM does well in small codebases, badly in medium ones, and well again as small modules within big ones.
vjvjvjvjghv · 5h ago
"How is that that different than working in a large codebase with 25+ other devs.
"
It's as miserable. I hate working on large codebases with multiple contributors unless there is super strong leadership that keeps things aligned.
Barrin92 · 9h ago
> How is that different from working in a large codebase with 25+ other devs?
Who says it is? Arguably the most famous book in the history of software engineering makes that point, and it precedes LLMs by half a century.
gpm · 9h ago
> for which the dev feels no ownership.
This is definitely something I feel is a choice. I've been experimenting quite a bit with AI generated code, and with any code that I intend to publish or maintain I've been very conscious in making the decision that I own the code and that if I'm not entirely happy with the AI generated output I have to fix it (or force the AI to fix it).
Which is a very different way of reviewing code than how you review another humans code, where you make compromises because you're equals.
I think this produces fine code, not particularly quickly but used well probably somewhat quicker (and somewhat higher quality code) than not using AI.
On the flip side on some throwaway experiments and patches to personalize open source products that I have absolutely no intention of upstreaming I've made the decision that the "AI" owns the code, and gone much more down the vibe coding route. This produces unmaintainable sloppy code, but it works, and it takes a lot less work than doing it properly.
I suspect the companies that are trying to force people to use AI are going to get a lot more of the "no human ownership" code than individuals like me experimenting because they think it's interesting/fun.
alonsonic · 9h ago
Yes, it's very polarized. That being said, people have shown a lot of code produced by LLMs so I don't understand the dismissive argument you make at the end.
Below is a link to a great article by Simon Willison explaining an LLM assisted workflow and the resulting coded tools.
While I greatly appreciate all of Simon Willison's publishing, these tools don't meet the criteria of the OP's comment in my opinion. The tools in Willison's archive all do useful, but ultimately small, tasks which mostly fit the "They're okay for one-off scripts or projects you do not intend to maintain" caveat from OP.
Meanwhile, it's not uncommon to see people on HN saying they're orchestrating multiple major feature implementations in parallel. The impression we get here is that Simon Willison's entire `tools` featureset could be implemented in a couple of hours.
I'd appreciate some links to the second set of people. Happy to watch YouTube videos or read more in-depth articles.
hedgehog · 8h ago
There's a third category I'd place myself in which is doing day to day work in shipping codebases with some history, using the tools to do a faster and better job of the work I'd do anyway. I think the net result is better code, and ideally on average less of it relative to the functionality because refactors are less expensive.
tptacek · 7h ago
Many big systems are composed of tools that do a good job of solving small tasks, carefully joined. That LLMs are not especially good at that joinery just means that's a part of the building process that stays manual.
graemep · 9h ago
It's really not that different.
"If you assume that this technology will implement your project perfectly without you needing to exercise any of your own skill you'll quickly be disappointed."
"They’ll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them"
"Once I’ve completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions."
"I got lucky with this example because it helped illustrate my final point: expect to need to take over. LLMs are no replacement for human intuition and experience. "
unshavedyak · 8h ago
I've been experimenting with them quite a bit for the past two weeks. So far the best productivity i've found from them is very tight hand-holding and clear instructions, objectives, etc. Very, very limited thinking. Ideally none.
What that gets me, though, is less typing fatigue and fewer decisions made partly because of my wrists/etc. If it's a large (but simple!) refactor, the LLM generally does amazing at that. As good as i would do. But it does that with zero wrist fatigue. Things that i'd normally want to avoid or take my time on, it bangs out in minutes.
This, coupled with Claude Code's recent Hooks[1] introduction, means you can curb a lot of behaviors that are difficult to make perfect from an LLM. I.e. making sure it tests, formats, doesn't include emojis (boy does it like those lol), etc.
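For the curious, a hook that runs your formatter/tests after every file edit looks roughly like this in .claude/settings.json (sketch from memory, so treat the exact keys as an assumption and check the Hooks docs; the command is whatever your project uses):

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Edit|Write",
            "hooks": [
              { "type": "command", "command": "./scripts/format-and-test.sh" }
            ]
          }
        ]
      }
    }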
And of course a bunch of other practices for good software in general make the LLMs better, as has been discussed on HN plenty of times. Eg testing, docs, etc.
So yea, they're dumb and i don't trust their "thinking" at all. However i think they have huge potential to help us write and maintain large codebases and generally multiply our productivity.
It's an art for sure though, and restraint is needed to prevent slop. They will put out so. much. slop. Ugh.
The only way I've found LLMs to be useful for building real software, which isn't included in your list of use cases, is for "pseudo boilerplate". That is, there are some patterns that are tedious to write out, but not quite proper boilerplate in the traditional sense, and so not as amenable to traditional solutions.
One example I deal with frequently is creating Pytorch models. Any real model is absolutely not something you want to leave in the hands of an LLM since the entire point of modeling is to incorporate your own knowledge into the design. But there is a lot of tedium, and room for errors, in getting the initial model wiring setup.
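To be concrete, the "wiring" I mean is the unexciting nn.Module scaffolding below (a made-up example; the layer sizes and names are placeholders, and the actual modeling choices are exactly the part I wouldn't delegate):

    import torch
    from torch import nn

    class BaselineClassifier(nn.Module):
        """Boilerplate-ish model skeleton of the kind an LLM can scaffold quickly."""

        def __init__(self, in_features: int = 128, hidden: int = 256, n_classes: int = 10):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(in_features, hidden),
                nn.ReLU(),
                nn.Dropout(0.1),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
            )
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # (batch, in_features) -> (batch, n_classes) logits
            return self.head(self.encoder(x))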
While, big picture, this isn't the 10x (or more) improvement that people like to imagine, I find in practice I personally get really stuck on the "boring parts". Reducing the time I spend on tedious stuff tends to have a pretty notable improvement in my overall flow.
noodletheworld · 10h ago
> want to tell you how awesome their workflow is
Or, often, sell you something.
dkubb · 9h ago
A lot of the time they are selling themselves as influencers on the subject. It’s often a way to get views or attention that they can use in the future.
zahlman · 4h ago
> is every post/comment section [related to AI/LLMs] filled with this same narrative? ... The other side of it is ...
I don't see why any of this should be surprising. I think it just reflects a lot of developers using this technology and having experiences that fall neatly into one of these two camps. I can imagine a lot of factors that might pull an individual developer in one direction or the other; most of them probably correlate, and people in the middle might not feel like they have anything interesting to say.
tempodox · 9h ago
> … encountered many times per day.
I suspect that's at least partially because all of that doesn't stop the hype from being pushed on and on without mercy. Which in turn is probably because the perverse amounts of investment that went into this have to be reclaimed somehow with monetization. Imagine all those VCs having to realize that hundreds of billions of $$$ are lost to wishful hallucinations. Before they concede that, there will of course be much astroturfing in the vein of your last paragraph.
zemo · 8h ago
I think this narrative gets recycled because it reflects the shallow depth of reasoning you get from thinking about a technology only through its instruments and one's own personal experience with them, which is the perspective that is prioritized on HN.
jimbokun · 9h ago
Of course it's the main topic because it's an existential question for almost all our careers.
xpe · 9h ago
And AI in general* poses existence-level questions (that could go either way: good or bad) regarding military applications, medical research, economic benefits, quality of life, human thriving, etc.
The idea that the future is going to “more or less be predictable” and “within the realm of normal” is a pretty bold claim when you look at history! Paradigm shifts happen. And many people think we’re in the middle of one — people that don’t necessarily have an economic interest in saying so.
* I’m not taking a position here about predicting what particular AI technologies will come next, for what price, with what efficiency and capabilities, and when. Lots of things could happen we can’t predict — like economic cycles, overinvestment, energy constraints, war, popular pushback, policy choices, etc. But I would probably bet that LLMs are just the beginning.
seadan83 · 4h ago
I believe it's the main topic because VCs have been trying to solve the problem of "expensive software developers" for a long time. The AI start-up hype train is real simply because that is how you get VC money these days. VC money contracted severely with the economy post-Covid, and seemingly what is available is going to AI something-or-other. Somehow, the VC-orientated startup hype-train seems to have become the dominant voice in the zeitgeist of software development.
vjvjvjvjghv · 5h ago
Just one thought: I wonder if storing the prompt history together with the LLM code would make it easier to understand the thought process. I have noticed that I find it a little more difficult to read LLM code vs human code (that's written by decent devs)
giancarlostoro · 8h ago
I've been saying this for a while. The issue is that if you don't intimately know your code, you can't truly maintain it. What happens when the LLM can't figure out some obscure bug that's costing you $$$,$$$ per minute? You think being unable to have the AI figure it out is an acceptable answer? Of course not. LLMs are good for figuring out bugs and paths forward, but don't bet your entire infrastructure on it. Use it as an assistant, not a hammer.
jazzyjackson · 4h ago
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
- Brian Kernighan
So if there's a bug in code that an LLM wrote, simply wait 6 months until the LLMs are twice as smart?
zahlman · 3h ago
> > What happens when the LLM can't figure out some obscure bug that's costing you $$$,$$$ per minute?
> simply wait [about a quarter of a million minutes] until the LLMs are twice as smart?
...
therealpygon · 3h ago
Laziness doesn't stop just because technology improves. This is just intellectual laziness being blamed on LLMs, as usual. "I don't bother to read my code anymore, so LLMs did it." "I don't practice coding anymore, it's the LLMs' fault." Blah. Blah Blah.
People will always blame someone or something else for laziness.
jordanb · 5h ago
Weirdly this is pretty much the story of code generators.
When I first got into web development there were people using HTML generators based on photoshop comps. They would produce atrocious HTML. FE developers started rewriting the HTML because otherwise you'd end up with brittle layout that was difficult to extend.
By the time the "responsive web" became a thing HTML generators were dead and designers were expected to give developers wireframes + web-ready assets.
Same thing pretty much happened with UML->Code generators, with different details.
There's always been a tradeoff between the convenience and deskilling involved in generated code and the long term maintainability.
There's also the fact that coding is fundamentally an activity where you try to use abstractions to manage complexity. Ideally, you have interfaces that are good enough that the code reads like a natural language, because you're expressing what you want the computer to do at the exact correct layer of abstraction. Code generators tend to both cause and encourage bad interfaces. Often the impetus to use a code generator is that the existing interfaces are bad, bureaucratic, or obscure. But using code generators ends up creating more of the same.
jauntywundrkind · 8h ago
There's a lot of truth here.
But also: LLMs are incredibly powerful and capable tools at discovering and finding out what the architecture of things is. They have amazing abilities to analyze huge code bases & to build documents and diagrams to map out the system. They can answer all manner of questions, to let us probe in.
Now, whether LLMs generate well-architected systems is largely operator dependent. There are lots of low-effort, zero-shot ways to give LLMs very little guidance and get out who knows what. But when I reflect on the fact that, for now, most code is legacy code, most code is hideously under-documented, and most people reading code don't really have access to experts or artifacts to explain the code and its architecture, my hope and belief is that LLMs are incredible tools to radically increase maintainability versus where we are now, and that they are powerful peers in building the mental model of programming & systems.
nico · 9h ago
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself
This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
I do agree with the sentiment and insight about the 2 branches of topics frequently seen lately on HN about AI-assisted coding
Would really like to see a live/video demo of semi-autonomous agents running in parallel and executing actual useful tasks on a decently complex codebase, ideally one that was entirely “manually” written by devs before agents are involved - and that actually runs a production system with either lots of users or paid customers
photonthug · 4h ago
> This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
The important thing about a codebase wasn't ever really size or age, but whether it was a planned architecture or grown organically. The same is true post-LLM. Want to put AI in charge of tool-smithing inconsequential little widgets that are blocking you? Fine. Want to put AI in charge of deciding your overall approach and structure? Maybe fine. Worst of all is to put the AI in charge of the former, only to find later that you handed over architectural decisions at some point and without really intending to.
sunrunner · 4h ago
> planned architecture or grown organically
That sounds like a hard either/or, as if ten years of development of a large codebase was entirely known up-front with not a single change to the structure over time that happened as a result of some new information.
"We build our computers the way we build our cities—over time, without a plan, on top of ruins." -- Ellen Ullman
WD-42 · 8h ago
The difference is that at the moment of writing, you understand it. An LLM outputs a giant glob and you immediately don't understand it.
sppfly · 4h ago
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself.
Wow, you really nailed the point; that's what I felt but hadn't understood. Thanks for the comment.
guluarte · 5h ago
LLMs are fine with small things, such as creating or refactoring a small function, adding logs, writing a test, etc. But developing entire features or whole applications is stupid; being statistical models, the more input you feed them, the more errors they accumulate.
baq · 9h ago
> workflow is without actually showing any code.
an argument can be made that the code doesn't matter as long as the product works as it's supposed to (big asterisk here)
Disposal8433 · 9h ago
> the code doesn't matter
The only goal of a code generator is the code. I don't care whether it works or not (for specific scenarios and it could break 90% of the time). I want to see the generated code and, so far, I have never seen anything interesting besides todo lists made with ReactJS.
baq · 9h ago
People who do this don’t want to see the code and perhaps even don’t care about the code. Code is just a means to an end, which is the product. It might be the wrong take from a software engineer’s perspective, but it is a take that works in at least some cases.
fridder · 10h ago
Trying to wade through the hype or doom is a bit of a challenge.
ai-christianson · 9h ago
> The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
This is a great read on the situation. Do you think these people are just making it up/generating baseless hype?
WD-42 · 9h ago
I think people are rightly hesitant to share code that has their name on it but which they know nothing about.
I have seen a few of these full-blown LLM-coded projects and every one of them has had some giant red flashing warning at the top of the README about the project being LLM generated.
So I think it’s probably a mix of avoiding embarrassment and self preservation.
ai-christianson · 7h ago
Interesting, to me it's still very much a human in the loop process and the person whose name is on the commit is ultimately responsible for what they commit.
xnx · 6h ago
Don't discount the number of articles that espouse a provocative position not held by the poster for the purpose of gaining traffic/attention/clout/influencer-points.
DiggyJohnson · 5h ago
I think it’s just we are at a point where many are coming to this realization around the same time. You did an excellent job summarizing it though.
pier25 · 9h ago
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself.
I don't think that's the main reason. Well written code is easier to follow even when you haven't written it yourself (or maybe you did but forgot about it).
skeeter2020 · 9h ago
I argue you still need to cognitively load the solution, it's just that well written code allows you to (a) segment the code base effectively and (b) hold it at a higher level of abstraction.
pier25 · 9h ago
Absolutely. The comment I was responding to argued the difficulty was that LLM code wasn't loaded cognitively. I'm arguing the problem is actually the code produced by LLMs tends to be messy and hard to follow beyond trivial examples.
teaearlgraycold · 4h ago
I am bewildered by the posts where people claim to have 20+ agents running at the same time on their repo. I’ve used o3, Claude 4 Opus, Gemini 2.5 Pro, and they can’t run for more than 15 minutes in the best of cases without fucking up or getting caught in a loop. Tasks that are slightly more complicated than your average code are beyond their comprehension.
lowbloodsugar · 3h ago
I work in a team of 8. So previously I wrote 1/8th of the code. Now I write 1/16th with the other 1/16th being written by an LLM. Figuring out how to work effectively together as a team requires solving the same problem you describe. For me, an LLM is like having another junior developer on the team.
How you get good results as a team is to develop a shared mental model, and that typically needs to exist in design docs. I find that without design docs, we all agree verbally, and then are shocked at what everyone else thought we'd agreed on. Write it down.
LLMs, like junior devs, do much better with design docs. You can even let the junior dev try writing some design docs.
So if you're a solo developer, I can see this would be a big change for you. Anyone working on a team has already had to solve this problem.
On the subject of ownership: if I commit it, I own it. If the internet "goes down" and the commit has got my name on it, "but AI" isn't going to cut it.
furyofantares · 9h ago
We're going through something.
apwell23 · 6h ago
> The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
There have been grifters hopping onto every trend. Have you noticed they never show you what exactly they built or whether it was ever useful?
ookblah · 9h ago
honestly my theory is part of it is people who are very caught up in the "craft" part of it and now hate these LLMs for producing shit that pretty much works but isn't the "perfect specimen" of coding architecture that they now have to pore over.
honestly, for the vast majority of basically CRUD apps out there, we are inflating our skills a bit too much here. even if the code is junk you can adapt your mindset to accept what LLMs produce, clean it up a bit, and come out with something maintainable.
like do these people ever have to review code from other people or juniors? the feedback loop here is tighter (although the drawback is your LLM doesn't "learn").
i wouldn't use it for anything super novel or cutting edge i guess, but i don't know, i guess everyone on HN might be coding some super secret advanced project that an LLM can't handle....?
PleasureBot · 9h ago
The fundamental limitation of LLMs writing code is that reading and understanding code is harder and slower than writing it. With other engineers that I work with there is an established level of trust where I do not need to deep dive into every PR. With LLMs it is like I am constantly doing code reviews for someone with whom I have zero trust. This is fundamentally a slow process, especially if you need to maintain this code in the long term and it is part of your 'core business code' that you work on 90% of the time. It also comes with all the downsides of no longer being an expert in your own codebase.
Ultimately I am responsible for any code I check in even if it was written by an LLM, so I need to perform these lengthy reviews. As others have said, if it is code that doesn't need to be maintained, then reviewing the code can be a much faster process. This is why it is so popular for hobby projects since you don't need to maintain the code if you don't want to, and it doesn't matter if you introduce subtle but catastrophic bugs.
Ultimately the tech feels like a net neutral. When you want to just throw the code away afterwards, it is very fast and good enough. If you are responsible for maintaining it, it's slower than writing it yourself.
ookblah · 8h ago
which is weird to me because i'm using it in prod? literally if i care about style and structure i just say, look at these other few files and figure it out, and it's fine.
if i need to work on something mission critical or new i do it by hand first. tests catch everything else. or you can just run it so that you review every change (like in claude code) as it comes in and can still grok the entire thing vs having to review multiple large files at the end.
thus i literally wonder what people are working on that requires this 100% focused mission critical style stuff at all times. i mean i don't think it's magic or AGI, but the general argument is always 1) works for hobby projects but not "production" 2) the LLM produces "messy code" which you have to review line by line as if you wrote it yourself which i've found to not be true at all.
mythrwy · 7h ago
The volume of PRs that LLMs enable adds to the burden.
mentos · 8h ago
I'm currently using Cursor to complete, in about a month, something that would have taken me a full year.
But I’m in no rush to invite an army of people to compete with me just yet. I’ll be back when I’m sipping coladas on a beach to tell you what I did.
mythrwy · 7h ago
We all will be sipping coladas on the beach at that point. You can just walk over to the next cabana and tell us.
andrewmutz · 9h ago
I went from one camp to the other in the last month. I've been blown away by whats possible and here's what's working for me:
- Use Cline with Sonnet 4. Other models can work but this is the best balance of price and effectiveness.
- Always use "plan" mode first, and only after the plan mode looks good do you switch to "act" mode.
- Treat the LLM as though you are pair-programming with a junior engineer.
- Review every line that gets written as it gets written. Object or change it if you don't like it for any reason.
- Do test-driven development, and have the LLM always write tests first (a small sketch below).
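A minimal illustration of that tests-first step (hypothetical code; the module and function names are made up for the example):

    # tests/test_slugify.py -- written (and reviewed) before any implementation exists
    import pytest
    from myapp.text import slugify  # hypothetical module under test

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("Rock & Roll!") == "rock-roll"

    def test_rejects_empty_input():
        with pytest.raises(ValueError):
            slugify("")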
I have transitioned to using this full-time for coding and am loving the results. The code is better than what I used to write, because sometimes I can miss certain cases or get lazy. The code is better tested. The code gets written at least twice as fast. This is real production code that is being code reviewed by other humans.
noisy_boy · 9h ago
I think they are making me more productive in achieving my targets and worse in my ability to program.
They are exactly like steroids - bigger muscles fast but tons of side effects and everything collapses the moment you stop. Companies don't care because they are more concerned about getting to their targets fast instead of your health.
Another harmful drug for our brain if consumed without moderation. I won't entirely stop using them but I have already started to actively control/focus my usage.
criddell · 9h ago
Your steroids comparison made me think of Cal Newport's recent blog post[1] where he argues that AI is making us lazy. He quotes some researchers who hooked people up to EEG machines then had them work. The people working without AI assistance incurred more brain "strain" and that's probably a good thing.
But even he doesn't think AI shouldn't be used. Go ahead and use it for stuff like email but don't use it for your core work.
I haven't read the entire paper, but just looking at the abstract and conclusion it ironically seems...lazy? Like we're going to have some people use ChatGPT 4 times in 4 months and draw conclusions on long-term brain activity based on that? How do you even control for noise in such a study?
I'm generally sympathetic to the idea that LLMs can create atrophy in our ability to code or whatever, but I dislike that this clickbaity study gets shared so much.
ukFxqnLa2sBSBf6 · 8h ago
Of all the things I think AI shouldn't be used for, email is one of them (unless you're, like, completely illiterate). Whenever I get a "project status update" and the section header emojis show up I instantly just want to throw that garbage in the trash.
Tijdreiziger · 6h ago
That usage of emoji predates LLMs.
smeeth · 8h ago
It's worth noting this is the exact argument people used against adopting electric calculators.
thoughtpeddler · 7h ago
Calculators are a very narrow form of intelligence as compared to the general-purpose intelligence that LLMs are. The muscle/steroid analogy from this same discussion thread is apt here. Calculators enhanced and replaced just one 'muscle', so the argument against them would be like "ya but do we really need this one muscle anymore?", whereas with LLMs the argument is "do we really even need a body at all anymore?" (if extrapolated out several more years into the future).
smeeth · 4h ago
You don't need the analogy. If you have a tool that does a job for you, your capacity to do the job degrades, alongside other associated skills.
Tools that do many things and tools that do a small number of things are still tools.
> "do we really even need a body at all anymore?"
It's a legitimate question. What's so special about the body and why do we need to have one? Would life be better or worse without bodies?
Deep down I think everyone's answer has more to do with spirituality than anything else. There isn't a single objectively correct response.
jimbokun · 9h ago
I think LLMs have made a lot of developers forget the lessons in "Simple Made Easy":
LLMs seem to be really good at reproducing the classic Ball of Mud, that can't really be refactored or understood.
There's a lot of power in creating simple components that interact with other simple components to produce complex functionality. While each component is easy to understand and debug and predict its performance. The trick is to figure out how to decompose your complex problem into these simple components and their interactions.
I suppose the point when LLMs get really good at that skill will be when we really won't need developers any more.
kraftman · 7h ago
you can tell them what to do though, right? I think a lot of the variance between people who say LLMs are useless and people who say they are useful is that the latter have learned what the models are good and bad at and can predict the quality of the output based on their input.
Like, I might ask an LLM for its opinion on the best way to break something down, to see if it 'thinks' of anything I haven't, and then ask it to implement that. I wouldn't ask it to do the whole thing from scratch with no input on how to structure things.
apwell23 · 6h ago
> you can tell them what to do though right?
kind of.
osmsucks · 5h ago
> LLMs seem to be really good at reproducing the classic Ball of Mud, that can't really be refactored or understood.
This, but only for code. I've seen "leaders" at work suggest that we "embrace" AI, even for handling production systems and managing their complexity. That's like saying: "We've built this obscure, inscrutable system, therefore we need another obscure, inscrutable system on top of it in order to understand it!". To me, this sounds deranged, but the amount of gaslighting that's going on also makes you think you're the only one to believe that...
globular-toast · 9h ago
> I suppose once LLMs get really good at that skill, will be when we really won't need developers any more.
I don't really get this argument. So when LLMs become "perfect" software developers are we just going to have them running 24/7 shitting out every conceivable piece of software ever? What would anyone do with that?
Or do you expect every doctor, electrician, sales assistant, hairdresser, train driver etc. to start developing their own software on top of their existing job?
What's more likely is a few people will make it their jobs to find and break down problems people have that could use a piece of software and develop said piece of software using whatever means they have available to them. Today we call these people software developers.
breckenedge · 7h ago
> Or do you expect every doctor, electrician, sales assistant, hairdresser, train driver etc. to start developing their own software on top of their existing job?
I started my software career by automating my job, then automating other people’s jobs. Eventually someone decided it would be easier to just hire me as a software engineer.
I just met with an architect for adding a deck onto my house (need plans for code compliance). He said he was using AI to write programs that he could use with design software. He demoed how he was using AI to convert his static renders into walkthrough movies.
seanw444 · 10h ago
This is pretty much the conclusion I've come to as well. It's not good at being an autocomplete for entire chunks of your codebase. You lose the mental model of what is doing what, and exactly where. I prefer to use it as a personalized, faster-iterating StackOverflow. I'll ask it to give me a rundown of a concept I'm not familiar with, or for a general direction to point me in if I'm uncertain of what a good solution would be. Then I'll make the decision, and implement it myself. That workflow has worked out much better for me so far.
solomonb · 9h ago
I use it the same way, but Cursor is constantly insisting on making code changes. Is there a trick to get it to introspect on the codebase without wanting to modify it?
furyofantares · 9h ago
I say something like "without making any code changes right now, investigate blah blah blah" and if I want more than just info "and propose a direction that I can look at", or sometimes give it a file to write a proposal into.
haiku2077 · 9h ago
In Zed you can toggle between read-only and write modes at any point when using the agent. You can also create custom modes that allow the use of specific tools, editing only specific files, etc. Does cursor have a similar feature?
nsingh2 · 9h ago
In Cursor, there is an 'ask' mode that isn't as over-eager to make edits as the default agent mode.
rane · 3h ago
I don't have this experience. The mental model might not be quite as strong, but if you always review the given code carefully, you will have a pretty good idea of what is where and how things interact.
stephendause · 7h ago
One point I haven't seen made elsewhere yet is that LLMs can occasionally make you less productive. If they hallucinate a promising-seeming answer and send you down a path that you wouldn't have gone down otherwise, they can really waste your time. I think on net, they are helpful, especially if you check their sources (which might not always back up what they are saying!). But it's good to keep in mind that sometimes doing it yourself is actually faster.
href · 6h ago
I've dialed down a lot as well. The answers I got for my queries were too often plain wrong.
I instead started asking where I might look something up - in what man page, or in which documentation. Then I go read that.
This helps me build a better mental map of where information is found (e.g., in what man page), decreasing my reliance on both search engines and LLMs in the long run.
LLMs have their uses, but they are just a tool, and an imprecise one at that.
quaintdev · 10h ago
LLMs have limits. They are super powerful but they can't make the kind of leap humans can. For example, I asked both Claude and Gemini the problem below.
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with the solutions below:
1. Use reverse proxy
2. Root the phone
3. Run on higher port
Even after asking them to rethink, they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. It's only after I included it in their context that both agreed it was a possible solution.
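For anyone else who hadn't run into it: an HTTPS record can carry a port parameter, so clients that support it learn the real port from DNS and the URL stays clean. Roughly like this (illustrative zone-file snippet; assumes the Android webserver actually listens on 8443):

    ; Tell supporting clients that https://www.example.com/ is served from port 8443
    www.example.com.  3600  IN  HTTPS  1 .  port=8443

The catch, as the reply below notes, is that client support for this is still uneven.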
I'm not sure I would measure LLMs on recommending a fairly obscure and rather new spec that isn't even fully supported by e.g. Chrome. That's a leap that I, a human, wouldn't have made either.
mandevil · 9h ago
On the other hand, I find that "someone who has read all the blogs and papers and so can suggest something you might have missed" is my favorite use case for LLM, so seeing that it can miss a useful (and novel to me) idea is annoying.
Hackbraten · 8h ago
Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
For an LLM, I’d expect it to be similar. It can recall the stuff it’s seen thousands of times, but has a hard time recalling the niche/underdocumented stuff that it’s seen just a dozen times.
gtsop · 5h ago
> Which is easier for you to remember: facts you’ve seen a hundred times in your lifetime, or facts you’ve seen once or twice?
The human brain isn't a statistical aggregator. If you see a psychologically shocking thing once in your lifetime, you might remember it even after dementia hits when you're old.
On the other hand, you pass by hundreds of shops every day and receive the data signal of their signs over and over and over, yet you remember nothing.
You remember stuff you pay attention to (for whatever reason).
gtsop · 5h ago
So is the expectation for it to suggest obvious solutions that the majority of people already know?
I am fine with this, but let's be clear about what we're expecting
pdabbadabba · 4h ago
> So is the expectation for it to suggest obvious solutions that the majority of people already know?
Certainly a majority of people don't know this. What we're really asking is whether an LLM is expected to know more than (or as much as) the average domain expert.
jeremyjh · 5h ago
All it can do is predict the next token. So, yes.
ramses0 · 9h ago
I'm adding this tidbit of knowledge to my context as well... :-P
Only recently have I started interacting with LLMs more (I tried out a previous "use it as a book club partner" suggestion, and it's pretty great!).
When coding with them (via cursor), there was an interaction where I nudged it: "hey, you forgot xyz when you wrote that code the first time" (ie: updating an associated data structure or cache or whatever), and I find myself INTENTIONALLY giving the machine at least the shadow of a benefit of the doubt that: "Yeah, I might have made that mistake too if I were writing that code" or "Yeah, I might have written the base case first and _then_ gotten around to updating the cache, or decrementing the overall number of found items or whatever".
In the "book club" and "movie club" case, I asked it to discuss two movies and there were a few flubs: was the main character "justly imprisoned", or "unjustly imprisoned" ... a human might have made that same typo? Correct it, don't dwell on it, go with the flow... even in a 100% human discussion on books and movies, people (and hallucinating AI/LLM's) can not remember with 100% pinpoint accuracy every little detail, and I find giving a bit of benefit of the doubt to the conversation partner lowers my stress level quite a bit.
I guess: even when it's an AI, try to keep your interactions positive.
_flux · 10h ago
TIL. I knew about SRV records (which almost nobody uses, I think?), but this was news to me.
I guess it's also actually supported, unlike SRV records, which are more like supported only by some applications? Matrix migrated from SRV to .well-known files for providing the data. (Or maybe it supports both.)
Arathorn · 8h ago
Matrix supports both; when you're trying to get Matrix deployed on someone's domain it's a crapshoot on whether they have permission to write to .well-known on the webroot (and if they do, the chances of it getting vaped by a CMS update are high)... or whether they have permission to set DNS records.
bravetraveler · 9h ago
You'd be surprised at how many games use SRV records. Children struggle with names; let alone ports and modifier keys.
At least... this was before multiplayer discovery was commandeered. Matchmaking and so on largely put an end to opportunities.
remram · 9h ago
See also SVCB
GoToRO · 10h ago
Classic problem of "they give you the solution once you ask about it".
ctippett · 9h ago
Huh, that's a neat trick. Your comment is the first I'm learning of HTTPS RR records... so I won't pass judgement on whether an AI should've known enough to suggest it.
To be fair, the question implies the port number of the local service is the problem, when it's more about making sure users can access it without needing to specify a port number in the URL.
Yes, an experienced person might be able to suss out what the real problem was, but it's not really the LLMs fault for answering the specific question it was asked. Maybe you just wanted to run a server for testing and didn't realize that you can add a non-standard port to the URL.
lcnPylGDnU4H9OF · 9h ago
It's not really an LLM being "faulty" in that we kinda know they have these limitations. I think they're pointing out that these models have a hard time "thinking outside the box" which is generally a lauded skill, especially for the problem-solving/planning that agents are expected to do.
causal · 8h ago
Yeah LLMs are generally going to pursue a solution to the question, not figure out the need behind the ask.
Not to mention the solution did end up being to use a higher port number...
capt_obvious_77 · 10h ago
Off-topic, but reading your article about hosting a website on your phone inspired me a lot. Is that possible on a non-jail-broken phone? And what webserver would you suggest?
quaintdev · 10h ago
Yes, no root required. I asked Claude to write a Flutter app that serves a static file from assets. There are plenty of webservers available on the Play Store too.
bird0861 · 8h ago
Just use termux.
jjice · 9h ago
My favorite use case for LLMs in long-term software production (they're pretty great at one-off stuff, since I don't need to maintain it) is as an advanced boilerplate generator.
Stuff that can't just be abstracted into a function or class but also requires no real thought. Tests are often (depending on what they're testing) in this realm.
I was resistant at first, but I love it. It's reduced the parts of my job that I dislike doing because of how monotonous they are and replaced them with a new fun thing to do - optimizing prompts that get it done for me much faster.
Writing the prompt and reviewing the code is _so_ much faster on tedious simple stuff, and it leaves the interesting, thought-provoking parts of my work for me to do.
Const-me · 9h ago
I’m using ChatGPT (enterprise version paid by my employer) quite a lot lately, and I find it a useful tool. Here’s what I learned over time.
Don’t feed many pages of code to AI, it works best for isolated functions or small classes with little dependencies.
In 10% of cases when I ask it to generate or complete code, the quality of the code is less than ideal but fixable with extra instructions. In 25% of cases, the quality of the generated code is bad and remains so even after telling it what's wrong and how to fix it. When that happens, I simply ignore the AI output and do something else reasonable.
Apart from writing code, I find it useful for reviewing new code I wrote. Half of the comments are crap and should be ignored. Some others are questionable. However, I remember a few times when the AI identified actual bugs or other important issues in my code and proposed fixes. Again, don't copy-paste many pages at once; do it piecewise.
For some niche areas (examples are HLSL shaders, or C++ with SIMD intrinsics) the AI is pretty much useless, probably was not enough training data available.
Overall, I believe ChatGPT has improved my code quality. Not only as a result of the reviews, comments, or generated code, but also because my piecewise copy-pasting workflow improved the overall architecture by splitting the codebase into classes/functions/modules/interfaces, each doing its own thing.
wombat-man · 9h ago
Yeah the code review potential is big. I just started using AI for this and it's pretty handy.
I agree it's good for helping writing smaller bits like functions. I also use it to help me write unit tests which can be kind of tedious otherwise.
I do think that the quality of AI assistance has improved a lot in the past year. So if you tried it before, maybe take another crack at it.
andy99 · 9h ago
I'm realizing that LLMs, for coding in particular but also for many other tasks, are a new version of the fad dieting phenomenon.
People really want a quick, low effort fix that appeals to the energy conserving lizard brain while still promising all the results.
In reality there aren't shortcuts, there's just tradeoffs, and we all realize it eventually.
hedgehog · 9h ago
Over the last couple months I've gone from highly skeptical to a regular user (Copilot in my case). Two big things changed: First, I figured out that only some models are good enough to do the tasks I want (Claude Sonnet 3.7 and 4 out of everything I've tested). Second, it takes some infrastructure. I've added around 1000 words of additional instructions telling Copilot how to operate, and that's on top of tests (which you should have anyway) and 3rd party documentation. I haven't tried the fleet-of-agents thing, one VS Code instance is enough and I want to understand the changes in detail.
Edit: In concrete terms the workflow is to allow Copilot to make changes, see what's broken, fix those, review the diff against the goal, simplify the changes, etc, and repeat, until the overall task is done. All hands off.
dwoldrich · 5h ago
I personally believe it's a mistake to invite AI into your editor/IDE. Keep it separate, in the browser; keep discrete, concise question-and-answer threads. Copy and paste whenever it delivers some gold (that comes with all the copy-pasta dangers, I know - oh, don't I know it!)
It's important to always maintain the developer role, don't ever surrender it.
sarmadgulzar · 9h ago
Can relate. I've also shifted towards generating small snippets of code with LLMs, giving them a glance, and asking for unit tests for them. Then I review the unit tests carefully. But integrating the snippets into the bigger system, I always do that myself. LLMs can do it sometimes, but when the codebase becomes big enough that it can't fit into the context window, that's a real issue, because now the LLM doesn't know what's going on and neither do you. So my advice is to use LLMs to generate the tedious bits of code, but keep the overall architecture committed to your own memory as well, so that when the AI messes up, at least you have some clue about how to fix it.
causal · 9h ago
What's it called when you choose a task because it's easy, even if it's not what you need to do at all? I think that's what LLMs have activated in a lot of us: writing code used to be kinda hard, but now it's super easy, so let's just write more code.
The hard parts of engineering have always been decision making, socializing, and validating ideas against cold hard reality. But writing code just got easier so let's do that instead.
Prior to LLMs writing 10 lines of code might have been a really productive day, especially if we were able to thoughtfully avoid writing 1,000 unnecessary lines. LLMs do not change this.
Velorivox · 7h ago
I'm not sure if there's a name for that specifically, but it seems strongly related to the streetlight effect. [0]
I don't have it write any of my Python firmware or Elixir backend stuff.
What I do let it rough in is web front end stuff. I view the need for and utility of LLMs in the html/css/tailwind/js space as an indictment of complexity and inconsistency. It’s amazing that the web front end stuff has just evolved over the years, organically morphing from one thing to another, but a sound well engineered simple-is-best set of software it is not. And in a world where my efforts will probably work in most browser contexts, no surprise that I’m willing to mix in a tool that will make results that will probably work. A mess is still a mess.
kamens · 8h ago
Personally, I follow the simple rule: "I type every single character myself. The AI/agent/etc offers inspiration." It's an effective balance between embracing what the tech can do (I'm dialing up my usage) and maintaining my personal connection to the code (I'm having fun + keeping things in my head).
I wonder if this is as good as LLMs can get, or if this is a transition period between LLM as an assistant, and LLM as a compiler. Where in the latter world we don’t need to care about the code because we just care about the features. We let the LLM deal with the code and we deal with the context, treating code more like a binary. In that world, I’d bet code gets the same treatment as memory management today, where only a small percent of people need to manage it directly and most of us assume it happens correctly enough to not worry about it.
rzz3 · 9h ago
Why wonder if this is “as good as LLMs can get” when we saw such a huge improvement between Claude 3.7 and Claude 4, released what, a couple weeks ago? Of course it isn’t as good as LLMs can get. Give it 3 more weeks and you’ll see it get better again.
kadhirvelm · 8h ago
I don’t doubt LLMs will become better assistants over time, as you said every few weeks. I more mean if LLMs will cross the assistant to compiler chasm where we don’t have to think about the code anymore and can focus on just the features
MrGilbert · 10h ago
My point of view: LLMs should be taken as a tool, not as a source of wisdom. I know someone who likes to answer people-related questions through an LLM. (E.g.: "What should this person do?" "What should we know about you?" etc.) More than once, this leads to him getting into a state of limbo when he tries to explain what he means by what he wrote. It feels a bit wild - a bit like back in school, when the guy who copied your homework is forced to explain how he ended up with the solution.
obirunda · 7h ago
The dichotomy between the people who are "orchestrating" agents to build software and the people experiencing these less-than-ideal outcomes from LLMs is fascinating.
I don't think LLM for coding productivity is all hype but I think for the people who "see the magic" there are many illusions here similar to those who fall prey to an MLM pitch.
You can see all the claims aren't necessarily unfounded, but the lack of guaranteed reproducibility leaves the door open for many caveats in favor of belief for the believer and cynicism for everybody else.
For the believers if it's not working for one person, it's a skill issue related to providing the best prompt, the right rules, the perfect context and so forth. At what point is this a roundabout way of doing it yourself anyway?
nvahalik · 4h ago
Sounds like he shot for the moon and missed.
I've been allowing LLMs to do more "background" work for me. Giving me some room to experiment with stuff so that I can come back in 10-15 minutes and see what it's done.
The key things I've come to are that it HAS to be fairly limited. Giving it a big task like refactoring a code base won't work. Giving it an example can help dramatically. If you haven't "trained" it by giving it context or adding your CLAUDE.md file, you'll end up finding it doing things you don't want it to do.
Another great task I've been giving it while I'm working on other things is generating docs for existing features and modules. It is surprisingly good at looking at events, following them to see where they go, and generating diagrams and the like.
ramon156 · 10h ago
I like Zed's way of doing stuff (Ask mode). Just ask it a question and let it go through the whole thing. I still haven't figured out how to form the question so it doesn't just go off the rails and start implementing code. I don't care about code; I ask it to either validate my mental model or improve it.
xattt · 9h ago
This is it. This is a new paradigm, and a lot of people seem to think that it's authoritative. It's a decision support tool, and the output still has to pass an internal litmus test.
Whether someone’s litmus test is well-developed is another matter.
Nedomas · 10h ago
two weeks ago I started heavily using Codex (I have 20y+ dev xp).
At first I was very enthusiastic and thought Codex was helping me multiplex myself. But you actually spend so much time trying to explain to Codex the most obvious things, and it gets them wrong all the time in some kind of nuanced way, that in the end you spend more time doing things via Codex than by hand.
So I also dialed back Codex usage and went back to doing many more things by hand, because it's just so much faster and much more predictable time-wise.
nsingh2 · 9h ago
Same experience, these "background agents" are powered by models that aren't yet capable enough to handle large, tangled or legacy codebases without human guidance. So the background part ends up being functionally useless in my experience.
atonse · 10h ago
These seem like good checkpoints (and valid criticisms) on the road to progress.
But it's also not crazy to think that with LLMs getting smarter (and considerable resources put into making them better at coding), that future versions would clean up and refactor code written by past versions. Correct?
yard2010 · 9h ago
LLMs don't get smarter. An LLM is just a statistical tool for text generation, not a form of AI. Since language is so inherent to intelligence and knowledge, it correlates. But an LLM is just a tool for predicting what the average internet person would say.
bwfan123 · 9h ago
Nope, there are limits to what next-token prediction can do, and we have hit those limits. Cursor and the like are great for some use cases - for example, semantic search for relevant code snippets, and autocomplete. But beyond that, they only bring frustration in my use.
bunderbunder · 9h ago
Arguably most of the recent improvement in AI coding agents didn't exactly come from getting better at next token prediction in the first place. It came from getting better at context management, and RAG, and improvements on the usable context window size that let you do more with context management and RAG.
And I don't really see any reason to declare we've hit the limit of what can be done with those kinds of techniques.
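For what it's worth, "RAG" here just means something like the toy sketch below: embed the question and the candidate chunks, keep the most similar ones, and paste them into the prompt. The embed stub is a stand-in for a real embedding model, and the chunks are made up for illustration:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Cheap stand-in for a real embedding model: a seeded pseudo-random vector.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(16)
        return v / np.linalg.norm(v)

    def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
        # Rank candidate chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(chunks, key=lambda c: float(embed(c) @ q), reverse=True)
        return ranked[:k]

    chunks = [
        "def parse_config(path): ...",
        "class HttpClient: ...",
        "README: build and release instructions",
    ]
    question = "How do we parse the config file?"
    context = "\n\n".join(retrieve(question, chunks))
    prompt = f"Relevant code:\n{context}\n\nQuestion: {question}"
    print(prompt)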
bwfan123 · 9h ago
I am sure they will continue to improve just as the static-analyzers and linters are improving.
But, fundamentally, LLMs lack a theory of the program as intended in this comment https://news.ycombinator.com/item?id=44443109#44444904 . Hence, they can never reach the promised land that is being talked about - unless there are innovations beyond next-token prediction.
bunderbunder · 8h ago
They do lack a theory of the program. But also, if there's one consistent theme that you can trace through my 25 years of studying and working in ML/AI/whateveryouwanttocallit, it's that symbolic reasoning isn't nearly as critical to building useful tools as we like to think it is.
In other words, it would be wrong of me to assume that the only way I can think of to solve a problem is the only way to do it.
bunderbunder · 9h ago
Maybe. But there's also an argument to be made that an ounce of prevention is worth a pound of cure.
Maybe quite a few pounds, if the cure in question hasn't been invented yet and may turn out to be vaporware.
AsmodiusVI · 9h ago
Really appreciated this take, hits close to home. I’ve found LLMs great for speed and scaffolding, but the more I rely on them, the more I notice my problem-solving instincts getting duller. There’s a tradeoff between convenience and understanding, and it’s easy to miss until something breaks. Still bullish on using AI for exploring ideas or clarifying intent, but I’m trying to be more intentional about when I lean in vs. when I slow down and think things through myself.
cadamsdotcom · 4h ago
LLMs give you a power tool after you spent your whole career using hand tools.
A chainsaw and chisel do different things and are made for different situations. It’s great to have chainsaws, no longer must we chop down a giant tree with a chisel.
On the other hand there’s plenty of room in the trade for handcraft. You still need to use that chisel to smooth off the fine edges of your chainsaw work, so your teammates don’t get splinters.
KaiMagnus · 7h ago
I’ve been starting my prompts more and more with the phrase „Let’s brainstorm“.
Really powerful seeing different options, especially based on your codebase.
> I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature.
That really resonates with me. Anything larger often ends badly and I can feel the „tech debt“ building in my head with each minute Copilot is running. I do like the feeling though when you understood a problem already, write a detailed prompt to nudge the AI into the right direction, and it executes just like you wanted. After all, problem solving is why I’m here and writing code is just the vehicle for it.
alexvitkov · 10h ago
I've found the Cursor autocomplete to be nice, but I've learned to only accept a completion if it's byte for byte what I would've written. With the context of surrounding code it guesses that often enough to be worth the money for me.
The chatbot portion of the software is useless.
cornfieldlabs · 9h ago
For me it's the opposite.
Autocomplete suggests the lines I just deleted and also suggests completely useless stuff. I have a shortcut to snooze it (it's possible!). It interrupts my flow a lot. I would rather write that stuff myself.
Chat mode on the other hand follows my rules really well.
I mostly use o3 - it seems to be the only model that has "common sense" in my experience
veselin · 9h ago
I think that people are just too quick to assume this is amazing, before it is there. Which doesn't mean it won't get there.
Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at maybe 75 or 80%. Not 95. Assuming agents just solve most problems is incorrect, despite them being really good at first prototypes.
Also, in my experience, agents are great up to a point and then fall off a cliff. Not gradually. Past that point, the types of errors you get are so diverse that one cannot even explain them.
furyofantares · 10h ago
I've gone the other way and put a lot of effort into figuring out how to best utilize these things. It's a rough learning curve and not trivial, especially given how effortless stuff looks and feels at first.
bgwalter · 8h ago
This is still an ad that tries to lure heretics in by agreeing with them. This is the new agile religion. Here are suit-optimized diagrams:
"Interwoven relationship between the predictable & unpredictable."
Gepsens · 4h ago
LLMs are not a magic wand you can wave at anything and have your work done for you. What's new?
jemiluv8 · 2h ago
When I first came across the idea of vibe coding, my first reaction was that it was taking things too far. Isn't it enough that your LLM can help you do
- autocomplete
- suggest possible solutions to a problem you've taken the time to understand
- help you spend less time reading documentation, instead guiding your approach and sometimes even helping you identify obscure APIs that could help you get shit done
- help you review your code
- come up with multiple designs for a solution
- evaluate multiple designs you come up with for trade-offs
- help you understand your problem better and the available APIs
- write a prototype of some piece of code
I feel like LLMs are already doing quite a lot. I spend less time rummaging through documentation or trying to remember obscure APIs or other pieces of code in a software project. All I need is a strong mental model of the project and how things are done.
There is a lot of obvious heavy lifting that LLMs are doing that I, for one, am not able to take for granted.
For people facing constraints similar to those in a resource-constrained economic environment, the benefits of any technology that helps them spend less time doing work that doesn't deliver value are immediately visible/obvious/apparent.
It is no longer an argument about whether it is hype or not; it is more about how best to use it to achieve your goals. Forget the hype. Forget the marketing of AI companies - they have to do that to sell their products, nothing wrong with that. Don't let companies or bloggers set your expectations of what could or should be done with this piece of tech. Just get on the bandwagon, experiment, and find out what is too much. In the end I feel we will all come away from these experiments knowing that LLMs are already doing quite a lot.
TRIVIA
I even came across this article: https://www.greptile.com/blog/ai-code-reviews-conflict. It clearly pointed out how LLM reliance can bring both the 10x dev and the 1x dev closer to a median of "goodness". So the 10x dev probably gets worse and the 1x dev ends up getting better - I'm probably that guy, because I tend to miss subtle things in code, and Copilot review has had my back for a while now - I haven't had defects like that in a while.
piker · 4h ago
Credit to the Zed team here for publishing something somewhat against its book.
I spent today rewriting a cloud function I'd done with the "help" of an LLM.
Looked like dog shit, but worked fine till it hit some edge cases.
Had to break the whole thing down again and pretty much start from scratch.
Ultimately not a bad day's work, and I still had it on for autocomplete on doc-strings and such, but like fuck will I be letting an agent near code I do for money again in the near future.
i_love_retros · 6h ago
Uh oh, I think the bubble is bursting.
Personally the initial excitement has worn off for me and I am enjoying writing code myself and just using kagi assistant to ask the odd question, mostly research.
When a team mate who bangs on about how we should all be using ai tried to demo it and got things in a bit of a mess, I knew we had peaked.
And all that money invested into the hype!
thimkerbell · 8h ago
We need an app to rate posts on how clickbaity their titles are, and let you filter on this value.
65 · 5h ago
LLMs require fuzzy input and are thus good for fuzzy output, mostly things like recommendations and options. I just do not see a scenario where fuzzy input can lead to absolute, opinionated output unless extremely simple and mostly done before already. Programming, design, writing, etc. all require opinions and an absolute output from the author to be quality.
somewhereoutth · 5h ago
Unfortunately it seems that nobody 'dialed back' LLM usage for the summary on that page - a good example of how un-human such text can feel to read.
incomingpain · 8h ago
Using Gemini CLI (I really need to try out Claude Code one day), you ask it to make a change and it gives you a diff of what it plans to change.
You can say no, then give it more specific instructions like "keep it more simple" or "you don't need that library to be imported".
You can read the code and ensure you understand what it's doing.
delusional · 8h ago
I wish there was a browser addon that worked like ublock but for LLM talk. Like just take it all, every blog post, every announcement, every discussion and wipe it all away. I just want humanity to deal with some of our actual issues, like fascism, war in Europe and the middle east, the centralization of our lines of production, the unfairness in our economies.
Instead we're stuck talking about if the lie machine can fucking code. God.
lucasluitjes · 6h ago
Ironically if you wanted to build that accurately and quickly, you would probably end up having an LLM classify content as being LLM-related or not. Keyword-based filtering would have many false positives, and training a model takes more time to build.
pmxi · 5h ago
I’m sure you could build a prototype add on to do this pretty quickly with Claude Code or the like
xyst · 9h ago
Just like garbage sources on search engines or trash Stack Overflow answers, there's still plenty of junk to sift through with LLMs.
LLMs will even throw irrelevant data points into the output, which causes further churn.
I feel not much has changed.
turbofreak · 9h ago
Is this Zed Shaw’s blog?
gpm · 7h ago
Nah, it's the people/company behind the Zed editor, who are in part the people who were originally behind the Atom editor. https://zed.dev/team
chasing · 10h ago
LLMs save me a lot of time as a software engineer because they save me a ton of time doing either boilerplate work or mundane tasks that are relatively conceptually easy but annoying to actually have to do/type/whatever in an IDE.
But I still more-or-less have to think like a software engineer. That's not going to go away. I have to make sure the code remains clean and well-organized -- which, for example, LLMs can help with, but I have to make precision requests and (most importantly) know specifically what I mean by "clean and well-organized." And I always read through and review any generated code and often tweak the output because at the end of the day I am responsible for the code base and I need to verify quality and I need to be able to answer questions and do all of the usual soft-skill engineering stuff. Etc. Etc.
So do whatever fits your need. I think LLMs are a massive multiplier because I can focus on the actual engineering stuff and automate away a bunch of the boring shit.
But when I read stuff like:
"I lost all my trust in LLMs, so I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature."
I feel like I'm hearing something like, "I decided to build a house! So I hired some house builders and told them to build me a house with three bedrooms and two bathrooms and they wound up building something that was not at all what I wanted! Why didn't they know I really liked high ceilings?"
patrickmay · 9h ago
> [LLMs] save me a ton of time doing either boilerplate work
I hear this frequently from LLM aficionados. I have a couple of questions about it:
1) If there is so much boilerplate that it takes a significant amount of coding time, why haven't you invested in abstracting it away?
2) The time spent actually writing code is not typically the bottleneck in implementing a system. How much do you really save over the development lifecycle when you have to review the LLM output in any case?
hedgehog · 9h ago
I don't know about the boilerplate part but when you are e.g. adding a new abstraction that will help simplify an existing pattern across the code base something like Copilot saves a ton of time. Write down what has to happen and why, then let the machine walk across the code base and make updates, update tests and docs, fix whatever ancillary breaks happen, etc. The real payoff is making it cheaper to do exploratory refactors and simple features so you can focus on making the code and overall design better.
patrickmay · 9h ago
That's an interesting approach. You still have to review all the changes to make sure they're correct and that the code is maintainable, though. I could see this being a net savings on a legacy code base or a brand new system still in the "sketching" phase.
hedgehog · 9h ago
Yes, one of the reasons I like Copilot over some of the terminal-based systems I've seen is the changes are all staged for review in VS Code so you have all the navigation etc tools and can do whatever needs to be done before committing. It saves a lot of time, even on new features. I think of it like a chainsaw, powerful but a little bit imprecise.
emilecantin · 9h ago
I'm in a similar boat. I've only started using it more very recently, and it's really helping my "white-page syndrome" when I'm starting a new feature. I still have to fix a bunch of stuff, but I think it's easier for me to fix, tweak and refactor existing code than it is to write a new file from scratch.
Often times there's a lot of repetition in the app I'm working on, and there's a lot of it that's already been abstracted away, but we still have to import the component, its dependencies, and setup the whole thing which is indeed pretty boring. It really helps to tell the LLM to implement something and point it to an example of the style I want.
extr · 7h ago
This is the killer app for LLMs for me. I used to get super bogged down in the details of what I was trying to do, I would go a whole afternoon and while I would have started on the feature - I wouldn't have much to show for it in terms of working functionality. LLMs just provide a direction to go in and "get something up" before having to think through every little edge case and abstraction. Later once I have a a better idea of what I want, I go in and refactor by hand. But at least "it works" temporarily, and I find refactoring more enjoyable than writing fresh code anyway, primarily due to that "white page" effect you mention.
patrickmay · 9h ago
Maybe it's my Lisp background, where it's arguably easier, but I find myself immediately thinking "Eliminate that repetition."
jjangkke · 8h ago
The problem with Zed's narrative is that because he failed to use it in productive ways, he wants to dial it back altogether, but it's not clear what he has actually attempted. The dogpiling here reminds me of artists who are hostile to AI tools; it doesn't accurately reflect the true state of the marketplace, which actually puts a lot of value on successful LLM/AI tool use, especially in the context of software development.
If you extrapolate this blog, then we shouldn't be having so much success with LLMs, we shouldn't be able to ship product with fewer people, and we should be hiring junior developers.
But the truth of the matter, especially for folks who work on agents focused on software development, is that we can see a huge tidal shift happening, in ways similar to what artists, photographers, translators and copywriters have experienced.
The blog sells the idea that LLMs are not productive and need to be dialed down, but that does not tell the whole story. This does not mean I am saying LLMs should be used in all scenarios (there are clearly situations where they might not be desirable), but overall the productivity-hindrance narrative I repeatedly see on HN isn't convincing, and I suspect it is highly biased.
tequila_shot · 8h ago
So, this is not from a developer called Zed, but instead from a developer called Alberto. This is stated in the first line of the article.
Holy shit that's the best description of this phenomenon I've heard so far. The most stark version of this I've experienced is working on a side project with someone who isn't a software engineer who vibe coded a bunch of features without my input. The code looked like 6-8 different people had worked on it with no one driving architecture and I had to untangle how it all got put together.
The sweet spot for me is using it in places where I know the exact pattern I want to use to solve a problem and I can describe it in very small discrete steps. That will often take something that would have taken me an hour or two to hand code something tedious down to 5-10 minutes. I agree that there's no going back, even if all progress stopped now that's too huge of a gain to ignore it as a tool.
In some cases, this approach might even be slower than writing the code.
Did you not need all these skills / approaches / frameworks for yourself / coding with a team?
This is, I think, the key difference between those (such as myself) who find LLMs to massively increase velocity / quality / quantity of output and those who don't.
I was already highly effective at being a leader / communicator / delegating / working in teams ranging from small , intimate , we shared a mental model / context up to some of the largest teams on the planet.
If someone wasn’t already a highly effective IC/manager/leader pre LLM, an LLM will simply accelerate how fast they crash into the dirt.
It takes substantial work to be a highly effective contributor / knowledge worker at any level. Put effort into that , and LLMs become absolutely indispensable, especially as a solo founder.
Quite the selling point.
I stopped reading right here. Presumably other people did too. I don't think you're aware of the caliber of your hubris. About howitzer sized.
I don't mind when other programmers use AI, and use it myself. What I mind is the abdication of responsibility for the code or result. I don't think that we should be issuing a disclaimer when we use AI any more than when I used grep to do the log search. If we use it, we own the result of it as a tool and need to treat it as such. Extra important for generated code.
Should LLM users invest in both biological (e.g. memory palace) and silicon memory caches?
Usually it's not because people think it can't be done, or shouldn't be done, it's because of this law. Like yes in an ideal world we'd do xyz, but department head of product A is a complete anti-productive bozo that no one wants to talk to or deal with, so we'll engineer around him kind of a thing. It's incredibly common once you see it play out you'll see it everywhere.
Sometimes, forking product lines, departments or budgets.
https://www.youtube.com/watch?v=5IUj1EZwpJY
This analysis of the real-world effects of Conway's Law seems deeply horrifying, because the implication seems to be that there's nothing you can do to keep communication efficiency and design quality high while also growing an organisation.
---
disclaimer: if low information density is your thing, then your mileage may vary. Videos are for documentaries, not for reading an article out on camera.
Either way, it's a step-by-step walk through of the ideas of the original article that introduced Conway's Law and a deeper inspection into ideas about _why_ it might be that way.
If that's not enough then my apologies but I haven't yet found an equivalent article that goes through the ideas in the same way but in the kind of information-dense format that I assume would help you hit your daily macros.
Edit: Accidentally a word
And probably a few minutes of commercials too. I get the impression this is an emerging generational thing, but unless it's a recorded university course or a very interesting and reputable person.. no thanks. What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload.. yet this noble impulse is clearly self-defeating in this circumstance. Why wait and/or click-through ads for something that's low-density in the first place, you can't search, etc.
Especially now that you can transcript the video and quickly get AI to clean it up into a post, creating/linking a video potentially telegraphs stuff like: nothing much to say but a strong desire to be in the spotlight / narcissism / an acquisitiveness for clicks / engagement. Patiently enduring infinite ads while you're pursuing educational goals and assuming others are willing to, or assuming other people are paying for ad-free just because you do, all telegraphs a lack of respect for the audience, maybe also a lack of self-respect. Nothing against OP or this video in particular. More like a PSA about how this might come across to other people, because I can't be the only person that feels this way.
Always and entirely subjective of course, but I find Casey Muratori to be both interesting and reputable.
> What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload...
What's even weirder is the instinct to not actually engage with the content of the linked video and a discussion on Conway's Law and organisational efficiency and instead head straight into a monologue about some kind of emerging generational phenomenon of laziness highlighted by a supposed preference for long video content, which seems somewhat ironic itself as ignoring the original subject matter to just post your preferences as 'PSA' is its own kind of laziness. To each their own I guess.
Although I do think the six-hour YouTube 'essays' really could do with some serious editing, so perhaps there's something there after all...
Self-regulating.
I think this is often overlooked, because on the one hand it's really impressive what the predictive model can sometimes do. Maybe it's super handy as an autocomplete, or an exploration, or for rapidly building a prototype? But for real codebases, the code itself isn't the important part. What matters is documenting the business logic and setting it up for efficient maintenance by all stakeholders in the project. That's the actual task, right there. I spend more time writing documentation and unit tests to validate that business logic than I do actually writing the code that will pass those tests, and a lot of that time is specifically spent coordinating with my peers to make sure I understand those requirements, that they were specified correctly, that the customer will be satisfied with the solution... all stuff an LLM isn't really able to replace.
IMO, LLMs of today are not capable of building theories (https://news.ycombinator.com/item?id=44427757#44435126). And, if we view programming as theory building, then LLMs are really not capable of coding. They will remain useful tools.
describe User do ... it ".."
for the thousandth time.. or write the controller files with CRUD actions..
LLMs can do these. I can then review the code, improve it and go from there.
They are also very useful for brainstorming ideas; I treat it as a better Google search. If I'm stuck trying to model my data, I can ask it questions and it gives me recommendations. I can then think about it and come up with an approach that makes sense.
I also noticed that LLMs really lack basic comprehension. For example, no matter how many times you provide it the schema file (or a part of it), it still doesn't understand that a column doesn't exist on a model and will try to shove it into the suggested code.. very annoying.
All that being said, I have an issue with "vibe coding".. this is where the chaos happens as you blindly copy and paste everything and git push goodbye
Likely (as am I).
> LLMs are exciting but they produce messy code for which the dev feels no ownership. [...] The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project
Yup, can confirm, there are indeed people with differing opinions and experience/anecdotes on HN.
> want to tell you how awesome their workflow is without actually showing any code.
You might be having some AI-news-fatigue (I can relate) and missed a few, but there are also people who seem to have gotten it and do want to show code:
Armin Ronacher (of Flask, Jinja2, Sentry fame): https://www.youtube.com/watch?v=nfOVgz_omlU (workflow) and https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ (code)
Here's one of my non-trivial open source projects where a large portion is AI built: https://github.com/senko/cijene-api (didn't keep stats, I'd eyeball it at conservatively 50% - 80%)
- one big set of users who don't like it because it generates a lot of code and uses its own style of algorithms, and it's a whole lot of unfamiliar code that the user has to load up in their mind - as you said. Too much to comprehend, and quickly overwhelming.
And then to either side
- it unblocks users who simply couldn't have written the code on their own, who aren't even trying to load it into their head. They are now able to make working programs!
- it accelerates users who could have written it on their own, given enough time, but have figured out how to treat it as an army of junior coders, and learned to only maintain the high level algorithm in their head. They are now able to build far larger projects, fast!
The only thing the "AI" is marginally good at is as a fancy auto-complete that writes log statements based on the variable I just wrote into the code above it. And even in this simple use case it gets things wrong a fair amount.
Overall the "AI" is a net negative for me, but maybe close to break-even thanks to the autocomplete.
My org has 160 engineers working on our e-commerce frontend and middle tiers. I constantly dive into repos and code I have no ownership of. The git blame frequently shows a contractor who worked here 3 years ago.
Seems LLMs do well on small codebases, badly on medium ones, and well again as small modules within big ones.
It's just as miserable. I hate working on large codebases with multiple contributors unless there is super strong leadership that keeps things aligned.
Who says it is? Arguably the most famous book in the history of software engineering makes that point, and it precedes LLMs by half a century.
This is definitely something I feel is a choice. I've been experimenting quite a bit with AI generated code, and with any code that I intend to publish or maintain I've been very conscious in making the decision that I own the code and that if I'm not entirely happy with the AI generated output I have to fix it (or force the AI to fix it).
Which is a very different way of reviewing code than how you review another humans code, where you make compromises because you're equals.
I think this produces fine code, not particularly quickly but used well probably somewhat quicker (and somewhat higher quality code) than not using AI.
On the flip side on some throwaway experiments and patches to personalize open source products that I have absolutely no intention of upstreaming I've made the decision that the "AI" owns the code, and gone much more down the vibe coding route. This produces unmaintainable sloppy code, but it works, and it takes a lot less work than doing it properly.
I suspect the companies that are trying to force people to use AI are going to get a lot more of the "no human ownership" code than individuals like me experimenting because they think its interesting/fun.
Below is a link to a great article by Simon Willison explaining an LLM assisted workflow and the resulting coded tools.
[0] https://simonwillison.net/2025/Mar/11/using-llms-for-code/ [1] https://github.com/simonw/tools
Meanwhile, it's not uncommon to see people on HN saying they're orchestrating multiple major feature implementations in parallel. The impression we get here is that Simon Willison's entire `tools` feature set could be implemented in a couple of hours.
I'd appreciate some links to the second set of people. Happy to watch YouTube videos or read more in-depth articles.
"f you assume that this technology will implement your project perfectly without you needing to exercise any of your own skill you’ll quickly be disappointed."
"They’ll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them"
"Once I’ve completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions."
"I got lucky with this example because it helped illustrate my final point: expect to need to take over. LLMs are no replacement for human intuition and experience. "
What that gets me, though, is less typing fatigue and fewer decisions made partly due to my wrists/etc. If it's a large (but simple!) refactor, the LLM generally does an amazing job at that. As good as I would do. But it does it with zero wrist fatigue. Things that I'd normally want to avoid or take my time on, it bangs out in minutes.
This, coupled with Claude Code's recently introduced Hooks[1], helps curb a lot of behaviors that are difficult to make perfect from an LLM, i.e. making sure it tests, formats, doesn't include emojis (boy does it like those, lol), etc.
And of course a bunch of other practices for good software in general make the LLMs better, as has been discussed on HN plenty of times. Eg testing, docs, etc.
So yeah, they're dumb and I don't trust their "thinking" at all. However, I think they have huge potential to help us write and maintain large codebases, generally multiplying our productivity.
It's an art for sure though, and restraint is needed to prevent slop. They will put out so. much. slop. Ugh.
[1]: https://docs.anthropic.com/en/docs/claude-code/hooks
One example I deal with frequently is creating Pytorch models. Any real model is absolutely not something you want to leave in the hands of an LLM since the entire point of modeling is to incorporate your own knowledge into the design. But there is a lot of tedium, and room for errors, in getting the initial model wiring setup.
While, big picture, this isn't the 10x (or more) improvement that people like to imagine, I find in practice I personally get really stuck on the "boring parts". Reducing the time I spend on tedious stuff tends to have a pretty notable improvement in my overall flow.
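To give a flavor of the wiring I mean, here is a toy sketch (a made-up classifier, not any real model) of the kind of boilerplate an LLM can scaffold before you take over the actual design decisions:

    import torch
    import torch.nn as nn

    class SmallClassifier(nn.Module):
        """Toy model wiring of the kind an LLM can scaffold in seconds."""
        def __init__(self, in_features: int = 32, hidden: int = 64, n_classes: int = 4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_features, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_classes),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    model = SmallClassifier()
    logits = model(torch.randn(8, 32))  # a batch of 8 dummy inputs
    print(logits.shape)                 # torch.Size([8, 4])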
Or, often, sell you something.
I don't see why any of this should be surprising. I think it just reflects a lot of developers using this technology and having experiences that fall neatly into one of these two camps. I can imagine a lot of factors that might pull an individual developer in one direction or the other; most of them probably correlate, and people in the middle might not feel like they have anything interesting to say.
I suspect that's at least partially because all of that doesn't stop the hype from being pushed on and on without mercy. Which in turn is probably because the perverse amounts of investment that went into this have to be reclaimed somehow through monetization. Imagine all those VCs having to realize that hundreds of billions of $$$ were lost to wishful hallucinations. Before they concede that, there will of course be much astroturfing in the vein of your last paragraph.
The idea that the future is going to “more or less be predictable” and “within the realm of normal” is a pretty bold claim when you look at history! Paradigm shifts happen. And many people think we’re in the middle of one — people that don’t necessarily have an economic interest in saying so.
* I’m not taking a position here about predicting what particular AI technologies will come next, for what price, with what efficiency and capabilities, and when. Lots of things could happen we can’t predict — like economic cycles, overinvestment, energy constraints, war, popular pushback, policy choices, etc. But I would probably bet that LLMs are just the beginning.
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
- Brian Kernighan
So if there's a bug in code that an LLM wrote, simply wait 6 months until the LLMs are twice as smart?
> simply wait [about a quarter of a million minutes] until the LLMs are twice as smart?
...
People will always blame someone or something else for laziness.
When I first got into web development there were people using HTML generators based on photoshop comps. They would produce atrocious HTML. FE developers started rewriting the HTML because otherwise you'd end up with brittle layout that was difficult to extend.
By the time the "responsive web" became a thing HTML generators were dead and designers were expected to give developers wireframes + web-ready assets.
Same thing pretty much happened with UML->Code generators, with different details.
There's always been a tradeoff between the convenience and deskilling involved in generated code and the long term maintainability.
There's also the fact that coding is fundamentally an activity where you try to use abstractions to manage complexity. Ideally, you have interfaces that are good enough that the code reads like natural language, because you're expressing what you want the computer to do at the exact right layer of abstraction. Code generators tend to both cause and encourage bad interfaces. Often the impetus to use a code generator is that the existing interfaces are bad, bureaucratic, or obscure. But using code generators ends up creating more of the same.
But also, LLMs are incredibly powerful and capable tools for discovering what the architecture of things is. They have amazing abilities to analyze huge codebases and to build documents and diagrams that map out the system. They can answer all manner of questions, letting us probe in.
I'm looking for good links here. I know I have run across some good stuff before. But uhh I guess this piece that I just found introduces the idea well enough. https://blog.promptlayer.com/llm-architecture-diagrams-a-pra...
And you can see an example in the ever excellent @simonw's review of prompts in copilot. At the end, they briefly look at testing, and it's a huge part of the code base. They ask Gemini for a summary and it spits out a beautiful massive document on how copilot testing works! https://simonwillison.net/2025/Jun/30/vscode-copilot-chat/ https://github.com/simonw/public-notes/blob/main/vs-code-cop...
Now, whether LLMs generate well-architected systems is largely operator dependent. There are lots of low-effort, zero-shot ways to give LLMs very little guidance and get out who knows what. But when I reflect on the fact that, for now, most code is legacy code, most code is hideously under-documented, and most people reading code don't really have access to experts or artifacts to explain the code and its architecture, my hope and belief is that LLMs are incredible tools to radically increase maintainability versus where we are now, and that they are powerful peers in building the mental model of programming and systems.
This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
I do agree with the sentiment and insight about the 2 branches of topics frequently seen lately on HN about AI-assisted coding
Would really like to see a live/video demo of semi-autonomous agents running in parallel and executing actual useful tasks on a decently complex codebase, ideally one that was entirely “manually” written by devs before agents are involved - and that actually runs a production system with either lots of users or paid customers
The important thing about a codebase wasn't ever really size or age, but whether it was a planned architecture or grown organically. The same is true post-LLM. Want to put AI in charge of tool-smithing inconsequential little widgets that are blocking you? Fine. Want to put AI in charge of deciding your overall approach and structure? Maybe fine. Worst of all is to put the AI in charge of the former, only to find later that you handed over architectural decisions at some point and without really intending to.
That sounds like a hard either/or, as if ten years of development of a large codebase were entirely known up front, with not a single change to the structure over time as a result of new information.
"We build our computers the way we build our cities—over time, without a plan, on top of ruins." -- Ellen Ullman
Wow, you really nailed the point - that's what I felt but couldn't put into words. Thanks for the comment.
an argument can be made that the code doesn't matter as long as the product works as it's supposed to (big asterisk here)
The only goal of a code generator is the code. I don't care whether it works or not (for specific scenarios and it could break 90% of the time). I want to see the generated code and, so far, I have never seen anything interesting besides todo lists made with ReactJS.
This is a great read on the situation. Do you think these people are just making it up/generating baseless hype?
I have seen a few of these full-blown LLM-coded projects, and every one of them has some giant red flashing warning at the top of the README about the project being LLM generated.
So I think it’s probably a mix of avoiding embarrassment and self preservation.
I don't think that's the main reason. Well written code is easier to follow even when you haven't written it yourself (or maybe you did but forgot about it).
How you get good results as a team is to develop a shared mental model, and that typically needs to exist in design docs. I find that without design docs, we all agree verbally, and then are shocked at what everyone else thought we'd agreed on. Write it down.
LLMs, like junior devs, do much better with design docs. You can even let the junior dev try writing some design docs.
So if you're a solo developer, I can see this would be a big change for you. Anyone working on a team has already had to solve this problem.
On the subject of ownership: if I commit it, I own it. If the internet "goes down" and the commit has got my name on it, "but AI" isn't going to cut it.
There have been grifters hopping onto every trend. Have you noticed they never show you what exactly they built, or whether it was ever useful?
honestly, for the vast majority of basically-CRUD apps out there, we are inflating our skills a bit too much here. even if the code is junk you can adapt your mindset to accept what LLMs produce, clean it up a bit, and come out with something maintainable.
like do these people ever have to review code from other people or juniors? the feedback loop here is tighter (although the drawback is your LLM doesn't "learn").
i wouldn't use it for anything super novel or cutting edge i guess, but i don't know, i guess everyone on HN might be coding some super secret advanced project that an LLM can't handle....?
Ultimately I am responsible for any code I check in even if it was written by an LLM, so I need to perform these lengthy reviews. As others have said, if it is code that doesn't need to be maintained, then reviewing the code can be a much faster process. This is why it is so popular for hobby projects since you don't need to maintain the code if you don't want to, and it doesn't matter if you introduce subtle but catastrophic bugs.
Ultimately the tech feels like a net neutral. When you just want to throw the code away afterwards, it is very fast and good enough. If you are responsible for maintaining it, it's slower than writing it yourself.
if i need to work on something mission critical or new i do it by hand first. tests catch everything else. or you can just run it so that you review every change (like in claude code) as it comes in and can still grok the entire thing vs having to review multiple large files at the end.
thus i literally wonder what people are working on that requires this 100% focused mission critical style stuff at all times. i mean i don't think it's magic or AGI, but the general argument is always 1) works for hobby projects but not "production" 2) the LLM produces "messy code" which you have to review line by line as if you wrote it yourself which i've found to not be true at all.
But I’m in no rush to invite an army of people to compete with me just yet. I’ll be back when I’m sipping coladas on a beach to tell you what I did.
They are exactly like steroids - bigger muscles fast but tons of side effects and everything collapses the moment you stop. Companies don't care because they are more concerned about getting to their targets fast instead of your health.
Another harmful drug for our brain if consumed without moderation. I won't entirely stop using them but I have already started to actively control/focus my usage.
But even he doesn't think AI shouldn't be used. Go ahead and use it for stuff like email but don't use it for your core work.
[1] https://calnewport.com/does-ai-make-us-lazy/
I'm generally sympathetic to the idea that LLMs can create atrophy in our ability to code or whatever, but I dislike that this clickbaity study gets shared so much.
Tools that do many things and tools that do a small number of things are still tools.
> "do we really even need a body at all anymore?"
It's a legitimate question. What's so special about the body and why do we need to have one? Would life be better or worse without bodies?
Deep down I think everyone's answer has more to do with spirituality than anything else. There isn't a single objectively correct response.
https://www.youtube.com/watch?v=SxdOUGdseq4
LLMs seem to be really good at reproducing the classic Big Ball of Mud that can't really be refactored or understood.
There's a lot of power in creating simple components that interact with other simple components to produce complex functionality, while each component remains easy to understand, debug, and predict the performance of. The trick is to figure out how to decompose your complex problem into these simple components and their interactions.
I suppose once LLMs get really good at that skill is when we really won't need developers anymore.
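To make that concrete, here's a toy sketch of the kind of decomposition I mean (the log-summary example and every function name in it are made up for illustration): small, single-purpose pieces whose composition produces the "complex" behaviour.

    # Each piece does one thing and is trivial to test and reason about in isolation.
    def parse_line(line: str) -> dict:
        """Turn a raw log line like '2024-01-01T00:00:00 ERROR boom' into a record."""
        ts, level, msg = line.split(" ", 2)
        return {"ts": ts, "level": level, "msg": msg}

    def is_error(record: dict) -> bool:
        return record["level"] == "ERROR"

    def count_by_message(records) -> dict:
        """Count occurrences per message; the interesting behaviour emerges from composition."""
        counts = {}
        for r in records:
            counts[r["msg"]] = counts.get(r["msg"], 0) + 1
        return counts

    def error_summary(lines) -> dict:
        # The only "clever" part is the composition; each piece stays dumb on purpose.
        return count_by_message(r for r in map(parse_line, lines) if is_error(r))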
Like, I might ask an LLM for its opinion on the best way to break something down, to see if it 'thinks' of anything I haven't, and then ask it to implement that. I wouldn't ask it to do the whole thing from scratch with no input on how to structure things.
kind of.
This, but only for code. I've seen "leaders" at work suggest that we "embrace" AI, even for handling production systems and managing their complexity. That's like saying: "We've built this obscure, inscrutable system, therefore we need another obscure, inscrutable system on top of it in order to understand it!" To me, this sounds deranged, but the amount of gaslighting that's going on also makes you think you're the only one who believes that...
I don't really get this argument. So when LLMs become "perfect" software developers are we just going to have them running 24/7 shitting out every conceivable piece of software ever? What would anyone do with that?
Or do you expect every doctor, electrician, sales assistant, hairdresser, train driver etc. to start developing their own software on top of their existing job?
What's more likely is a few people will make it their jobs to find and break down problems people have that could use a piece of software and develop said piece of software using whatever means they have available to them. Today we call these people software developers.
I started my software career by automating my job, then automating other people’s jobs. Eventually someone decided it would be easier to just hire me as a software engineer.
I just met with an architect for adding a deck onto my house (need plans for code compliance). He said he was using AI to write programs that he could use with design software. He demoed how he was using AI to convert his static renders into walkthrough movies.
I instead started asking where I might look something up - in what man page, or in which documentation. Then I go read that.
This helps me build a better mental map about where information is found (e.g., in what man page), decreasing both my reliance on search engines, and LLMs in the long run.
LLMs have their uses, but they are just a tool, and an imprecise one at that.
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with the solutions below:
1. Use reverse proxy
2. Root the phone
3. Run on higher port
Even after asking them to rethink, they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records [1]. Both models knew about HTTPS RR but couldn't suggest it as a solution; only after I included it in their context did both agree it was a possible solution.
[1]: https://rohanrd.xyz/posts/hosting-website-on-phone/
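For anyone curious, the idea is that an HTTPS record can advertise a non-default port for the domain, so clients that support it connect to, say, 8080 while the user just types the bare hostname. A rough zone-file sketch (the domain and port here are placeholders, not taken from the linked post):

    example.com. 3600 IN HTTPS 1 . alpn=h2 port=8080

The catch is client support: anything that ignores HTTPS records will still try the default port.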
For an LLM, I’d expect it to be similar. It can recall the stuff it’s seen thousands of times, but has a hard time recalling the niche/underdocumented stuff that it’s seen just a dozen times.
The human brain isn't a statistical aggregator. If you see a psychologically shocking thing once in your lifetime, you might remember it even after dementia hits when you're old.
On the other hand, you pass by hundreds of shops every day and receive the data signal of their signs over and over and over, yet you remember nothing.
You remember stuff you pay attention to (for whatever reason).
I am fine with this, but let's be clear about what we're expecting.
Certainly a majority of people don't know this. What we're really asking is whether an LLM is expected to know more than (or as much as) the average domain expert.
Only recently have I started interacting with LLMs more (I tried out a previous "use it as a book club partner" suggestion, and it's pretty great!).
When coding with them (via cursor), there was an interaction where I nudged it: "hey, you forgot xyz when you wrote that code the first time" (ie: updating an associated data structure or cache or whatever), and I find myself INTENTIONALLY giving the machine at least the shadow of a benefit of the doubt that: "Yeah, I might have made that mistake too if I were writing that code" or "Yeah, I might have written the base case first and _then_ gotten around to updating the cache, or decrementing the overall number of found items or whatever".
In the "book club" and "movie club" case, I asked it to discuss two movies and there were a few flubs: was the main character "justly imprisoned", or "unjustly imprisoned" ... a human might have made that same typo? Correct it, don't dwell on it, go with the flow... even in a 100% human discussion on books and movies, people (and hallucinating AI/LLM's) can not remember with 100% pinpoint accuracy every little detail, and I find giving a bit of benefit of the doubt to the conversation partner lowers my stress level quite a bit.
I guess: even when it's an AI, try to keep your interactions positive.
I guess it's also actually supported, unlike SRV records, which are only supported by some applications? Matrix migrated from SRV to .well-known files for providing that data. (Or maybe it supports both.)
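For reference, the Matrix .well-known delegation is just a tiny JSON document served over plain HTTPS, which is part of why almost anything can consume it; the hostname and port below are placeholders:

    GET https://example.com/.well-known/matrix/server
    {"m.server": "synapse.example.com:443"}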
At least... this was before multiplayer discovery was commandeered. Matchmaking and so on largely put an end to opportunities.
Yes, an experienced person might be able to suss out what the real problem was, but it's not really the LLM's fault for answering the specific question it was asked. Maybe you just wanted to run a server for testing and didn't realize that you can add a non-standard port to the URL.
Not to mention the solution did end up being to use a higher port number...
Stuff that can't just be abstracted into a function or class but also requires no real thought. Tests are often (depending on what they're testing) in this realm.
I was resistant at first, but I love it. It's reduced the parts of my job that I dislike doing because of how monotonous they are and replaced them with a new fun thing to do - optimizing prompts that get it done for me much faster.
Writing the prompt and reviewing the code is _so_ much faster on tedious, simple stuff, and it leaves the interesting, thought-provoking parts of my work for me to do.
Don’t feed many pages of code to the AI; it works best for isolated functions or small classes with few dependencies.
In 10% of cases when I ask to generate or complete code, the quality of the code is less than ideal but fixable with extra instructions. In 25% of cases, the quality of generated code is bad and remains so even after telling it what’s wrong and how to fix. When it happens, I simply ignore the AI output and do something else reasonable.
Apart from writing code, I find it useful for reviewing new code I wrote. Half of the comments are crap and should be ignored. Some others are questionable. However, I remember a few times when the AI identified actual bugs or other important issues in my code and proposed fixes. Again, don't copy-paste many pages at once; do it piecewise.
For some niche areas (examples are HLSL shaders, or C++ with SIMD intrinsics) the AI is pretty much useless, probably was not enough training data available.
Overall, I believe ChatGPT improved my code quality. Not only as a result of reviews, comments, or generated code, but also because the piecewise copy-pasting workflow improved the overall architecture by splitting the codebase into classes/functions/modules/interfaces, each doing their own thing.
I agree it's good for helping writing smaller bits like functions. I also use it to help me write unit tests which can be kind of tedious otherwise.
I do think that the quality of AI assistance has improved a lot in the past year. So if you tried it before, maybe take another crack at it.
People really want a quick, low effort fix that appeals to the energy conserving lizard brain while still promising all the results.
In reality there aren't shortcuts, there's just tradeoffs, and we all realize it eventually.
Edit: In concrete terms the workflow is to allow Copilot to make changes, see what's broken, fix those, review the diff against the goal, simplify the changes, etc, and repeat, until the overall task is done. All hands off.
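A minimal sketch of what that hands-off loop could look like, assuming a hypothetical agent wrapper with propose/check/fix/review helpers (none of this is a real Copilot API; it's just the shape of the workflow):

    # Hypothetical hands-off loop: propose, check, fix, review, simplify, repeat.
    def run_task(agent, goal, max_rounds=10):
        for _ in range(max_rounds):
            diff = agent.propose_change(goal)          # let the model edit the code
            failures = agent.run_checks()              # build + tests report what's broken
            if failures:
                agent.fix(failures)                    # feed the errors back and retry
                continue
            if agent.review_against_goal(goal, diff):  # does the diff actually meet the goal?
                agent.simplify(diff)                   # trim incidental complexity
                return True                            # task done
        return False                                   # give up after too many rounds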
It's important to always maintain the developer role, don't ever surrender it.
The hard parts of engineering have always been decision making, socializing, and validating ideas against cold hard reality. But writing code just got easier so let's do that instead.
Prior to LLMs writing 10 lines of code might have been a really productive day, especially if we were able to thoughtfully avoid writing 1,000 unnecessary lines. LLMs do not change this.
[0] https://en.wikipedia.org/wiki/Streetlight_effect
I don’t have it write any of my Python firmware or Elixir backend stuff.
What I do let it rough in is web front end stuff. I view the need for and utility of LLMs in the html/css/tailwind/js space as an indictment of complexity and inconsistency. It’s amazing that the web front end stuff has just evolved over the years, organically morphing from one thing to another, but a sound well engineered simple-is-best set of software it is not. And in a world where my efforts will probably work in most browser contexts, no surprise that I’m willing to mix in a tool that will make results that will probably work. A mess is still a mess.
I wrote about it: https://kamens.com/blog/code-with-ai-the-hard-way
I don't think LLM for coding productivity is all hype but I think for the people who "see the magic" there are many illusions here similar to those who fall prey to an MLM pitch.
You can see all the claims aren't necessarily unfounded, but the lack of guaranteed reproducibility leaves the door open for many caveats in favor of belief for the believer and cynicism for everybody else.
For the believers if it's not working for one person, it's a skill issue related to providing the best prompt, the right rules, the perfect context and so forth. At what point is this a roundabout way of doing it yourself anyway?
I've been allowing LLMs to do more "background" work for me. Giving me some room to experiment with stuff so that I can come back in 10-15 minutes and see what it's done.
The key things I've come to are that it HAS to be fairly limited. Giving it a big task like refactoring a code base won't work. Giving it an example can help dramatically. If you haven't "trained" it by giving it context or adding your CLAUDE.md file, you'll end up finding it doing things you don't want it to do.
Another great task I've been giving it while I'm working on other things is generating docs for existing features and modules. It is surprisingly good at looking at events, following them to see where they go, and generating diagrams and the like.
Whether someone’s litmus test is well-developed is another matter.
At first I was very enthusiastic and thought Codex was helping me multiplex myself. But you actually spend so much time trying to explain the most obvious things to Codex, and it gets them wrong all the time in some nuanced way, that in the end you spend more time doing things via Codex than by hand.
So I also dialed back my Codex usage and went back to doing many more things by hand, because it's just so much faster and much more predictable time-wise.
But it's also not crazy to think that with LLMs getting smarter (and considerable resources put into making them better at coding), that future versions would clean up and refactor code written by past versions. Correct?
And I don't really see any reason to declare we've hit the limit of what can be done with those kinds of techniques.
But, fundamentally, LLMs lack a theory of the program as intended in this comment https://news.ycombinator.com/item?id=44443109#44444904 . Hence, they can never reach the promised land that is being talked about - unless there are innovations beyond next-token prediction.
In other words, it would be wrong of me to assume that the only way I can think of to go about solving a problem is the only way to do it.
Maybe quite a few pounds, if the cure in question hasn't been invented yet and may turn out to be vaporware.
A chainsaw and chisel do different things and are made for different situations. It’s great to have chainsaws, no longer must we chop down a giant tree with a chisel.
On the other hand there’s plenty of room in the trade for handcraft. You still need to use that chisel to smooth off the fine edges of your chainsaw work, so your teammates don’t get splinters.
Really powerful seeing different options, especially based on your codebase.
> I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature.
That really resonates with me. Anything larger often ends badly and I can feel the „tech debt“ building in my head with each minute Copilot is running. I do like the feeling though when you understood a problem already, write a detailed prompt to nudge the AI into the right direction, and it executes just like you wanted. After all, problem solving is why I’m here and writing code is just the vehicle for it.
The chatbot portion of the software is useless.
Chat mode on the other hand follows my rules really well.
I mostly use o3 - it seems to be the only model that has "common sense" in my experience
Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at like 75, maybe 80%. Not 95. Assuming agents just solve most problems is incorrect, despite them being really good at first prototypes.
Also in my experience agents are great to a point and then fall off a cliff. Not gradually. Just the type of errors you get past one point is so diverse, one cannot even explain it.
https://zed.dev/agentic-engineering
"Interwoven relationship between the predictable & unpredictable."
I feel like LLMs are already doing quite a lot. I spend less time rummaging through documentation or trying to remember obscure api's or other pieces of code in a software project. All I need is a strong mental model about the project and how things are done.
There is a lot of obvious heavy lifting that LLMs are doing that I for one am not able to take for granted.
For people working under constraints similar to those of a resource-constrained economic environment, the benefits of any technology that helps them spend less time on work that doesn't deliver value are immediately apparent.
It is no longer an argument about whether it is hype; it is more about how best to use it to achieve your goals. Forget the hype. Forget the marketing of AI companies - they have to do that to sell their products, nothing wrong with that. Don't let companies or bloggers set your own expectations of what could or should be done with this piece of tech. Just get on the bandwagon, experiment, and find out what is too much. In the end, I feel we will all come away from these experiments knowing that LLMs are already doing quite a lot.
TRIVIA: I even came across this article: https://www.greptile.com/blog/ai-code-reviews-conflict. It points out how LLM reliance can bring both the 10x dev and the 1x dev closer to a median of "goodness". So the 10x dev probably gets worse and the 1x dev ends up getting better - I'm probably that guy, because I tend to miss subtle things in code, and Copilot review has had my ass for a while now; I haven't had defects like that in a while.
https://news.ycombinator.com/item?id=44003700
Looked like dog shit, but worked fine till it hit some edge cases.
Had to break the whole thing down again and pretty much start from scratch.
Ultimately not a bad day's work, and I still had it on for autocomplete on doc-strings and such, but like fuck will I be letting an agent near code I do for money again in the near future.
Personally the initial excitement has worn off for me and I am enjoying writing code myself and just using kagi assistant to ask the odd question, mostly research.
When a team mate who bangs on about how we should all be using ai tried to demo it and got things in a bit of a mess, I knew we had peaked.
And all that money invested into the hype!
You can say no, then give it more specific instructions like "keep it more simple" or "you don't need that library to be imported".
You can read the code and ensure you understand what it's doing.
Instead we're stuck talking about if the lie machine can fucking code. God.
The LLM will even throw irrelevant data points into the output, which causes further churn.
I feel not much has changed.
But I still more-or-less have to think like a software engineer. That's not going to go away. I have to make sure the code remains clean and well-organized -- which, for example, LLMs can help with, but I have to make precision requests and (most importantly) know specifically what I mean by "clean and well-organized." And I always read through and review any generated code and often tweak the output because at the end of the day I am responsible for the code base and I need to verify quality and I need to be able to answer questions and do all of the usual soft-skill engineering stuff. Etc. Etc.
So do whatever fits your need. I think LLMs are a massive multiplier because I can focus on the actual engineering stuff and automate away a bunch of the boring shit.
But when I read stuff like:
"I lost all my trust in LLMs, so I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature."
I feel like I'm hearing something like, "I decided to build a house! So I hired some house builders and told them to build me a house with three bedrooms and two bathrooms and they wound up building something that was not at all what I wanted! Why didn't they know I really liked high ceilings?"
I hear this frequently from LLM aficionados. I have a couple of questions about it:
1) If there is so much boilerplate that it takes a significant amount of coding time, why haven't you invested in abstracting it away?
2) The time spent actually writing code is not typically the bottleneck in implementing a system. How much do you really save over the development lifecycle when you have to review the LLM output in any case?
Oftentimes there's a lot of repetition in the app I'm working on, and a lot of it has already been abstracted away, but we still have to import the component and its dependencies and set the whole thing up, which is indeed pretty boring. It really helps to tell the LLM to implement something and point it to an example of the style I want.
If you extrapolate this blog then we shouldn't be having so much success with LLMs, we shouldn't be able to ship product with fewer people, and we should be hiring junior developers.
But the truth of the matter, especially for folks who work on agents focused on software development, is that we can see a huge tidal shift happening, in ways similar to what artists, photographers, translators, and copywriters have experienced.
The blog sells the idea that LLMs are not productive and need to be dialed down, but that does not tell the whole story. This does not mean I am saying LLMs should be used in all scenarios; there are clearly situations where they might not be desirable. But overall, the productivity-hindrance narrative I repeatedly see on HN isn't convincing, and I suspect it is highly biased.