The space is moving so fast that if I had written down my workflows and workarounds just two months ago, much of it would be stale today. I think all these recommendations need to list the models and harnesses being described front and center.
motorest · 32d ago
> The space is moving so fast that, if I wrote down my workflows and workarounds just two months ago, so much of it would be stale today.
There is also the problem that none of these workflows were validated or verified. Everyone is free to go on social media or personal blogs and advertise their snake oil. Thus in a scenario where these workflows are found to be lacking, the perceived staleness might actually be ineffectiveness beyond self promotion.
aerhardt · 32d ago
I'm seeing a lot of this too. I can tell with my own eyes that the technology is extremely useful, but also that it has limits. On the internet however you'll see a decent amount of randos claiming that they're one-shotting hyperscale complex systems with the latest trendiest tool. My approach is to keep using LLMs where they reasonably make sense and experimenting here and there with new workflows and tools, but I'm past changing the way I work every two weeks.
brumar · 33d ago
Very important comment. My workflow changed dramatically with the increased capabilities of these tools.
Aeolun · 32d ago
Yeah. I have Claude 4 correcting code that Claude 3.5 wrote a few months ago.
motorest · 32d ago
> Yeah. I have Claude 4 correcting code that Claude 3.5 wrote a few months ago.
You don't even need to switch models. Write a prompt to generate some code and immediately after prompt the same model to review the code it just generated. Sometimes it takes 3 or 4 prompts to get the result to converge. But converge to where?
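A minimal sketch of that loop, assuming the OpenAI Python client (any chat API works the same way; the model name and prompts are placeholders):

```python
# Generate code, then repeatedly ask the same model to review and rewrite it.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you're testing
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

code = ask("Write a Python function that parses ISO-8601 timestamps.")
for _ in range(4):  # the 3-4 review rounds mentioned above
    review = ask(f"Review this code you wrote. List concrete problems:\n\n{code}")
    code = ask(f"Rewrite the code to address this review:\n\n{review}\n\nCode:\n\n{code}")
print(code)
```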
pmbanugo · 32d ago
I haven’t tried Claude 4. Maybe I’d give it a spin to see if it can improve my design document
bdangubic · 32d ago
hehehe - same :)
hoppp · 33d ago
I use the LLM as a glorified search engine. Instead of googling, I ask it stuff.
It's fine for that, but it's hit or miss. Often the output is garbage and it's better to just use Google.
I don't use it much to generate code; I ask it higher-level questions more often, like when I need a math formula.
sam_bristow · 32d ago
My most common use is similar: when I'm working on problems in a somewhat unfamiliar domain, finding out what its "terms of art" are. The chances that I've just come up with a completely new concept are pretty low, so it's just a matter of helping me formulate my questions in the language of the existing body of knowledge.
NitpickLawyer · 32d ago
You should go a step further and integrate search (Tavily, SearXNG, etc.) into your flow. You'll get better results, and you can refine sources and gradually build a scored list of trusted sources.
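As a sketch of what that integration can look like (assuming a self-hosted SearXNG instance with its JSON output format enabled; the URL and trust scores below are made up for illustration):

```python
# Fetch real search results from SearXNG, boost hand-picked trusted
# domains, and return compact snippets to feed to the LLM as context.
import requests

SEARXNG_URL = "http://localhost:8080"  # your self-hosted instance
TRUSTED = {"docs.python.org": 2.0, "stackoverflow.com": 1.5}  # hand-tuned scores

def search(query: str, limit: int = 5) -> list[dict]:
    resp = requests.get(
        f"{SEARXNG_URL}/search",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    # Sort trusted sources first, as described above.
    results.sort(
        key=lambda r: max((s for d, s in TRUSTED.items() if d in r["url"]), default=1.0),
        reverse=True,
    )
    return [{"title": r["title"], "url": r["url"], "snippet": r.get("content", "")}
            for r in results[:limit]]
```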
notepad0x90 · 33d ago
is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it? I wish there were tangible stats and metrics around this. Is it really more efficient than just writing the code yourself, but using LLMs to look up things or demo solutions?
Aurornis · 33d ago
Lately I look for opportunities to have the LLM do some easy small work while I go work on some harder task in parallel.
I've also tried asking the LLM to come up with a proposed solution while I work on my own implementation at the same time.
LLMs can also be much faster if a task requires some repetitive work. When I recognize a task like that, I try coding the first version and then ask the LLM to follow my pattern for the other areas where I need to repeat the work.
mwcampbell · 32d ago
How did we end up here, just accepting that some coding tasks require repetitive work, and turning to a probabilistic text synthesizer that requires massive training data to automate that? We're having this discussion on a site whose founder famously wrote, over 20 years ago, that succinctness is power, and even wrote this site in a programming language that he designed to take that principle as far as he could. Now, two decades later, why have we so completely retreated from that dream?
I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.
sanderjd · 32d ago
We haven't retreated from that dream. We're doing both things in parallel. But I think it will always be the case that some things are repetitive, even as we continuously expand the frontier of eliminating those things. It's good to have tools that help automate repetitive tasks, and it's also good to create more powerful abstractions. There is no contradiction.
dwaltrip · 32d ago
The ethos that compelled PG to write HN in lisp is what matters, not the actual usage of lisp itself. That ethos lives on, in various forms.
I also offer you an old saying we have all heard many times:
There are 2 kinds of languages: ones that everyone complains about and ones that nobody uses.
Aeolun · 32d ago
Some things are better with more code. Sets of data in particular (strongly typed). Sometimes you need to modify those sets of data, and it doesn't require enough work to be worth writing a whole script for, but you still don't really want to spend the time manually modifying everything. LLMs are really nice in those instances.
klabb3 · 32d ago
Plain data in all its glory, but I'd rather (deterministically) generate and modify it than have big blobs checked in. If I have too much copy-pasted data, I often forget to modify it until the test case runs and I realize it. This creates a false sense of confidence, since there could be tests that pass and nobody ever checks that they're wrong. Essentially, minimize the number of human steps needed to verify it looks right, which typically means optimizing for the least amount of code.
Flashback to when I committed a suite of tests in Python that were indented one tab too much, resulting in them not running at all. This passed code review (at a FAANG company) and was discovered months later from an unrelated bug. The point is that even unit tests have a very human element to them.
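For the curious, a minimal reconstruction of that failure mode (not the actual code): over-indent a test and it becomes a local function that the runner never collects.

```python
import unittest

class TestMath(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

        # One indent level too deep: this def is now a local function
        # inside test_add, so unittest discovery never sees it and the
        # (failing) assertion below silently never runs.
        def test_broken(self):
            self.assertEqual(1 + 1, 3)

if __name__ == "__main__":
    unittest.main()  # reports "Ran 1 test ... OK" -- the broken test is invisible
```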
jampekka · 32d ago
> I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.
In the 2010s the move was towards more concise languages and programming techniques in, e.g., the JavaScript scene too. CoffeeScript is a prime example of this.
But then came the enterprise software people pushing their Javaisms, and now we have a verbose bondage-and-ceremony mess like TypeScript and ES6 modules.
And in a tragicomic turn, after we made expressing programmer intent formally so difficult, we are turning to writing bad novels for LLMs in a crapshoot of trial and error, hoping they write the pointless boilerplate correctly.
wredcoll · 32d ago
I cannot wait to see your next webapp written in J.
zackify · 33d ago
Agree with each of these points so much!
That’s why I really like Copilot agent and Codex right now.
Even more parallel stuff and from my phone when I’m just thinking of ideas.
max_on_hn · 32d ago
(Disclaimer: I built and sell a product around that workflow)
It often is, if you pick the right tasks (and more tasks fall into that bucket every few weeks).
You can get a simple but fully-working app out of a single prompt, though quality varies widely unless you’re very specific.
Once you have a codebase, agent output quality comes down to architecture and tests.
If you have a scalable architecture with well-separated concerns, a solid integration test harness with examples, and good documentation (features, stack, procedures, design constraints), then getting the exact change you want is a matter of how well you can articulate what you want.
One more asterisk: the development environment has to support the agent. Like a human, agents work well with compiler feedback, and better with testing tools and documentation/internet access (yes, my agents have these).
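A stripped-down sketch of that feedback loop (ask_agent stands in for whatever model call or harness you use; pytest is just an example test runner):

```python
# Run the tests after each agent edit; if they fail, feed the failure
# output back to the agent as the next prompt.
import subprocess

def run_tests() -> tuple[bool, str]:
    proc = subprocess.run(
        ["pytest", "-x", "--tb=short"],
        capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task: str, ask_agent, max_rounds: int = 5) -> bool:
    prompt = task
    for _ in range(max_rounds):
        ask_agent(prompt)            # agent edits files in the working tree
        ok, output = run_tests()
        if ok:
            return True
        prompt = f"The tests failed. Fix the code.\n\n{output}"
    return False
```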
I use CheepCode to work on itself, but I am still building up the test library and preview environments to de-risk merging non-trivial PRs that I haven’t pulled down and run locally. I also use it to work on other apps that I'm building, and since those are far more self-contained / easier to test, I get much better results there.
If you want to put less effort into describing what you want, have a chat with an AI to generate tickets. Then paste those tickets into Linear and let CheepCode agents rip through them. I’ve got tooling in the works that will make that much easier, but I can only be in so many places at once as a bootstrapped founder :-)
motorest · 32d ago
> is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it?
The answer is always "it depends". There are some drudge-work tasks that LLMs do brilliantly, such as generating unit tests or documentation. Often the first attempt is not great, but iterating over them is so fast that you can regenerate everything from scratch a dozen times before you spend as much time as you would writing them yourself.
It also depends on what scope you're working on. Small iterations have better results than grand redesigns.
Context is also critical. If your codebase is neatly organized with squeaky-clean code, then LLMs generate better recommendations. If your codebase is a mess of inconsistent styles and spaghetti code, then your prompts tend to generate more of the same.
viraptor · 33d ago
Depends on the code, but often yes. The less you care about that specific result, the more efficient it is. One-off tools under 2k lines where you can easily verify the result? Why would I not generate that and save time for more interesting stuff?
layer8 · 33d ago
One-off tools seem to be more ops than dev, to me.
viraptor · 33d ago
I haven't heard the "scripting is not programming" or similar take since newsgroups. It's really time to let it die.
Lalabadie · 33d ago
One of my pet hypotheses is that the top X% of Excel users vastly outperform the bottom X% of programmers in getting to usable results.
fenomas · 33d ago
Totally agree! I know a guy whose main job is basically to be an Excel expert (nominally he works in logistics), and seeing some of his work convinced me that he's actually a programmer who uses Excel as an IDE - akin to visual programming.
callc · 32d ago
We should adopt the term “tabular programmers” or “tabular engineers”!
layer8 · 33d ago
That’s not what I wrote. It’s just that in my dev work I only rarely have the need for one-off tools.
BeetleB · 33d ago
That used to be the case with me too before LLMs.
It was because writing one-off tools took time, and you needed them to do more for it to be worth the time.
Now a lot more are getting written because it takes a lot less effort. :-)
viraptor · 33d ago
It just depends on the environment. Some areas get to experiment with different approaches more than others. On one extreme, if I were writing yet another CRUD app, it's unlikely I'd need any such tools. On the other, when dealing with data processing/visualisation/ML, it's small experimental tools all over the place.
sagarm · 33d ago
Do you not debug, optimize, analyze ...? Copilot is especially valuable for throwaway (or nearly throwaway) log parsing/analysis, for example.
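For example, the kind of ten-line throwaway an LLM will happily one-shot (the log format here is invented for illustration):

```python
# Count the most frequent ERROR/WARN messages streamed in on stdin.
import re
import sys
from collections import Counter

pattern = re.compile(r"\[(?P<level>ERROR|WARN)\] (?P<msg>.+)")
counts = Counter()
for line in sys.stdin:
    m = pattern.search(line)
    if m:
        counts[(m["level"], m["msg"].strip())] += 1

for (level, msg), n in counts.most_common(10):
    print(f"{n:6d}  {level:5s}  {msg}")
```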
layer8 · 32d ago
I do, but I have all the software I need for that. I may use LLMs in that context, but for the actual analysis, not for generating one-off tools.
airstrike · 33d ago
More like it's time to bring it back...
jiggawatts · 33d ago
I’ve used Gemini Pro 2.5 to generate code syntax rewrite tools for me. I asked it to use a compiler SDK that I know from experience is fiddly and frustrating to use. It gave me a working tool in about 10 minutes, while I was actively listening to a meeting.
Not all scripts are “ops”!
enos_feedler · 33d ago
What if every piece of software any consumer needed could be described this way? Outside of system code this could be everything we ever need. This world is nearly upon us and it is super exciting.
mattlondon · 32d ago
I frequently use it to assemble the tedious boilerplate stuff. E.g. add a new test file and wire up all the build rules etc., then implement tests x, y, z. Or the other day I had to create an input form in an Angular app, and I just asked for the 7 or 8 fields I wanted and it did it and wired it all up for me (both template and TypeScript). It can do those sorts of things in a few seconds, and saves me maybe only 15-20 minutes, but the mental relief of not having to do it myself is great, even if the time saving is relatively low.
The other thing is sometimes just writing out method signatures with all the right types and conversions/casts etc. between types/interfaces/classes when I can't be bothered to do all the lookups myself ("Create a method here called foo that accepts a Bar instance and converts it to Baz. Return type should be Quux - add a conversion from Baz to a new Quux instance before the final return - use the builder pattern and map the constant magic-strings in blah.ts to appropriate enum values in Quux." Etc. etc., and then I write the logic in the middle). Again not a huge time saving, but it mentally lightens the load and keeps you concentrating on the problem rather than the minutiae.
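As a rough illustration of the kind of scaffolding being delegated (a Python analog, since the names above are TypeScript-specific; Bar/Baz/Quux and the magic strings are the hypothetical names from the prompt, not a real API):

```python
# The typed plumbing the LLM writes while you fill in the logic.
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    ALPHA = "alpha"
    BETA = "beta"

# the "constant magic-strings" mapped to enum values
MAGIC_STRING_TO_KIND = {"legacy-alpha": Kind.ALPHA, "legacy-beta": Kind.BETA}

@dataclass
class Bar:
    name: str
    tag: str

@dataclass
class Baz:
    name: str
    kind: Kind

@dataclass
class Quux:
    name: str
    kind: Kind

class QuuxBuilder:
    def __init__(self) -> None:
        self._fields: dict = {}

    def name(self, name: str) -> "QuuxBuilder":
        self._fields["name"] = name
        return self

    def kind(self, kind: Kind) -> "QuuxBuilder":
        self._fields["kind"] = kind
        return self

    def build(self) -> Quux:
        return Quux(**self._fields)

def foo(bar: Bar) -> Quux:
    baz = Baz(name=bar.name, kind=MAGIC_STRING_TO_KIND[bar.tag])
    # ...the interesting logic you actually write goes here...
    return QuuxBuilder().name(baz.name).kind(baz.kind).build()
```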
teemur · 32d ago
If you can identify blocks of code you need to write that are easy to define reasonably well, easy to review/verify as correctly written, but still burdensome to actually write, LLMs are your new best friend. I don't know how other people think/write, but I seem to have a lot of that kind of stuff on my table. The difficult part to outsource to LLMs is how to connect these easy blocks, but luckily that's the part I find fun in coding, not so much writing the boring simple stuff.
IshKebab · 32d ago
In many cases yes. In many cases no. Overall, it can save you a lot of time even accounting for time wasted when you give it more than it can handle and you give up and do it all yourself.
tptacek · 33d ago
Yes, but the bar for skepticism is higher than that, because LLMs also compile code and catch errors, and generate and run tests; compile errors and assertion failures are just more prompts to an LLM agent.
lolinder · 33d ago
When used that way they also regularly get into loops where they lose track of what they were supposed to do.
The last time I set Cursor on something without watching it very very closely it spun for a while fixing tests and when it finally stopped and I looked what it had done it had coded special cases in to pass the specific failing tests in a way that didn't generalize at all to the actual problem. Another recent time I had to pull the plug on it installing a bunch of brand new dependencies that it decided would somehow fix the failing tests. It had some kind of complete rewrite planned.
Claude Code is even worse when it gets into this mode because it'll do something totally absurd like that and then at the end you have to `git reset` and you're also on the hook for the $5 of tokens that it managed to spend in 5 minutes.
I still find them useful, but it takes a lot of practice to figure out when they'll be useful and when they'll be a total waste of time.
tptacek · 33d ago
It happens to me every once in a while, but I'm not sure why I would care. I usually set it off on some question and go tab away to something else while it flails. When I come back, I have a better-than-average shot at a workable solution, which is a science fiction result.
When I first began programming as a teenager, one of the mental hurdles I had to get over was asking the computer to do "too much"; like, I would feel bad writing a nested loop --- that can't possibly be the right answer! What a chore for the computer! It didn't take me too long to figure out that was the whole point of computers. To me, it's the same thing with LLMs spinning on something. Who gives a shit? It's not me wasting that time.
lolinder · 33d ago
It can be a science fiction result and still not actually result in time saved for the human operator on the whole. For me the jury is definitely still out on whether it results in net time saved and it's not for lack of trying.
Whether it ends up getting good enough in the near future to become a net positive isn't the question being discussed, and still remains to be seen.
kasey_junk · 31d ago
Have you tried an asynchronous agent coding flow? They were relatively rare until last week, but it took that for me to see the value. Now I happily queue up 4 or 5 tasks in the morning, come back at lunch to check in, give feedback, or merge, then rinse and repeat.
It doesn’t replace the hard tasks (yet) and you do need to think about the tasks and the tooling but it’s a game changer.
I wasn’t kidding in a peer comment (except about the Mars Cheese Castle). I started an agent task before leaving on a trip and gave it feedback from my iPad when I stopped. I have a real business problem solved now.
kasey_junk · 33d ago
I’m now at the point where I tell it to do something _and then drive to Wisconsin_, giving feedback at the Mars Cheese Castle and continuing my trip.
tptacek · 33d ago
I feel like you might know where I'm coming from being confused at people's reaction to this stuff. This is science fiction. I think it's just not sinking in with people. If I could bet on this, I would bet everything I could on "skill with LLMs" being the high order bit of being an effective software developer 5 years from now.
mwcampbell · 32d ago
I don't deny that, under the right circumstances, these tools can produce results that feel indistinguishable from magic, or like science fiction as you put it. But I don't think it's worth the costs. To me, the two most concerning costs are the unreliability, and the massive amounts of stolen training data and underpaid labor (the RLHF process) required for these models. I'm not comfortable relying on a tool built on such foundations.
My bet, and I realize this might just be wishful thinking, is that the high order bit for being an effective software developer in the near future will be skill at using more reliable and non-exploitative automation tools, such as programming languages with powerful macro systems and other high-level abstractions, to stay competitive with developers who sling LLM-generated code. So I'd better get started developing that skill myself.
tptacek · 32d ago
I could not possibly have any less sympathy for an argument than I do about the "stolen IPR" implications of coding LLMs.
d1sxeyes · 32d ago
To be fair, TDD has three steps: Red, Green, Refactor. Sounds like you got to Green. /s
It rewrote some comments, changed the test name, and added extra assertions to the test. Babysitting something like that seems like an absolute waste of time.
BeetleB · 33d ago
You want a citation for things so many people are doing daily with LLMs?
Just because they can't fix most failures doesn't mean they can't fix many.
what · 33d ago
It’s a failure that it created and when told to fix it did this. It’s beyond bad.
BeetleB · 32d ago
> It's a failure that it created and when told to fix it did this. It’s beyond bad.
No one's disputing this was bad. People are merely claiming it can also be good. I've dealt with plenty of humans this bad - it's not an argument that humans can't program.
callc · 32d ago
It seems like the underlying issue is trust. We can give a task to a competent programmer - even a junior - and trust them to finish the task correctly. It might take multiple tries, and they may ask for clarification, but since they're human, we trust they are intelligent.
There are some people who fall into the bucket where we can't trust them to finish the task correctly, or within a time frame or level of effort on our part that makes the task-offloading exercise a net positive.
If we view LLMs in the same light, IMO they currently fall into the "not trust" category: we can't really give them a task and trust them to finish it correctly, confident that we don't need to understand their implementation.
If one day LLMs or some other solution reaches that point, then it definitely won’t look like a bubble, but a real revolution.
BeetleB · 32d ago
Very well put. The trick is to do either of the following:
1. Find simpler tasks for which the trust in LLMs is high.
2. Give tasks to the LLMs that have a very low cost to verify (even when the task is not simple) - particularly one off scripts.
I once had a colleague who was in the "not trust" bucket for the work we were doing. So we found something he was good at that was a pain for me to do, and re-assigned him to do those things and take that burden off of us.
In the last few months I've had the LLM solve (simple) problems via code that had been in my head for years. At any point I could have done them, but they were a chore. If the LLM failed for one of these tasks - it's not a big deal - not much time was lost. But they tend to succeed fairly often, because they are simple tasks.
I almost never let the LLM write production code, because of the extra burden that you and others allude to. But I do let it write code I rely on in my personal life, because frankly I tend to write pretty poor code for my personal use - I can't justify the time it would take to write things well - life is too busy. I welcome the code quality I get from Sonnet or Gemini 2.5 Pro.
That's my point in this thread. Writing code is a pretty diverse discipline, and many are dismissing it simply because it doesn't do one particular use case (high quality production code) well.
I didn't take LLM coding seriously until I found well respected, well known SW engineers speak positively about them. Then I tried it and ... oh wow. People dismissing them is dismissing not only a lot of average developers' reality, but also a lot of experts' daily reality.
Just look at the other submission: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
He used an LLM to find a security vulnerability in the kernel. To quote him:
> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.
motorest · 32d ago
> It rewrote some comments, changed the test name and added extra assertions to the test. Baby sitting something like that seems like an absolute waste of time.
I've worked with real flesh-and-blood developers who did exactly the same thing. At least with LLMs we don't have to jump into an hour-long call to discuss the changes.
jsosshbfn · 32d ago
LLMs tend to type a little faster than humans... so for straightforward code, yes?
simonw · 33d ago
Depends how good you are at reading and reviewing code. If you're already good at that LLMs are a huge productivity boost.
pmbanugo · 33d ago
I've been experimenting with LLMs for coding for the past year - some wins, plenty of frustrations. Instead of writing another "AI will change everything" post, I collected practical insights from other senior engineers who've figured out what actually works. No hype, just real experiences from people in the trenches.
easygenes · 32d ago
I think none of these offer much useful insight beyond the overarching idea of peer programming beating just vibe coding.
The best structure I've found which leverages this idea is called BMAD, and treats the LLM as though it were a whole development team in an orchestrated way that you have full control over:
https://youtu.be/E_QJ8j74U_0 https://github.com/bmadcode/BMAD-METHOD
Looks like an elevated vibe coding method for UI development. Does this work for non-web/UI development?
easygenes · 32d ago
You're always limited by the knowledge gaps of the underlying LLM. The method is otherwise the most coherent way to work to the strengths of the LLM through disciplined context and role management. Nothing about it is UI focused, and leans more on general agile team structures than anything else.
jbellis · 33d ago
I would have said that Harper Reed's workflow (brainstorm spec, then co-plan a plan, then execute using LLM codegen) is basically best practice today, and I'm surprised that the author adds that "I’ve not been successful using this technique to build a complete feature or prototype."
Here's an example of using this pattern with Brokk to solve a real world bug: https://www.youtube.com/watch?v=t_7MqowT638
This is showing the workflow of your tool quite well, but would be way more convincing & impressive if you had actually fixed the bug and linked to the merged PR.
jbellis · 31d ago
that's fair! we thought that at 23m it was already pushing attention spans pretty hard :)
> AI is much better than strong engineers at writing very short programs: in particular, it can produce ten to thirty lines of straightforward mostly-working code faster than any engineer.
> How can you leverage this? There’s not much demand for this kind of program in the day-to-day of a normal software engineer. Usually code either has to be a modification to a large program, or occasionally a short production-data script (such as a data backfill) where accuracy matters a lot more than speed.
While this may be technically correct — there’s little demand for standalone small programs — it overlooks a crucial reality: the demand for small code segments within larger workflows is enormous.
Software development (in my experience) is built around composing small units — helpers, glue code, input validation, test cases, config wrappers, etc. These aren’t standalone programs, but they’re written constantly. And they’re exactly the kind of 10–30 line tasks where LLMs are most effective.
Engineers who break down large tasks into AI-assisted microtasks can move faster. It’s not about replacing developers — it’s about amplifying them.
diggan · 33d ago
> Peer Programming with LLMs, For Senior+ Engineers
> [...] a collection of blog posts written by other senior or staff+ engineers exploring the use of LLM in their work
It seems to be by senior engineers, if anything. I don't see anything in the linked articles indicating they're for senior engineers; programmers of all seniority could find them useful, if they find LLMs useful.
OutOfHere · 33d ago
Yes, although those who are not senior engineers will not preemptively see the value in the documented approaches. One has to be a senior to preemptively appreciate the value in them.
SoftTalker · 33d ago
Though I haven't tried it, I would probably enjoy peer programming with an LLM more than I do with a real person (which I have tried and hated).
I could assign the LLM the simple drudgery that I don't really want to do, such as writing tests, without feeling bad about it.
I could tell the LLM "that's the stupidest fucking thing I've ever seen" whereas I would not say that to a real person.
gadflyinyoureye · 33d ago
It seems like we need to use forceful language with these things now. I've had Copilot censor everything I asked it. Finally I had to say, "listen, you cracked-up piece of shit, help me generate a UUID matcher."
Aeolun · 32d ago
We’ve blocked your response because it matches public code.
Aeolun · 32d ago
I really want the LLM to do the opposite. To tell me that’s the stupidest fucking thing it’s ever seen. They’re surprisingly bad at that though.
pmbanugo · 33d ago
That’s what the recent Copilot feature on GitHub can do. You assign it tasks and it comes back with a PR. You could also assign it to review a PR.
tracerbulletx · 33d ago
My main feeling is that it's great as long as I constrain it to working in a conceptual boundary that I can reason about, like a single system component where I am telling it the API. That way I have an understanding of each piece that gets built up. If you let it go too wide, it starts to make mistakes and I lose my mental model.
JSR_FDED · 32d ago
Well put. That’s my challenge too - losing the mental model of my entire codebase. Sometimes it feels like the time I saved using an LLM I then give right back when reassembling the mental model.
CompoundEyes · 32d ago
I write a lot of "defensive" C# code in my day job, expecting that someone very inexperienced/offshore will be working with it in the future (and I will be reviewing it four months later when no longer on the project). I call it "corporate coding". Lots of interfaces that must be adhered to, IoC, injection, and annoyingly strong patterns. Anything that makes going off the rails a lot of work - the path of most resistance - glaring in code reviews. But key logic is concentrated in a few taller files, so there's none of the drilling through abstractions, making it easy for a newbie to comprehend. I want to take some time with a defensive coding approach and LLMs, particularly scoping it to a certain project or folder in a layered architecture. Why let it know of the front end, back end, and database all at once? Of course it'll get discombobulated.
I’ve also been experimenting with giving an LLM coins and a budget. “You have 10 coins to spend doing x, you earn coins if you m,n,o and lose coins if you j,k,l” this has reduced slop and increased succinctness. It will come back, recount what it’s done explaining the economy and spending. I’ve had it ask “All done boss I have 2 left how can i earn some more coins?” It’s fun to spy on the thinking model working through the choices “if I do this it’ll cost me this so maybe I should do this instead in 1 line of code and I’ll earn 3 coins!”
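A sketch of what such a system prompt might look like (the specific costs and rewards are illustrative, not a tested recipe; the x/m,n,o/j,k,l placeholders above are filled in with example behaviors):

```python
# Hypothetical "coin economy" system prompt for constraining an agent.
COIN_BUDGET_PROMPT = """\
You have 10 coins to spend completing the task below.
- Each new file costs 2 coins; each new dependency costs 3 coins.
- Every 20 lines of code you write costs 1 coin.
- You earn 2 coins for reusing an existing helper and 1 coin for
  deleting dead code.
If you run out of coins, stop and report what remains to be done.
When finished, account for every coin spent and earned.

Task: {task}
"""

def budgeted(task: str) -> str:
    return COIN_BUDGET_PROMPT.format(task=task)
```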
ColinEberhardt · 32d ago
Thanks for sharing, pmbanugo, a couple of those posts are new to me too. If you’re taking submissions, I’ve been exploring how to make the most of these tools for the past few months; here’s my latest post:
https://blog.scottlogic.com/2025/05/08/new-tools-new-flow-th...
I want to note that the headline gave me an idea for a nonprofit: "Peer Programming with LLMs for Seniors."
Somebody jump on that. It's yours. :)
pmbanugo · 33d ago
re-reading the title makes me feel like I used the wrong title.
Could be a good idea for a non-profit like you said. I know someone who’s exploring something similar but for disabled folks who aren’t tech-savvy (for-profit)
nickpsecurity · 33d ago
That's kind of them. I'll pray their effort succeeds.
westurner · 33d ago
What are some of the differences between Peer Programming with LLMs and Vibe Coding?
diggan · 33d ago
> What are some of the differences between Peer Programming with LLMs and Vibe Coding?
"Vibe Coding" is specifically using the LLM instead of programming anything, barely caring about the output. If something is wrong, don't even open the file, just ask the LLM. Basically "prompting while blindfolded" I guess you could say.
Peer programming with an LLM would be to use it as another tool in the toolbox. You still own your program and your code. Edit away, let the LLM do some parts that are either too tricky, or too trite to implement, or anything in-between. Prompts usually are more specific, like "Seems X is broken, look into Y and figure out if Z could be the reason".
tptacek · 33d ago
I think the consensus boils down to: you're vibe coding if you don't understand the code before you merge it.
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. (...) I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. (...)
Pair programming still very much deals with code and decisions.
westurner · 32d ago
So, pair programming continues to emphasize software quality (especially with LLMs) but "vibe coding" is more of a "whoo, I'm a reckless magician" (in a less risky application domain) sort of thing?
But doesn't a 'vibe-coding' "we'll just sort out the engineering challenges later" ensure that there will be re-work and thus less overall efficiency?
lowbloodsugar · 33d ago
I would say that the difference is taking an engineering approach to the process itself: iterating on the context, putting the system into various states, etc. Treating the AI like a very knowledgeable intern who has a very fixed short-term memory and can't form new long-term memories, but can be taught to write things down, like in Memento. The thing is, though, it has a much, much larger short-term memory than me.
westurner · 31d ago
This is probably important to demo in teaching coding with AI.
I suppose the modern difference is the degree of human validation before committing or releasing.
dietr1ch · 33d ago
(Site is unreadable for me on Firefox 138, but the text is still there if you select all. Qutebrowser based on Chromium 130 doesn't render it either.)
vvillena · 33d ago
No problems here, both the normal view and reader mode seem to work well.