Vibe coding as a coding veteran: from 8-bit assembly to English-as-code

179 thunderbong 117 8/28/2025, 3:55:15 PM levelup.gitconnected.com ↗

Comments (117)

stavros · 2d ago

I've come to view LLMs as a consulting firm where, for each request, I have a 50% chance of getting either an expert or an intern writing my code, and there's no way to tell which.

Sometimes I accept this, and I vibe-code, when I don't care about the result. When I do care about the result, I have to read every line myself. Since reading code is harder than writing it, this takes longer, but LLMs have made me too lazy to write code now, so that's probably the only alternative that works.

I have to say, though, the best thing I've tried is Cursor's autocomplete, which writes 3-4 lines for you. That way, I can easily verify that the code does what I want, while still reaping the benefit of not having to look up all the APIs and function signatures.

kaptainscarlet · 2d ago

I've also had a similar experience. I have become too lazy since I started vibe-coding. My coding has transitioned from coder to code reviewer/fixer vey quickly. Overall I feel like it's a good thing because the last few years of my life has been a repetition of frontend components and api endpoints, which to me has become too monotonous so I am happy to have AI take over that grunt work while I supervise.

latexr · 1d ago

> My coding has transitioned from coder to code reviewer/fixer vey quickly. Overall I feel like it's a good thing

Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.

> the last few years of my life has been a repetition of frontend components and api endpoints, which to me has become too monotonous

It’s a surprise that so many people have this problem/complaint. Why don’t you use a snippet manager?! It’s lightweight, simple, fast, predictable, offline, and includes the best version of what you learned. We’ve had the technology for many many years.

TuringTest · 3h ago

> Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.

You can locally run pretty decent coding models such as Qwen3 Coder in a RTX 4090 GPU through LM Studio or Ollama with Cline.

It's a good idea even if they give slightly worse results in average, as you can limit your spending of expensive tokens for trivial grunt work and use them only for the really hard questions where Claude or ChatGPT 5 will excel.

onion2k · 1d ago

Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.

Devs shouldn't be blindly accepting the output of an LLM. They should always be reviewing it, and only committing the code that they're happy to be accountable for. Consequently your coding and syntax knowledge can't really atrophy like that.

Algorithms and data structures on the other hand...

the_real_cher · 1d ago

> Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.

I never remembered those keywords to begin with.

Checkmate!

realharo · 1d ago

>Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.

Realistically, that's probably never going to happen. Expecting it is just like the prepper mindset.

DaSHacka · 1d ago

I imagine this is what everyone says about all SaaS services in their "burn investor money to acquire more users" phase before hitting the "enshittify and charge more for the service to become profitable" phase

stavros · 2d ago

Yeah, exactly the same for me. It's tiring writing the same CRUD endpoints a thousand times, but that's how useful products are made.

foolserrandboy · 1d ago

I wonder why it’s not the norm to use code generation or some other form of meta programming to handle this boring repetitive work?

Cthulhu_ · 1d ago

Because in practice the API endpoint isn't what takes up the time or LOC, but what's underneath. In fact, there's plenty of solutions to e.g. expose your database / data storage through an API directly. But that's rarely what you really want.

stavros · 1d ago

Because, like a carpenter doesn't always make the same table, but can be tired of always making tables, I don't always write the exact same CRUD endpoints, but am tired of always writing CRUD endpoints.

js8 · 1d ago

I think your analogy shows why LLMs are useful, despite being kinda bad. We need some programming tool to which we can say, "like this CRUD endpoint, but different in this and that". Our other metaprogramming tools cannot do that, but LLMs kinda can.

I think now we have identified this problem (programmers need more abstract metaprogramming tools) and a sort of practical engineering solution (train LLM on code), it's time for researchers (in the nascent field of metaprogramming, aka applied logic) to recognize this and create some useful theories, that will help to guide this.

In my opinion, it should lead to adoption of richer (more modal and more fuzzy) logics in metaprogramming (aside from just typed lambda calculus on which our current programming languages are based). That way, we will be able to express and handle uncertainty (e.g. have a model of what constitutes a CRUD endpoint in an application) in a controlled and consistent way.

This is similar how programming is evolving from imperative with crude types into something more declarative with richer types. (Roughly, types are the specification and the code is the solution.) With a good set of fuzzy type primitives, it would be possible to define a type of "CRUD endpoint", and then answer the question if the given program has that type.

iterateoften · 1d ago

Leaky abstractions. Lots of meta programming frameworks tried to do this over the years (take out as much crud as possible) but it always ends up that there is some edge case your unique program needs that isn’t handled and then it is a mess to try to hack the meta programming aspects to add what you need. Think of all the hundreds of frameworks that try to add an automatic REST API to a database table, but then you need permissions, domain specific logic, special views, etc, etc. and it ends up just easier to write it yourself.

If you can imagine an evolutionary function of noabstraction -> total abstraction oscilating overtime, the current batch of frameworks like Django and others are roughly the local maxima that was settled on. Enough to do what you need, but doesn’t do too much so its easy to customize to your use case.

therein · 2d ago

The lazy reluctance you feel is atrophy in the making. LLMs induce that.

kaptainscarlet · 1d ago

That's my biggest worry, atrophy. But I will cross that bridge when I get to it.

latexr · 1d ago

With atrophy, by the time you get to the bridge you’ll realise it’s too deteriorated to cross and will have to spend a lot of time rebuilding and reinforcing it before you can get to the other side.

Cthulhu_ · 1d ago

That's it, every line of code is an implicit requirement, based on explicit requirements; when you have a codebase that needs to be maintained or replaced, as a developer it's your job to determine which of the implicit requirements in code are explicit requirements for what the application does.

I do think that in a few years time, next generation coding LLMs will read current-generation LLM generated code to improve on it. The question is whether they're smart enough to ignore the implicit requirements in the code if they aren't necessary for the explicit ones.

(this comment makes sense in my head)

Most if not all of my professional projects have been replacing existing software. In theory, they're like-for-like, feature-for-feature rewrites. In practice, there's an MVP of must-have features which usually is only a fraction of the features (implicit or explicit) of the application it replaced, with the rewrite being used as an opportunity to re-assess what is actually needed, what is bloat over time, and of course to do a redesign and re-architecture of the application.

That is, rewriting software was an exercise in extracting explicit features from an application.

talles · 2d ago

> Since reading code is harder than writing it,

Reading bad code is harder than writing bad code. Reading good code is easier than writing good code.

stavros · 2d ago

I beg to differ.

jagged-chisel · 2d ago

No need to beg. Everyone’s got their opinion. I just wish, this being Hacker News, that more people would articulate their different opinions instead of just stopping with “I disagree.”

stavros · 2d ago

Well, my first comment said "reading code is harder than writing code", your comment said "reading good code is easier than writing good code". I believe the two points are about equally articulated.

jagged-chisel · 1d ago

Neither comment is mine. I’m here in the outside wanting to understand the arguments you have in your heads. Sure, the two comments you mention are equally under-articulated. Either continue the discussion for the benefit of others on the site, or leave it as it stands. Stating “I beg to differ” is pointless.

stavros · 1d ago

This comment put it well: https://news.ycombinator.com/item?id=45089978

fnordpiglet · 1d ago

This is the sign of seniority IMO. First you learn to write code. Then you learn to write code that can be read. Then you learn to modify code. Then you learn to read other people’s code. Then you learn to modify other people’s code. Then you learn to own code regardless of who reads or writes it.

At this point in my career 35 years in I find reading and writing code whether I wrote it or other did irrelevant. Bad or good code, it’s all the same. By far the most effective work I do involves reading a lot of complex code written by many people over many years and seeing the exact one line to change or improve.

I find LLM assisted coding very similar frankly. I’ve finished maybe 20 projects or more on the last seven months on my own time that I never would have been able to do in my lifetime for want of free time to learn minutia in stuff I am not familiar with. The parts it get hung up on I’m able with quick inspection to recognize and unwedge it, just like any junior engineer. The junior engineers also are often much better versed in XYZ library than I am.

theshrike79 · 1d ago

This is the thing.

LLM assisted coding ("vibe coding") is just project management.

You ask it to do things, then you check the work to a sufficient degree.

The better the specifications and documentation you give it, the better the result will be. Keeping tasks short and verifiable also helps a lot.

I've written SO many small tools for myself during the last year it's not even funny. Upgraded some shitty late night Python scripts to proper Go applications with unit tests and all, while catching up on my TV shows.

Converted my whole rat's nest of Docker compose files to a single Opentofu declarative setup.

None of this would've gotten done without an LLM assistant.

fnordpiglet · 14h ago

Funny I end up working on 5-6 things at once that are fairly varied. My favorite rats nest is rebuilding by DIY NAS as a NixOS declaration so I can rebuild the filer root from GitHub.

ontouchstart · 1d ago

I’m at the same point as well. Doing more reading than writing.

Just want to add one more point: code is not fiction or even nonfiction, “good or bad” style can be subjective, but correct or not correct is not, regardless of the reviewer’s mental model.

The difficulty of reading code is to understand its logic and logical consequences. The more complex the codebase (not just the line we are reading), the more risky to modify it.

That is why I use LLM to write a lot of tests and let it review the logs to help me understand the logic. Even the tests can be disposable.

AlexCoventry · 1d ago

Why is reading code harder than writing it?

blackoil · 1d ago

I think it has to do with mental model. If you already know what to write and it is reasonably complex you'll have a mental model ready and can quickly write it down (now even faster as LLMs autocomplete 3-4 lines at a time). While reading someone else code you'll have to constantly map the code in your mind with code written and have to then compare quality, security and other issues.

stavros · 1d ago

Yeah, it's exactly this. Having to create a mental model from the code is much harder than having one and just writing it out.

AlexCoventry · 23h ago

I just tend to find LLM code output extremely to read, I guess. It tends to be verbose and do a lot of unnecessary stuff, but I can always get the point easily and edit accordingly.

theshrike79 · 1d ago

I'd say just reading your own code from a few years back will be as hard as reading someone else's.

_fat_santa · 1d ago

> I have a 50% chance of getting either an expert or an intern writing my code

The way I describe it is almost gambling with your time. Every time I want to reach for the Cline extension in VSCode, I always ask myself "if this gamble worth it?" and "what are my odds for this gamble?".

For some things like simple refactoring I'm usually getting great odds so I use AI, but I would say at least 5-6 times last week I've thought about it and ended up doing it by hand as the odds were not in my favor.

One thing I've picked up using AI over the past few months is this sense of what it can and can't do. For some things I'm like "yeah it can do this no problem" but for other tasks I find myself going "better do this by hand, AI will just fuck it up"

lukaslalinsky · 1d ago

There is a way of doing things that sits between autocomplete and vibe coding. If you use these tools effectively, you learn how to prop them up with context, so make sure they won't start imaging how things should work, then you make it plan a solution, and if you have the time, you watch it implement it and approve as you go. Sometimes you stop it, make correction, and you go on. While it's doing its thing, you can plan the next task. Sometimes I do it in larger chunks, where I auto-accept changes/tool uses, but even in that, I only allow it to do one small task as a time, something that takes me minutes to review.

fibonachos · 1d ago

The multi-line autocomplete feels like the right middle ground (for me) when working in an existing code base with well established patterns. Adding new functionality is mostly a matter of scaffolding, commenting as I go, typing the first few characters of a code block and tabbing my way through the bulk of the code.

MangoCoffee · 2d ago

>When I do care about the result, I have to read every line myself.

isn't that the same as delegated task to jr developer but you still have to check their work as sr?

stavros · 2d ago

It is, but not the same as if a senior developer were writing it. I would feel much less like I have to check it then.

ryandrake · 2d ago

I still have doubts but I am warming up to Claude Code. Sometimes it gets in these ruts where it tries something wrong, gets shown it's wrong, keeps digging, and can't even explain how its failed suggestions could even work. Other times, it seems to reproduce the kinds of insights (and even judgment) that I'd expect out of an actual practitioner. After suggesting code to fix problems, I always ask it to explain the context in more detail. I was recently using it help to develop a Qt C++ application, and ended up getting this interaction from it:

> Can you speculate on why this problem was only happening with these embedded buttons? Is this a flaw in Qt, or is this application trying to do something unusual?

  Why this problem occurred: This is a Qt design issue, not a flaw per se. Qt's item delegate editing model assumes editors are simple widgets (line edits, spin boxes, combo boxes) that don't spawn modal dialogs. When a modal dialog opens:

  1. The modal dialog steals focus from the editor button
  2. Qt's QAbstractItemView sees the FocusOut event and assumes editing is complete
  3. It calls QStyledItemDelegate::destroyEditor() to clean up the editor
  4. The button gets deleted while the color dialog is still running
  5. When the color dialog closes, it tries to interact with the deleted button

  This application is doing something unusual - using a button that spawns a modal dialog as a table cell editor. Most Qt delegates use immediate editing widgets (text fields, dropdowns) that don't require modal interactions.

  The eventFilter solution works by intercepting the FocusOut event that triggers editor destruction, essentially telling Qt "I'm still editing, don't destroy me yet."

> Is there a danger of leaked Qt resources now that we are blocking destruction?

  Yes, that's a valid concern. By blocking FocusOut events, we're preventing Qt's normal editor cleanup mechanism. However, the risk is likely minimal because:

  1. Qt still cleans up eventually - The editor will be destroyed when the table view itself is destroyed or when editing truly ends (e.g., clicking elsewhere, pressing Enter/Escape)
  2. Short-lived editors - These editor widgets are typically very short-lived (created when editing starts, destroyed when editing ends)
  3. Parent widget cleanup - Since the button has a parent widget, Qt's parent-child ownership will clean it up when the parent is destroyed

  To be completely safe, we could add a more targeted approach - only block FocusOut events while a color dialog is actively open, rather than blocking all FocusOut events. We could track when the modal dialog is shown/hidden and only filter during that window.

  But for this use case, the current solution should be fine since:
  - Color picking is infrequent
  - The editor lifetime is still bounded by the table view's normal lifecycle
  - Qt's parent-child cleanup provides a safety net

  The memory usage impact should be negligible compared to the complexity of implementing more sophisticated tracking.

-----

I've definitely had less-satisfying discussions over bugs with actual human senior software engineers than this.

manwe150 · 1d ago

It seems to be just playing the “yes and” improv game with you. You might want to also try prompting it against the same suggestions and see if it changes to follow your lead or continues to hold the original opinion.

theshrike79 · 1d ago

But was it wrong?

mycentstoo · 2d ago

I believe choosing a well known problem space in a well known language certainly influenced a lot of the behavior. AIs usefulness is correlated strongly with its training data and there’s no doubt been a significant amount of data about both the problem space and Python.

I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.

It was a great read regardless!

dazzawazza · 1d ago

I think you are correct. I work in game dev. Almost all code is in C/C++ (with some in Python and C#).

LLMs are nothing more than rubber ducking in game dev. The code they generate is often useful as a starting point or to lighten the mood because it's so bad you get a laugh. Beyond that it's broadly useless.

I put this down to the relatively small number of people who work in game dev resulting in relatively small number of blogs from which to "learn" game dev.

Game Dev is a conservative industry with a lot of magic sauce hidden inside companies for VERY good reasons.

Lerc · 1d ago

One of my test queries for AI models is to ask it for an 8 bit asm function to do something that was invented recently enough that there is unlikely to be an implementation yet.

Multiplying two 24 bit posits in 8-bit Avr for instance. No models have succeeded yet, but usually because they try and put more than 8 bits into a register. Algorithmically it seems like they are on the right track but they don't seem to be able to hold the idea that registers are only 8-bits through the entirety of their response.

bugglebeetle · 1d ago

Do you provide this context or just ask the model to one-shot the problem?

Lerc · 1d ago

A clear description of the problem, but one-shot.

Something along the lines of

Can you generate 8-bit AVR assembly code to multiply two 24 bit posit numbers

You get some pretty funny results from the models that have no idea what a posit is. It's usually pretty clear to tell if they know what they are supposed to be doing. I haven't had a success yet (haven't tried for a while though). Some of them have come pretty close, but usually it's the trying to squeeze more than 8 bits of data into a register is what brings them down.

bugglebeetle · 1d ago

Yeah, so it’d be interesting to see if provided the correct context/your understanding of its error pattern, it can accomplish this.

One thing you learn quickly about working with LLMs if they have these kind of baked-in biases, some of which are very fixed and tied to their very limited ability to engage in novel reasoning (cc François Chollet), while others are far more loosely held/correctable. If it sticks with the errant patten, even when provided the proper context, it probably isn’t something an off-the-shelf model can handle.

Insanity · 2d ago

100% this. I tried haskelling with LLMs and it’s performance is worse compared to Go.

Although in fairness this was a year ago on GPT 3.5 IIRC

diggan · 2d ago

> Although in fairness this was a year ago on GPT 3.5 IIRC

GPT3.5 was impressive at the time, but today's SOTA (like GPT 5 Pro) are almost night-and-difference both in terms of just producing better code for wider range of languages (I mostly do Rust and Clojure, handles those fine now, was awful with 3.5) and more importantly, in terms of following your instructions in user/system prompts, so it's easier to get higher quality code from it now, as long as you can put into words what "higher quality code" means for you.

ocharles · 2d ago

I write Haskell with Claude Code and it's got remarkably good recently. We have some code at work that uses STM to have what is essentially a mutable state machine. I needed to split a state transition apart, and it did an admirable job. I had to intervene once or twice when it was going down a valid, but undesirable approach. This almost one shot performance was already a productivity boost, but didn't quite build. What I find most impressive now is the "fix" here is to literally have Claude run the build and see the errors. While GHC errors are verbose and not always the best it got everything building in a few more iterations. When it later got a test failure, I suggested we add a bit more logging - so it logged all state transitions, and spotted the unexpected transition and got the test passing. We really are a LONG way away from 3.5 performance.

r_lee · 2d ago

I'm not sure I'd say "100% this" if I was talking about GPT 3.5...

verelo · 2d ago

Yeah, 3.5 was good when it came out but frankly anyone reviewing AI for coding not using sonnet 4.1, GPT-5 or equivalent is really not aware of what they've missed out on.

Insanity · 2d ago

Yah, that’s a fair point. I had assumed it’d remain relatively similar given that the training data would be smaller for languages like Haskell versus languages like Python & JavaScript.

danielbln · 2d ago

Post-training in all frontier models has improved significantly wrt to programming language support. Take Elexir, which LLMs could barely handle a test ago, but now support has gotten really good

computerex · 2d ago

3.5 was a joke in coding compared to sonnet 4.

Insanity · 2d ago

Yup fair point, it’s been some time. Although vibe coding is more “miss” than “hit” for me.

pizza · 1d ago

It's so thrilling that this is actually true in just a year

johnisgood · 2d ago

I wrote some Haskell using Claude. It was great.

SatvikBeri · 2d ago

I've had a lot of good luck with Julia, on high performance data pipelines.

bugglebeetle · 1d ago

Write a blog post about this! Would love to read it.

jszymborski · 2d ago

ChatGPT is pretty useless at Prolog IME

sema4hacker · 1d ago

If I was told I'd be working with a fellow programmer who would make all the mistakes listed in Section 5 of the article, I'd have to say "no thanks". Yet the author ends with "I don’t think I will ever code again without the assistance of an AI model". He's a lot more thick-skinned than I.

AugSun · 1d ago

What would you expect from "AI guy vibing AI code for AI application"? Marco warned you about the "AI echo chamber" from the outset - and he kept his promise :-)

razzmatazmania · 1d ago

Some people also value programs for their productive ends rather than value them for the process of writing them in a pleasing way. Personally, I've been getting more done than ever with Claude Code. That I am able to work just a few minutes at a time then let the machine go is really nice as a parent. For those of us who don't program for a day job, but need programs for our day job, Claude and friends have completely changed what's possible.

afro88 · 2d ago

Great article, though I'm still reading it as it's a mammoth read!

A side note: as it's been painfully pointed out to me, "vibe coding" means not reading the code (ever!). We need a term for coding with LLMs exclusively, but also reviewing the code they output at each step.

layer8 · 2d ago

We could revive the old CASE acronym (https://en.wikipedia.org/wiki/Computer-aided_software_engine...). ;)

Timwi · 1d ago

SWITCH: SoftWare Implementation Through Computer & Human

BASE: Brain And Silicon Engineering

CLASS: Computer/Llm-Assisted Software Specification

STRUCT: Scripting Through Recurrent User/Computer Teamup

ELSE: Electronically Leveraged Software Engineering

VOID: Very Obvious Intelligence Deficit

Okay maybe not that last one

jagged-chisel · 2d ago

Actually a good suggestion.

Prediction: arguments over the definition will ensue

Disposal8433 · 2d ago

It's called "reviewing code." I'm not taking any kind of responsibility for code that I haven't written myself.

afro88 · 2d ago

You're not just hitting go and reviewing code though. If someone asked how I built a side project and I said "reviewing code" it would make no sense.

archargelod · 1d ago

> If someone asked how I built a side project Then you might have to say the truth: that you didn't build it, but Claude/OpenAI/Gemini built it under your supervision.

afro88 · 1d ago

Now we're getting somewhere. I think there's more than just supervision involved though. The ideas, direction and design are also provided by the person driving and reviewing output from the agents.

mellosouls · 2d ago

I use "Pro-coding" as it implies professionalism or process, or at least some sort of formality.

It doesn't imply AI, but I don't distinguish between AI-assisted and pre-AI coding, just vibe-coding as I think thats the important demarcation now.

tln · 2d ago

Prompt coding or just prompting

"Lets prompt up a new microservice for this"

"What have you been prompting lately?"

"Looking at commits, prompt coding is now 50% of your output. Have a raise"

ofjcihen · 2d ago

What is the term for getting the ick from reading?

mcrk · 2d ago

Just use "coding", then let's reserve the word "programming" for Linus.

BinaryIgor · 2d ago

What stands out for me, is that it was all possible thanks to the fact that the AI operator/conversationalist had enough knowledge to, more or less write, it all by hand, if he chose to.

Probably it was said many times already, but it will rather be the competition between programmers with AI and programmers without one, rather than no programmers with AI.

In particular, I love this part:

"I had serious doubts about the feasibility and efficiency of using inherently ambiguous natural languages as (indirect) programming tools, with a machine in between doing all the interpretation and translation toward artificial languages endowed with strict formal semantics. No more doubts: LLM-based AI coding assistants are extremely useful, incredibly powerful, and genuinely energising.

But they are fully useful and safe only if you know what you are doing and are able to check and (re)direct what they might be doing — or have been doing unbeknownst to you. You can trust them if you can trust yourself."

topranks · 1d ago

Exactly this.

Which isn’t really “vibe coding” as it’s been promoted, i.e. a way for non-programmers to just copy and paste their way to fully working software systems.

It’s a very powerful tool but needs to be used by someone with the expertise to find the flaws.

faangguyindia · 2d ago

Basically, at place we've a coding agent in a while loop.

What it does is pretty simple. You give it a problem, setup enviornment with libraries and all.

It continuously makes changes to the program, then checks it output.

And iteratively improves it.

For example, we used it to build a new method to apply diffs generated by LLMs to files.

As different models are good at different things, we managed to run it against models to figure out which method performs best.

Can a human do it? I doubt.

mierz00 · 1d ago

What kind of problems have you been throwing at it?

loudmax · 1d ago

From the post:

> Also, these assistants (for now) appear to exhibit no common sense about what is “much”, “little”, “exceptional”, “average”, etc. For example, after measuring a consumption of 3.5GB of memory (!!) for solving a 3-disk problem (due to a bug), the assistant declared all was well...

That describes a good portion of my coworkers.

kelnos · 22h ago

To be clear, this was not a vibe coding exercise, despite the title. The author supervised and reviewed the code changes at every step, caught mistakes and sub-optimal solutions, and worked with the LLM to fix and improve those problems.

This is not someone who just said "build me X", left it to run for a while, and then accepted whatever it wrote without reading it.

(I'm not criticizing the article's author here. It was an excellent, thoughtful read, and I think an article that was actually about something vibe-coded would be boring and not really teach me anything useful.)

lukaslalinsky · 1d ago

As a long-time programmer, I have super positive experience with Claude Code. I can write all the code it can, I'm certain I can do it better, and I can probably do it faster as well. However, what I don't have is time and energy. I can spend the little time I have on the requirements and review, and let CC deal with the stuff in between, while I focus on personal life. It's a huge value to me. It literally got my back into the programming game.

coolgoose · 1d ago

It's an awesome article but one thing makes me twitch

`wrote a non-optimal algorithm and claimed it is optimal (in terms of guaranteed shortest solution) until (sometimes later) I noticed the bug;`

That's my general concern, that the Ai generation would make mistakes that I would otherwise catch, but getting into the vibe, I might start to trust the AI a bit too much, and all those lovely subtle bugs might pop up.

nurettin · 2d ago

My experience exactly. (including nearly 40 years of code exposure) I just wish there was an alternative to Claude sonnet 4. I see gemini pro 2.5 as a side girlfriend, but only Claude truly vibes with me.

discordance · 1d ago

If Claude vibes with you, why do you need an alternative?

nurettin · 1d ago

If you only have one choice, that is exactly why you need more choices. After doubling their prices and adding quotas with 5 hour pauses, I don't trust anthropic to not pull the rug again.

nomilk · 2d ago

> English as code

First time encountering the phrase.

Evolution went from Machine Code to Assembly to Low-level programming languages to High-level programming languages (with frameworks), to... plain English.

mjhagen · 2d ago

that phrase was used all the time to describe HyperTalk and AppleTalk

sema4hacker · 1d ago

In the early 70's Microdata trademarked "English" as the name of their SQL-like database retrieval language.

numpad0 · 2d ago

I don't feel good doing it, but is anyone else feeling not capitalizing text, maintaining a slightly abrasive attitude, and consciously stealing credits, yield better results from coding agents? e.g. "i want xxx implemented, can you do", "ok you do" than "I'm wondering if..." etc.

SV_BubbleTime · 2d ago

There is so much subjective placebo with “prompt engineering” that anyone pushing any one thing like this just shows me they haven’t used it enough yet. No offense, just seeing it everywhere.

Better results if you… tip the AI, offer it physical touch, you need to say the words “go slow and take a deep breath first”…

It’s a subjective system without control testing. Humans are definitely going to apply religion, dogma, and ritual to it.

cdrini · 2d ago

The best research I've seen on this is:

- Threatening or tipping a model generally has no significant effect on benchmark performance.

- Prompt variations can significantly affect performance on a per-question level. However, it is hard to know in advance whether a particular prompting approach will help or harm the LLM's ability to answer any particular question.

https://arxiv.org/abs/2508.00614

SV_BubbleTime · 2d ago

That 100% tracks expectation if your technical knowledge exceeds past “believer”.

Now… for fun. Look up “best prompting” or “the perfect prompt” on YouTube. Thousands of videos “tips” and “expect recommendations” that are bordering the arcane.

theshrike79 · 1d ago

The worst people are just writing D&D Character backstories as agent prompts:

"You are a world-class developer in <platform>..." type of crap.

SV_BubbleTime · 23h ago

Haha, at least one this… I could make an excuse, that if I’m juggling prompts around, one that starts “You are a copy writer…” vs “You are an editor that…” lets me separate them with natural language vs some historically dubious file system disorganization.

diggan · 2d ago

> Better results if you… tip the AI, offer it physical touch, you need to say the words “go slow and take a deep breath first”…

I'm not saying I've proven it or anything, but it doesn't sound far-fetched that a thing that generates new text based on previous text, would be affected by the previous text, even minor details like using ALL CAPS or just lowercase, since those are different tokens for the LLM.

I've noticed the same thing with what exact words you use. State a problem as a lay/random person, using none of the domain words for things, and you get a worse response compared to if you used industry jargon. It kind of makes sense to me considering how they work internally, but happy to be proven otherwise if you're sitting on evidence either way :)

SV_BubbleTime · 2d ago

We all agree that prompts are affected by tokens.

The issue is that you can’t know if you are positively or negatively effecting because there is no real control.

And the effect could switch between prompts.

kachapopopow · 2d ago

I tell my agent to off it self every couple of hours, it's definitely placebo as you're just introducing noise which might or might not be good. Adding hmm, <prompt> has been my goto for a bit if I want it to force to give me different results cause it appears to trigger some latent regions of the llms.

SV_BubbleTime · 2d ago

This seems to be exactly what I’m talking about though. We made a completely subjective system and now everyone has completely subjective advice about what works.

I’m not saying introducing noise isn’t a valid option, just doing it in ‘X’ or ‘y’ method as dogma is straight bullshit.

throwawa14223 · 2d ago

This is one of many reasons that I believe the value of current AI tech is zero if not negative.

jmull · 2d ago

Why not just:

"Implement xxx"

I don't think we can offend these things (yet).

numpad0 · 2d ago

idk, my thinking is that sweatshop-slack-like inputs might correspond to more professional outcomes than exam-like questions as exams would be more likely to be solved by beginners. I also fear "Implement xxx" might be just too short, I feel they might like to have some bytes to map to outputs. Could very well be placebo as pointed out.

alluro2 · 1d ago

I finally got to do something more extensive and serious with Claude Code / Gemini. It's basically a more complex CRUD app for multiple data entities, with some additional functionality.

I'm hoping that sharing my experience, amongst all others, can: A) help someone understand more / set their expectations B) get someone to point out how to do it better

On one hand, I managed, in 10 days, to get the amount of functionality would take ~2 months of coding "by hand". If I started the same project now - after learning, realising what works and not, and adapting - it would probably be possible in 5. The amount done was incredible - and it's working.

On the other hand:

- you need to be already very experienced in knowing how things should be built well, how they need to work together, and what is a good way to organize the user interface for the functionality

- you then need to have some practical experience with LLMs to know the limitations, and guide it through the above gradually, with proper level of detail provided and iteration. Which takes attention and process and time - it won't be a couple of sentences and hitting enter a couple of times, no matter how smart your prompts are

- otherwise, if you didn't think it through and planned it first, and did it with consideration of LLM itself, and you just give it high level requirements for an app with multiple functionalities - you'll just get a mess. You can try and improve your prompts over and over, and you'll get a different kind of mess every time, but mess nevertheless

- even with doing all of the above, you'll get a very very mediocre result in terms of "feeling of quality" - thoughtfulness of design, how information is laid out and things are organised - UX and polish. It might be more than fine for a lot of use-cases, but if you're building something that people need to use productively every day, it's not passable...

- the problem is that, at least in my experience, you can't get it to high level with LLM in an automated way - you still need to craft it meticulously. And doing that will require manually tearing down a lot of what LLM generated. And that way you'll still end up with something at least a bit compromised, and messy when it comes to code

In summary, it's amazing how far it's come and how much you can do quickly - but if you need quality, there's no going around it, you still need most of the effort and time do invest in it. Considering both together, I think it's still a great position to be in currently for people who can provide that needed level of quality - sometimes you can do things very easily and quickly and sometimes you do your proud work with a bit of assistance along the way.

I'm not sure until when that will work, or what happens later, or how does current state already bodes for less experienced people...

ChrisMarshallNY · 2d ago

This was a great write-up!

It looks like the methodology this chap used could become a boilerplate.

EagnaIonat · 1d ago

Article: All in all, my impression is that you have to read carefully whatever the AI assistant writes if you want to be sure you “own your code”

If you are 100% vibe coding then you do not own the code at all. You might have some limited protections in the UK, but EU/US any AI generated code can't be copyrighted.

So someone can steal your Vibe coded app and resell it without fear.

.. The other major issue that I have seen using LLMs is that it is useless if you "don't know what you don't know". Sample code often given is incorrect, or not the best approach.

A few times when I discuss my issues with the code generated, it has offered better code to do the same thing.

the_af · 2d ago

This was interesting.

I still wonder, if (as the author mentions and I've seen in my experience) companies are pivoting to hiring more senior devs and fewer or no junior devs...

... where will the new generations of senior devs come from? If, as the author argues, the role of the knowledgeable senior is still needed to guide the AI and review the occasional subtle errors it produces, where will new generations of seniors be trained? Surely one cannot go from junior-to-senior (in the sense described in TFA) just by talking to the AI? Where will the intuition that something is off come from?

Another thing that worries me, but I'm willing to believe it'll get better: the reckless abandon with which AI solutions consume resources and are completely obvious to it, like TFA describes (3.5 GB of RAM for the easiest, 3 pillar Hanoi configuration). Every veteran computer user (not just programmers but also gamers) has been decrying for ages how software becomes more and more bloated, how hardware doesn't scale with the (mis)use of resources, etc. And I worry this kind of vibe coding will only make it horribly worse. I'm hoping some sense of resource consciousness can be included in new training datasets...

furyofantares · 2d ago

People keep saying this, but the young folks who start out with stuff are gonna surpass us old folks at some point. Us old folks just get a big head start.

Right now we're comparing seniors who learned the old way to juniors who learned the old way. Soon we'll start having juniors who started out with this stuff.

It also takes time to learn how to teach people to use tools. We're all still figuring out how to use these, and I think again, more experience is a big help here. But at some point we'll start having people who not only start out with this stuff, but they get to learn from people who've figured out how to use it already.

the_af · 2d ago

> Soon we'll start having juniors who started out with this stuff.

But who will hire them? Businesses are ramping down from hiring juniors, since apparently a few good seniors with AI can replace them (in the minds of the people doing the hiring).

Or is it that when all of the previous batch of seniors have retired or died of old age, businesses will have no option but to hire juniors trained "the new way", without a solid background to help them understand when AI solutions are flawed or misguided, and pray it all works out?

furyofantares · 2d ago

> But who will hire them?

Anyone who wants a competitive advantage?

My claim is that the gap between junior and senior has temporarily widened, which is why someone who previously would want to hire juniors might not right now. But I expect it will narrow as a generation that learned on this stuff comes into the fold, probably to a smaller gap than existed pre-LLM.

I think it will also narrow if the tools continue to get better.

the_af · 1d ago

> Anyone who wants a competitive advantage?

Do you mean long-term vision? Short-term the advantage is in hiring only seniors, but do you mean companies will foresee trouble looming ahead and "waste" money on juniors just to avert this disaster?

My own feeling is that this could become like a sort of... well, I recently heard of the term "population time bomb", and it was eye-opening for me. How once it starts rolling, it's incredibly hard/impossible to revert, etc.

So what if we have some sort of "experience time bomb" here? Businesses stop hiring juniors. Seniors are needed to make AI work, but their experience isn't passed on because... who to pass it to? And then juniors won't have this wealth of "on the job experience" to be able to smell AI disaster and course-correct. The kind of experience you learn from actual work, not books.

farceSpherule · 1d ago

I have a colleague at Amazon who tells me he has several people on his team who vibe code and who blindly check in thousands of lines of LLM code when it "works."

It takes 10 to 20 times as long to debug because the code is impossible to change or understand how it works.

the_real_cher · 1d ago

> But they are fully useful and safe only if you know what you are doing and are able to check and (re)direct what they might be doing — or have been doing unbeknownst to you. You can trust them if you can trust yourself.

This is the crux of the A.I. issue.

What a spectacular article.

bgwalter · 2d ago

Super long article, empty GitHub apart from the vibe stuff. I can't find any biography or affiliation.

simonw · 2d ago

This looks like him: https://www.bankit.art/people/marco-benedetti

ChrisMarshallNY · 2d ago

I enjoyed it, but then, I trend prolix, myself.

WaxProlix · 2d ago

Same

The US is transforming into a 1930s-style autocracy, says billionaire Ray Dalio (cnn.com)

Trump to Move Space Command Headquarters to Alabama from Colorado (reuters.com)

Some more thoughts on the ARKK-Bullish trade (ft.com)

Google can't have exclusive search deals (businessinsider.com)

Google gets to keep Chrome but is barred from exclusive search deals (cnbc.com)

All of our lives overlap in the Network Of Time (networkoftime.com)

US sliding towards 1930s-style autocracy, warns Ray Dalio (ft.com)

The 2025 Guide to AI Agents (ibm.com)

Can Creatine Keep Your Brain Sharp? (time.com)

Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS (huggingface.co)

Amazon cracks down on Prime free shipping sharing (cnbc.com)

"&udm=14" AI-overview-free Google search (udm14.org)

Coding the Matrix: Linear Algebra Through Computer Science Applications (codingthematrix.com)

Empowering Innovation: The Python Paved Road (americanexpress.io)

U.S. Emissions Rise 4.2%, China's Fall 2.7% (theenergymix.com)

The First Thirty Years of Green Stormwater Infrastructure in Portland, Oregon (mdpi.com)

Why Self-Host 9front (orib.dev)

OpenAI acquires product testing startup Statsig (techcrunch.com)

Chinese hackers breach 700 companies through single Salesforce integration (nearlyright.com)

Trump fortune balloon by billions after family firms crypto token starts trading (theguardian.com)

Making XML human-readable without XSLT (jakearchibald.com)

New York is turning 400 and no one cares (economist.com)

Elm should have had Algebraic Effects (interjectedfuture.com)

Study: New Seasonal Asynchrony Emerges (nature.com)

C Tooling (tomscheers.github.io)

Why VPN Audits Don't Protect Your Privacy (vp.net)

Why North Korea's Kim Jong Un Took a Train to China's Military Parade (wsj.com)

Matrix.org service offline: corrupted database (status.matrix.org)

OpenAI to acquire product testing startup Statsig, appoints CTO of applications (reuters.com)

I built a tool that converts Git commits into a full Changelog (shiplog.sh)

Musk Has a New Master Plan for Tesla. Uh Oh. (bloomberg.com)

How a Muppets fan website thwarted hacking attempts (itbrew.com)

Linux home server sleep on idle and wake on demand – the simple way (dgross.ca)

The Things I Find Myself Repeating About Go – Dave Cheney [video] (youtube.com)

How to Spot (and Fix) 5 Common Performance Bottlenecks in Pandas Workflows (developer.nvidia.com)

How to Write Docstrings in Python – Real Python (realpython.com)

AI Key by Dafdef – let an AI control your phone (dafdef.com)

Tracking malicious code execution in Python – Artem Golubin (rushter.com)

First attempt will be 95% garbage: 6 weeks with Claude Code (sanity.io)

Show HN: Extending Windows Batch Script with audio, mouse and keyboard input (github.com)

iNaturalist keeps full species classification models private (github.com)

Show HN: Lightweight server-driven template language for JavaScript (github.com)

Amazon must face US nationwide class action over third-party sales (reuters.com)

Frostbyte10 bugs put refrigerators at major grocery chains at risk (theregister.com)

BASF Delivers First Cathode Active Materials for Semi-Solid-State Batteries (basf.com)

Google AI Studio on X: "Six text-to-image prompting tips for Nano Banana" / X (twitter.com)

A model of Boy's surface in constructive solid geometry (math.univ-toulouse.fr)

Lighter, Cheaper Vision Air Coming in 2027 (macrumors.com)

Long-lasting antimicrobial effects of nitric oxide-releasing hand sanitizer gel (pubs.rsc.org)

Fillamte: Context-Aware, One-Click Autofill for Complex Forms (fillmate.info)

Vibe coding as a coding veteran: from 8-bit assembly to English-as-code

Comments (117)