Vibe coding as a coding veteran: from 8-bit assembly to English-as-code

60 thunderbong 39 8/28/2025, 3:55:15 PM levelup.gitconnected.com ↗

Comments (39)

mycentstoo · 3h ago
I believe choosing a well known problem space in a well known language certainly influenced a lot of the behavior. AI's usefulness is strongly correlated with its training data, and there's no doubt been a significant amount of data about both the problem space and Python.

I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.

It was a great read regardless!

Insanity · 3h ago
100% this. I tried haskelling with LLMs and its performance was worse compared to Go.

Although in fairness this was a year ago on GPT 3.5 IIRC

diggan · 2h ago
> Although in fairness this was a year ago on GPT 3.5 IIRC

GPT-3.5 was impressive at the time, but today's SOTA models (like GPT-5 Pro) are a near night-and-day difference, both in terms of producing better code for a wider range of languages (I mostly do Rust and Clojure; it handles those fine now, but was awful with 3.5) and, more importantly, in terms of following your instructions in user/system prompts. So it's easier to get higher quality code from it now, as long as you can put into words what "higher quality code" means for you.

r_lee · 2h ago
I'm not sure I'd say "100% this" if I was talking about GPT 3.5...
verelo · 1h ago
Yeah, 3.5 was good when it came out but frankly anyone reviewing AI for coding not using sonnet 4.1, GPT-5 or equivalent is really not aware of what they've missed out on.
johnisgood · 16m ago
I wrote some Haskell using Claude. It was great.
danielbln · 3h ago
Post-training in all frontier models has improved significantly with respect to programming language support. Take Elixir, which LLMs could barely handle a year ago, but now support has gotten really good.
afro88 · 2h ago
Great article, though I'm still reading it as it's a mammoth read!

A side note: as it's been painfully pointed out to me, "vibe coding" means not reading the code (ever!). We need a term for coding with LLMs exclusively, but also reviewing the code they output at each step.

layer8 · 1h ago
We could revive the old CASE acronym (https://en.wikipedia.org/wiki/Computer-aided_software_engine...). ;)
Disposal8433 · 2h ago
It's called "reviewing code." I'm not taking any kind of responsibility for code that I haven't written myself.
mellosouls · 2h ago
I use "Pro-coding" as it implies professionalism or process, or at least some sort of formality.

It doesn't imply AI, but I don't distinguish between AI-assisted and pre-AI coding, just vibe-coding, as I think that's the important demarcation now.

tln · 2h ago
Prompt coding or just prompting

"Lets prompt up a new microservice for this"

"What have you been prompting lately?"

"Looking at commits, prompt coding is now 50% of your output. Have a raise"

ofjcihen · 1h ago
What is the term for getting the ick from reading?
mcrk · 2h ago
Just use "coding", then let's reserve the word "programming" for Linus.
stavros · 3h ago
I've come to view LLMs as a consulting firm where, for each request, I have a 50% chance of getting either an expert or an intern writing my code, and there's no way to tell which.

Sometimes I accept this, and I vibe-code, when I don't care about the result. When I do care about the result, I have to read every line myself. Since reading code is harder than writing it, this takes longer, but LLMs have made me too lazy to write code now, so that's probably the only alternative that works.

I have to say, though, the best thing I've tried is Cursor's autocomplete, which writes 3-4 lines for you. That way, I can easily verify that the code does what I want, while still reaping the benefit of not having to look up all the APIs and function signatures.

kaptainscarlet · 3h ago
I've also had a similar experience. I have become too lazy since I started vibe-coding. My role has transitioned from coder to code reviewer/fixer very quickly. Overall I feel like it's a good thing, because the last few years of my life have been a repetition of frontend components and API endpoints, which has become too monotonous, so I am happy to have AI take over that grunt work while I supervise.
stavros · 3h ago
Yeah, exactly the same for me. It's tiring writing the same CRUD endpoints a thousand times, but that's how useful products are made.
MangoCoffee · 3h ago
>When I do care about the result, I have to read every line myself.

isn't that the same as delegating a task to a jr developer where you still have to check their work as a sr?

stavros · 2h ago
It is, but not the same as if a senior developer were writing it. I would feel much less like I have to check it then.
faangguyindia · 1h ago
Basically, at our place we have a coding agent in a while loop.

What it does is pretty simple. You give it a problem and set up an environment with the libraries and everything it needs.

It continuously makes changes to the program, then checks its output.

And iteratively improves it.

For example, we used it to build a new method for applying diffs generated by LLMs to files.

As different models are good at different things, we ran it against several models to figure out which method performs best.

Can a human do it? I doubt it.
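That kind of agent-in-a-loop can be sketched in a few lines. A minimal, hypothetical sketch, not the commenter's actual setup: the `evaluate` and `propose_change` stubs stand in for the real test harness and LLM call, which aren't described in the comment (here a toy numeric "program" is nudged toward a target so the loop actually runs).

```python
def evaluate(program, target=42):
    """Score a candidate; lower is better, 0 means all checks pass.
    In a real agent this would run the test suite against the code."""
    return abs(program - target)

def propose_change(program, score):
    """Stub for 'ask the model for a patch given the current failure'.
    A real agent would feed the program and its output back to an LLM."""
    return program + (1 if program < 42 else -1)

def agent_loop(program, max_iters=100):
    """Propose a change, evaluate it, keep only improvements, repeat."""
    best, best_score = program, evaluate(program)
    for _ in range(max_iters):
        if best_score == 0:          # checks pass: done
            break
        candidate = propose_change(best, best_score)
        score = evaluate(candidate)
        if score < best_score:       # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score
```

Swapping in different models behind `propose_change` and comparing final scores is what lets you measure which model's diff-application method performs best.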

numpad0 · 3h ago
I don't feel good doing it, but is anyone else finding that not capitalizing text, maintaining a slightly abrasive attitude, and consciously taking the credit yields better results from coding agents? e.g. "i want xxx implemented, can you do", "ok you do" rather than "I'm wondering if..." etc.
jmull · 33m ago
Why not just:

"Implement xxx"

?

I don't think we can offend these things (yet).

SV_BubbleTime · 3h ago
There is so much subjective placebo with “prompt engineering” that anyone pushing any one thing like this just shows me they haven’t used it enough yet. No offense, just seeing it everywhere.

Better results if you… tip the AI, offer it physical touch, you need to say the words “go slow and take a deep breath first”…

It’s a subjective system without control testing. Humans are definitely going to apply religion, dogma, and ritual to it.

cdrini · 2h ago
The best research I've seen on this is:

- Threatening or tipping a model generally has no significant effect on benchmark performance.

- Prompt variations can significantly affect performance on a per-question level. However, it is hard to know in advance whether a particular prompting approach will help or harm the LLM's ability to answer any particular question.

https://arxiv.org/abs/2508.00614

SV_BubbleTime · 1h ago
That 100% tracks with expectations if your technical knowledge extends past "believer".

Now… for fun, look up "best prompting" or "the perfect prompt" on YouTube. Thousands of videos of "tips" and "expert recommendations" that border on the arcane.

diggan · 2h ago
> Better results if you… tip the AI, offer it physical touch, you need to say the words “go slow and take a deep breath first”…

I'm not saying I've proven it or anything, but it doesn't sound far-fetched that a thing that generates new text based on previous text would be affected by that previous text, even by minor details like ALL CAPS versus lowercase, since those are different tokens for the LLM.

I've noticed the same thing with what exact words you use. State a problem as a lay/random person, using none of the domain words for things, and you get a worse response compared to if you used industry jargon. It kind of makes sense to me considering how they work internally, but happy to be proven otherwise if you're sitting on evidence either way :)

SV_BubbleTime · 1h ago
We all agree that prompts are affected by tokens.

The issue is that you can't know if you are positively or negatively affecting the output, because there is no real control.

And the effect could switch between prompts.

throwawa14223 · 2h ago
This is one of many reasons that I believe the value of current AI tech is zero if not negative.
kachapopopow · 3h ago
I tell my agent to off itself every couple of hours; it's definitely placebo, as you're just introducing noise which might or might not be good. Adding "hmm, <prompt>" has been my go-to for a bit if I want to force it to give me different results, because it appears to trigger some latent regions of the LLM.
SV_BubbleTime · 2h ago
This seems to be exactly what I’m talking about though. We made a completely subjective system and now everyone has completely subjective advice about what works.

I'm not saying introducing noise isn't a valid option, just that doing it via method "X" or "Y" as dogma is straight bullshit.

nurettin · 49m ago
My experience exactly (including nearly 40 years of code exposure). I just wish there was an alternative to Claude Sonnet 4. I see Gemini 2.5 Pro as a side girlfriend, but only Claude truly vibes with me.
ChrisMarshallNY · 3h ago
This was a great write-up!

It looks like the methodology this chap used could become a boilerplate.

the_af · 1h ago
This was interesting.

I still wonder, if (as the author mentions and I've seen in my experience) companies are pivoting to hiring more senior devs and fewer or no junior devs...

... where will the new generations of senior devs come from? If, as the author argues, the role of the knowledgeable senior is still needed to guide the AI and review the occasional subtle errors it produces, where will new generations of seniors be trained? Surely one cannot go from junior-to-senior (in the sense described in TFA) just by talking to the AI? Where will the intuition that something is off come from?

Another thing that worries me, but I'm willing to believe it'll get better: the reckless abandon with which AI solutions consume resources while being completely oblivious to it, as TFA describes (3.5 GB of RAM for the easiest, 3-pillar Hanoi configuration). Every veteran computer user (not just programmers but also gamers) has been decrying for ages how software becomes more and more bloated, how hardware doesn't scale with the (mis)use of resources, etc. And I worry this kind of vibe coding will only make it horribly worse. I'm hoping some sense of resource consciousness can be included in new training datasets...

furyofantares · 52m ago
People keep saying this, but the young folks who start out with stuff are gonna surpass us old folks at some point. Us old folks just get a big head start.

Right now we're comparing seniors who learned the old way to juniors who learned the old way. Soon we'll start having juniors who started out with this stuff.

It also takes time to learn how to teach people to use tools. We're all still figuring out how to use these, and I think again, more experience is a big help here. But at some point we'll start having people who not only start out with this stuff, but they get to learn from people who've figured out how to use it already.

the_af · 32m ago
> Soon we'll start having juniors who started out with this stuff.

But who will hire them? Businesses are ramping down from hiring juniors, since apparently a few good seniors with AI can replace them (in the minds of the people doing the hiring).

Or is it that when all of the previous batch of seniors have retired or died of old age, businesses will have no option but to hire juniors trained "the new way", without a solid background to help them understand when AI solutions are flawed or misguided, and pray it all works out?

bgwalter · 3h ago
Super long article, empty GitHub apart from the vibe stuff. I can't find any biography or affiliation.
simonw · 3h ago
ChrisMarshallNY · 3h ago
I enjoyed it, but then, I trend prolix, myself.
WaxProlix · 2h ago
Same