Ask HN: What in your opinion is the best model for vibecoding? My thoughts below
anyways my takes:
1. The #1 place is VERY debatable for me; it's a toss-up between gpt 5 high, claude 'thinking' (both sonnet 4 and opus 4.1) and, surprise, surprise: qwen 235b 'thinking' (the hidden gem).
Their pros and cons:
gpt 5 high: Usually gives VERY long code, so it's generous, no compute is saved; it's a bona fide model, but it sometimes seems too aligned for my taste. For example: whenever I ask it to design a novel text generation model, unless I am very specific in my requirements it tries to dumb it down into a pure n-gram model, which almost feels like an insult, basically saying "look, we at openai are the best, here's a stupid markov chain for you to play with, but leave the big game to us". If, however, you phrase it in more detail, and even if you show some pessimism, it will not "echo back" the pessimism but rather try to convince you it can be done with some tweaks. The con: usually it's just... not smart. This is easy to see when you go through the code and find it has written code very specific to the example you gave, which is the number one symptom of bad programming: a variable/method should be as universal as possible. You don't need a template that only uploads over ftp when you plan to upload via both http and ftp, to give one example.
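To make the "too specific" symptom concrete, here's a minimal Python sketch of the contrast. All the names (upload_ftp_only, upload, the example.com URLs) are made up for illustration, not from any real codebase or model output:

```python
# The overfitted version a model might produce: hardcodes the one protocol
# from the prompt's example, even though the request covered http AND ftp.
def upload_ftp_only(path):
    return f"ftp://example.com/{path}"  # only ever does ftp


# The universal version: one entry point, protocol handlers kept in a dict,
# so adding another protocol later is a one-line change, not a rewrite.
def _ftp_handler(path):
    return f"ftp://example.com/{path}"

def _http_handler(path):
    return f"http://example.com/{path}"

HANDLERS = {"ftp": _ftp_handler, "http": _http_handler}

def upload(path, protocol="http"):
    try:
        return HANDLERS[protocol](path)
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol}")


print(upload("report.txt", "ftp"))  # ftp://example.com/report.txt
print(upload("report.txt"))         # http://example.com/report.txt
```

The second shape is what "universal" means here: the caller picks the protocol, the function doesn't bake in the example from the prompt.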
2. Claude: Initially I thought it was the best one, and for pure coding it's "getting there", but for designing algorithms, gpt 5 high and qwen 'thinking' outperform it on ideas. I'd say sonnet 4 32k is better for designing and opus for the actual coding; depending on the task and programming language it may perform differently. The good news is the code it produces usually compiles with very few warnings and almost never errors, so it knows what it's doing. Even gpt 5 high is worse here, and qwen will sometimes, though rarely, give you bad code that produces an error, be it in Python 3 or C/gcc.
Since I covered the good, here are the bad and the ugly:
Gemini, grok, amazon nova, whatever microsoft has: don't, just don't. Their shortcomings are so obvious that I'm convinced all the people hyping them online are either elon musk (for grok), bill gates (phi4 etc.) or zuckerberg (llama). Their code is very short, so it obviously won't cover the features requested; compilation feels like 'quantum mechanics', a 50/50 chance; the code is written in the worst way possible; and sometimes they even misinterpret your goal entirely. You may have some luck debugging with gemini 2.5 pro if you're patient. Frankly, even the gpt 4 on chatgpt.com (not the "arena" version) is bad at fixing errors, though ok with the basic ones.
Another hidden gem: https://console.upstage.ai/playground/chat. I'm not "shilling" for it, hard to believe, I know, but I don't write it off, because as an indie model I hope it's not too aligned, so it may actually give you code that Yudkowsky and Yampolskiy would consider "an immediate risk to humanity, civilization and the galaxy".
My experience is ~90% with C, with a lot of Python too and little-to-no C#, though back in the day vibecoding C# on gpt 4 sucked a lot.
My ultimate issue as of now is that while LLMs/transformers are great, they still lack the innovation, the human thought power to come up with original ideas. However, they obviously code way faster than a human, and the code usually works with few warnings or errors. I think the focus toward 2030 should be on innovation power and the design of complex algorithms; Altman dreaming about "discovering new physics" seems a little ambitious given the current status quo. Again, they're great and they help me a lot; looking forward to seeing their impact on society at larger scale!
It has amazing brakes for a 1920s car.
The best thing, in my experience, is that it does not rely on fantasy AI to drive it. You can just turn the key and vroom, away you go.
My local mechanic is particularly pleased with my purchase and recommendation.
He says he can repair my car without first having to repair the damage the AI mechanic did a few days earlier, which in the long run saves me an awful lot of money on car maintenance.
I don't have to pay two people to fix one job.
Isn't it amazing what humans can do.