Ask HN: What in your opinion is the best model for vibecoding? My thoughts below
anyways my takes:
1. The #1 place is VERY debatable for me; it's a toss-up between gpt 5 high, claude 'thinking' (both sonnet 4 and opus 4.1) and, surprise, surprise: qwen 235b 'thinking' (the hidden gem).
Their pros and cons:
gpt 5 high: Usually gives VERY long code, so it's generous, no compute is saved; it's a bona fide model, but it sometimes seems too aligned for my taste. For example: whenever I ask it to design a novel text generation model, unless I am very specific in my requirements it tries to dumb it down into a pure n-gram model, which almost feels like an insult, basically saying "look, we at openai are the best, here's a stupid markov chain for you to play with, but leave the big game to us". If, however, you phrase it in more detail, and even if you show some pessimism, it will not "echo back" the pessimism but rather try to convince you it can be done with some tweaks. The con: usually it's just... not smart. This is easy to see when you go through the code and find it has written code very specific to the example you gave, which is the number one symptom of bad programming: a variable/method should be as universal as possible. You don't need a template that only uploads over ftp when you plan to upload via both http and ftp, to give one example.
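To make the "too specific" symptom concrete, here's a minimal Python sketch of the contrast. All the names (upload_ftp_only, upload, the example.com URLs) are made up for illustration, not from any real codebase or model output:

```python
# The overfitted version a model might produce: hardcodes the one protocol
# from the prompt's example, even though the request covered http AND ftp.
def upload_ftp_only(path):
    return f"ftp://example.com/{path}"  # only ever does ftp


# The universal version: one entry point, protocol handlers kept in a dict,
# so adding another protocol later is a one-line change, not a rewrite.
def _ftp_handler(path):
    return f"ftp://example.com/{path}"

def _http_handler(path):
    return f"http://example.com/{path}"

HANDLERS = {"ftp": _ftp_handler, "http": _http_handler}

def upload(path, protocol="http"):
    try:
        return HANDLERS[protocol](path)
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol}")


print(upload("report.txt", "ftp"))  # ftp://example.com/report.txt
print(upload("report.txt"))         # http://example.com/report.txt
```

The second shape is what "universal" means here: the caller picks the protocol, the function doesn't bake in the example from the prompt.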
2. Claude: Initially I thought it was the best one, and for pure coding it's "getting there", but for designing algorithms, gpt 5 high and qwen 'thinking' outperform it on ideas. I'd say sonnet 4 32k is better for designing and opus for the actual coding; depending on the task and programming language it may perform differently. The good news is the code it produces usually compiles with very few warnings and almost never errors, so it knows what it's doing. Even gpt 5 high is worse here, and qwen will sometimes, though rarely, give you bad code that produces an error, be it in Python 3 or C/gcc.
Since I covered the good, here are the bad and the ugly:
Gemini, grok, amazon nova, whatever microsoft has: don't, just don't. Their shortcomings are so obvious that I'm convinced all the people hyping them online are either elon musk (for grok), bill gates (phi4 etc.) or zuckerberg (llama). Their code is very short, so it obviously won't cover the features requested; compilation feels like 'quantum mechanics', a 50/50 chance; the code is written in the worst way possible; and sometimes they even misinterpret your goal entirely. You may have some luck debugging with gemini 2.5 pro if you're patient. Frankly, even the gpt 4 on chatgpt.com (not the "arena" version) is bad at fixing errors, though ok with the basic ones.
Another hidden gem: https://console.upstage.ai/playground/chat. I'm not "shilling" for it, hard to believe, I know, but I don't write it off, because as an indie model I hope it's not too aligned, so it may actually give you code that Yudkowsky and Yampolskiy would consider "an immediate risk to humanity, civilization and the galaxy".
My experience is ~90% with C, with a lot of Python too and little-to-no C#, though back in the day vibecoding C# on gpt 4 sucked a lot.
My ultimate issue as of now is that while LLMs/transformers are great, they still lack the innovation, the human thought power to come up with original ideas. However, they obviously code way faster than a human, and the code usually works with few warnings or errors. I think the focus toward 2030 should be on innovation power and the design of complex algorithms; Altman dreaming about "discovering new physics" seems a little ambitious given the current status quo. Again, they're great and they help me a lot; looking forward to seeing their impact on society at larger scale!
It has amazing brakes for a 1920s car.
The best thing, in my experience, is that it does not rely on fantasy AI to drive it. You can just turn the key and vroom, away you go.
My local mechanic is particularly pleased with my purchase and recommendation.
He says he can repair my car without first having to repair the damage the AI mechanic did a few days earlier, which in the long run saves me an awful lot of money on car maintenance.
I don't have to pay two people to fix one job.
Isn't it amazing what humans can do.