Coding with LLMs in the summer of 2025 – an update

85 antirez 38 7/20/2025, 11:04:02 AM antirez.com ↗

Comments (38)

dakiol · 18m ago

> Gemini 2.5 PRO | Claude Opus 4

Whether it's vibe coding, agentic coding, or copy pasting from the web interface to your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs introduce and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years, that will no longer be possible (as in most programmers will be so tied to a paid LLM, that not using them would be like not using an IDE or vim nowadays), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.

muglug · 10m ago

> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.

Yet JetBrains has been a business longer than some of my colleagues have been alive, and Microsoft’s Visual Basic/C++/Studio made writing software for Windows much easier, and did not come cheap.

dakiol · 3m ago

I see a big difference: I do use JetBrains IDEs (they are nice), but I can switch to vim (or vscode) any time if I need to (e.g., let's say JetBrains increase their price to a point that doesn't make sense, or perhaps they introduce a pervasive feature that cannot be disabled). The problem with paid LLMs is that one cannot easily switch to open-source ones (because they are not as good as the paid ones). So, it's a dependency that cannot be avoided, and that's imho something that shouldn't be overlooked.

azan_ · 15m ago

Paid models are just much, much better.

dakiol · 7m ago

Of course they are. I wouldn't expect otherwise :)

But the price we're paying (and I don't mean money) is very high, imho. We all talk about how good engineers write code that depends on high-level abstractions instead of low-level details, allowing us to replace third party dependencies easily and test our apps more effectively, keeping the core of our domain "pure". Well, isn't it time we started doing the same with LLMs? I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and opensource version if the companies behind the private LLMs go rogue. I'm afraid tho that wouldn't be enough, but it's a starting point.

To put an example: what would you think if you need to pay for every single Linux process in your machine? Or for every Git commit you make? Or for every debugging session you perform?
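A minimal sketch of the kind of abstraction described above, assuming a hypothetical `ChatModel` interface with two stub backends. None of these names come from a real tool; the point is only that application code can depend on the interface, so swapping a paid backend for an open one is a one-line change:

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class LocalModel:
    """Stand-in for an open-weights model served on your own machine."""
    name: str = "local-open-model"  # assumed name, illustration only

    def complete(self, prompt: str) -> str:
        # A real backend would run inference here; this stub just echoes.
        return f"[{self.name}] {prompt}"


@dataclass
class HostedModel:
    """Stand-in for a paid, hosted model behind a vendor API."""
    name: str = "hosted-paid-model"  # assumed name, illustration only

    def complete(self, prompt: str) -> str:
        # A real backend would make a network call here; this stub just echoes.
        return f"[{self.name}] {prompt}"


def review_code(model: ChatModel, source: str) -> str:
    """Application code depends only on the ChatModel interface."""
    return model.complete(f"Review this code:\n{source}")
```

Calling `review_code(LocalModel(), src)` or `review_code(HostedModel(), src)` exercises the same code path. This does not address the quality gap between the models, only the coupling.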

azan_ · 3m ago

> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and opensource version if the companies behind the private LLMs go rogue. I'm afraid tho that wouldn't be enough, but it's a starting point.

There are open source tools that do exactly that already.

belter · 12m ago

The issue is somebody will have to debug and fix what those LLM Leeches made up. I guess then companies will have to hire some 10x Prompters?

quantumHazer · 59m ago

I'm going a little off-topic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides the fact that we were born on the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.

The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

antirez · 56m ago

Agree with that. Read it as expert-level knowledge without all the other stuff LLMs can’t do as well as humans. The way LLMs express knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance an LLM can’t code as well as a top human coder but can write a non-trivial program from the first to the last character without iterating.

spyckie2 · 23m ago

Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

The reason is that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by acting as a kind of QA (not just by writing and passing tests), then that piques my interest.

bgwalter · 41m ago

Translation: His company will launch "AI" products in order to get funding or better compete with Valkey.

I find it very sad that people who have been really productive without "AI" now go out of their way to find small anecdotal evidence for "AI".

brokencode · 23m ago

I find it even more sad when people come out of the woodwork on every LLM post to tell us that our positive experiences using LLMs are imagined and we just haven't realized how bad they are yet.

halfmatthalfcat · 11m ago

Could it not be that those positive experiences are just shining a light that the practices before using an LLM were inefficient? It’s more a reflection on the pontificator than anything.

on_the_train · 16m ago

If LLMs were actually useful, there would be no need to scream it everywhere. On the contrary: it would be a guarded secret.

neuronexmachina · 9m ago

In my experience, devs generally aren't secretive about tools they find useful.

hobs · 4m ago

I think many devs are guarding their secrets, but the last few decades have shown us that an open foundation can net huge benefits for everyone (and then you can put your secret sauce in the last mile.)

antirez · 9m ago

Did you read my post? I hope you didn’t because if you read it and reached these conclusions your judgement is deeply altered.

This post has nothing to do with Redis and is even a follow up to a post I wrote before rejoining the company.

nlh · 10m ago

Can anyone recommend a workflow / tools that accomplishes a slightly more augmented version of antirez’ workflow & suggestions minus the copy-pasting?

I am on board to agree that pure LLM + pure original full code as context is the best path at the moment, but I’d love to be able to use some shortcuts like quickly applying changes, checkpoints, etc.

My persistent (and not unfounded?) worry is that all the major tools & plugins (Cursor, Cline/Roo) all play games with their own sub-prompts and context “efficiency”.

What’s the purest solution?

Keyframe · 46m ago

Unlike OP, from my still limited but intense month or so diving into this topic so far, I had better luck with Gemini 2.5 PRO and Opus 4 at a more abstract level, like architecture etc., and then feeding that as input to Sonnet for coding. I found 2.5 PRO, and to a lesser degree Opus, were hit or miss; a lot of instances of them circling around the issue and correcting themselves when coding (Gemini especially so), whereas Sonnet would cut to the chase, but needed an explicit take on it to be efficient.

khaledh · 12m ago

This is my experience too. I usually use Gemini 2.5 Pro through AI Studio for big design ideas that need to be validated and refined. Then I take the refined requirements to Claude Code, which does an excellent job most of the time in coding them properly. Recently I tried Gemini CLI, and it's not even close to Claude Code's sharp coding skills. It often makes syntax mistakes, and gets stuck trying to get itself out of a rut; its output is so verbose (and fast) that it's hard to follow what it's trying to do. Claude Code has a much better debugging capability.

Another contender in the "big idea" reasoning camp: DeepSeek R1. It's much slower, but most of the time it can analyze problems and get to the correct solution in one shot.

cheschire · 15m ago

I find agentic coding to be best when using one branch per conversation. Even if that conversation is only a single bugfix, branch it. Then do 2 or 3 iterations of that same conversation across multiple branches and choose the best result of the 3 and destroy the other two.
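The branch-per-conversation routine above could be scripted roughly like this. It is only a sketch: the `llm/` branch prefix and the helper names are made up for illustration, and the functions just build the git command strings rather than running them:

```python
def attempt_branches(task: str, attempts: int = 3) -> list[str]:
    """Return one branch name per independent conversation about `task`."""
    return [f"llm/{task}-attempt-{i}" for i in range(1, attempts + 1)]


def setup_commands(base: str, branches: list[str]) -> list[str]:
    """Git commands that create each attempt branch from the same base."""
    return [f"git switch -c {b} {base}" for b in branches]


def cleanup_commands(branches: list[str], winner: str) -> list[str]:
    """Git commands that delete every attempt branch except the chosen one."""
    if winner not in branches:
        raise ValueError(f"unknown branch: {winner}")
    return [f"git branch -D {b}" for b in branches if b != winner]
```

Git refuses to delete the branch that is currently checked out, so switch to the winning branch (or back to the base) before running the cleanup commands.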

dcre · 39m ago

“Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.”

I agree with this, but this is why I use a CLI. You can pipe files instead of copying and pasting.
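As a rough illustration of piping files instead of pasting them, here is a small script that prints whole files under path headers so the result can be piped into whatever LLM CLI you use. The header format and function names are arbitrary choices, not something any particular tool expects:

```python
import sys
from pathlib import Path


def format_context(files: dict[str, str]) -> str:
    """Join whole files into one prompt, each under a header naming its path."""
    parts = [f"===== {path} =====\n{text.rstrip()}\n" for path, text in files.items()]
    return "\n".join(parts)


def read_files(paths: list[str]) -> dict[str, str]:
    """Read each path fully; no truncation or summarising of the source."""
    return {p: Path(p).read_text(encoding="utf-8") for p in paths}


if __name__ == "__main__":
    # Usage (the consumer command is up to you):
    #   python context.py src/a.c src/a.h | <your LLM CLI>
    sys.stdout.write(format_context(read_files(sys.argv[1:])))
```

You still choose which files go in, so you stay in the loop, but the copying itself is no longer manual.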

lmeyerov · 23m ago

Yeah it is also a bit of a shibboleth: vibes coding, when I'm productive for the 80% case with Claude code, is about the LLM cranking for 10-20min. I'm instructing & automating the LLM on how to do its own context management, vs artisanally making every little decision.

Ex: Implementing a spec, responding to my review comments, adding wider unit tests, running a role play for usability testing, etc. The main time we do what he describes of manually copying into a web ide is occasionally for a better short use of a model, like only at the beginning of some plan generation, or debug from a bunch of context we have done manually. Like we recently solved some nasty GPU code race this way, using a careful mix of logs and distributed code. Most of our job is using Boring Tools to write Boring Code, even if the topic/area is neato: you do not want your codebase to work like an adventure for everything, so we invest in making it look boring.

I agree with what the other commenter said: I manage context as part of the skill, but by making the AI do it. Doing that by hand is like slowly handcoding assembly. Instead, I'm telling Claude Code to do it. Ex: Download and crawl some new dependency I'm using for some tricky topic, or read in my prompt template markdown for some task, or generate and self-maintain some plan.md with high-level rules on context I defined. This is the 80% case.

Maybe one of the disconnects is task latency vs throughput as trade-offs in human attention. If I need the LLM to get to the right answer faster, so the task is done faster, I have to lean in more. But my time is valuable and I have a lot to do. I'd rather spend 50% less of my time per task, even if the task takes 4x longer, by the LLM spinning longer. In that saved human time, I can be working on another task: I typically have 2-3 terminals running Claude, so I only check in every 5-15min.
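To make the latency vs throughput trade-off concrete, here is a back-of-the-envelope model. The 10-minute hands-on task and the assumption that hands-on work takes full attention are invented for illustration; only the "50% less of my time", "4x longer" and "3 terminals" figures come from the comment:

```python
def tasks_per_hour(human_min: float, wall_min: float, sessions: int) -> float:
    """Tasks finished per hour, limited by human attention or by wall-clock time."""
    attention_limit = 60 / human_min       # tasks the human can supervise per hour
    wall_limit = sessions * 60 / wall_min  # tasks the sessions can finish per hour
    return min(attention_limit, wall_limit)


def attention_used(human_min: float, wall_min: float, sessions: int) -> float:
    """Fraction of each hour the human spends on these tasks."""
    return tasks_per_hour(human_min, wall_min, sessions) * human_min / 60


# Hands-on: 10 min of attention, 10 min wall-clock, one session (assumed numbers).
hands_on = tasks_per_hour(10, 10, 1), attention_used(10, 10, 1)
# Delegated: half the attention, 4x the wall-clock, three parallel sessions.
delegated = tasks_per_hour(5, 40, 3), attention_used(5, 40, 3)
```

Under these assumptions the hands-on mode finishes 6 tasks per hour at 100% attention, while the delegated mode finishes 4.5 per hour at about 38% attention. The win is the freed attention rather than raw task count, unless more sessions are added.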

indigodaddy · 25m ago

Since I’ve heard Gemini-cli is not yet up to snuff, has anyone tried opencode+gemini? I’ve heard that with opencode you can login with Google account (have NOT confirmed this, but if anyone has any experience, pls advise) so not sure if that would get extra mileage from Gemini’s limits vs using a Gemini api key?

theodorewiles · 1h ago

My question on all of the “can’t work with big codebases” is what would a codebase that was designed for an LLM look like? Composed of many many small functions that can be composed together?

antirez · 1h ago

I believe it’s the same as for humans: different files implementing different parts of the system with good interfaces and sensible boundaries.

dkdcio · 33m ago

this is a common pattern I see -- if your codebase is confusing for LLMs, it's probably confusing for people too

exitb · 21m ago

And on top of that - can you steer an LLM to create this kind of code? In my experience the models don’t really have a „taste” for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.

Hasnep · 1h ago

And my question to that is how would that be different from a codebase designed for humans?

Keyframe · 58m ago

like a microservice architecture? overall architecture to get the context and then dive into a micro one?

qweiopqweiop · 1h ago

This matches my take, but I'm curious if OP has used Claude code.

antirez · 1h ago

Yep when I use agents I go for Claude Code. For example I needed to buy too many Commodore 64 than appropriate lately, and I let it code a Telegram bot advising me when popular sources would have interesting listings. It worked (after a few iterations) then I looked at the code base and wanted to puke but who cares in this case? It worked and it was much faster and I had zero to learn in the process of doing it myself. I published a Telegram library for C in the past and know how it works and how to do scraping and so forth.

Keyframe · 52m ago

> For example I needed to buy too many Commodore 64 than appropriate lately

Been there, done that!

For those one-off small things, LLMs are rather cool. Especially Claude Code and Gemini CLI. I was given an archive of some really old movies recently, but the files were bearing title names in Croatian instead of the original ones (mostly English). So I ran claude --dangerously-skip-permissions in the directory with the movies and in a two-sentence prompt I asked it to rename the files into a given format (that I tend to have in my archive) and for each title to find the original name and year of release and use it in the file name, but, before committing the rename, to give me a list of before and after for approval. It took like what, a minute of writing a prompt.
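The "show me before and after, then rename" step is easy to keep even without an agent. A minimal sketch, with hypothetical helper names and a mapping from old to new names that would in practice come from the LLM:

```python
from pathlib import Path


def plan_renames(directory: str, new_names: dict[str, str]) -> list[tuple[Path, Path]]:
    """Build (before, after) pairs without touching the filesystem."""
    root = Path(directory)
    plan = [(root / old, root / new) for old, new in sorted(new_names.items())]
    targets = [after for _, after in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would be renamed to the same target")
    return plan


def format_plan(plan: list[tuple[Path, Path]]) -> str:
    """Render the before/after list shown to the user for approval."""
    return "\n".join(f"{before.name} -> {after.name}" for before, after in plan)


def apply_plan(plan: list[tuple[Path, Path]], approved: bool) -> int:
    """Rename only after explicit approval; return the number of files renamed."""
    if not approved:
        return 0
    for before, after in plan:
        before.rename(after)
    return len(plan)
```

Print `format_plan(plan)`, read it, and only then call `apply_plan(plan, approved=True)`. Nothing on disk changes until that last call.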

Now, for larger things, I'm still exploring a way, an angle, what and how to do it. I've tried from yolo prompting to structured and uber structured approaches, all the way to mimicking product/prd - architecture - project management / tasks - developer/agents.. so far, unless it's rather simpler projects I don't see it's happening that way. Most luck I had was "some structure" as context and inputs and then guiding prompting during sessions and reviewing stuff. Almost pair-programming.

apwell23 · 1h ago

> Coding activities should be performed mostly with: Claude Opus 4

I've been going down to sonnet for coding over opus. maybe i am just writing dumb code

jtonl · 15m ago

Most of the time Sonnet 4 just works, but you need to refine the context as much as you can.

stpedgwdgfhgdd · 30m ago

That is also what Anthropic recommends. In edge cases use Opus.

Opus is also way more expensive. (Don’t forget to switch back to Sonnet in all terminals)

apwell23 · 1h ago

> ## Provide large context

I thought large contexts are not necessarily better and sometimes have the opposite effect?

antirez · 59m ago

LLM performance will suffer from both insufficient context and context flooding. Balancing is an art.
