GPT-5 vs. Sonnet: Complex Agentic Coding (elite-ai-assisted-coding.dev)

This should have been compared with Opus... I know OP says he didn't because of cost but if you're comparing who is better then you need to compare the best to the best... if Claude Opus 4.1 is significantly better than GPT 5 then that could offset the extra expense. Not saying it will... but forget cost if we want to compare solely the quality

arcticfox · 33m ago

> Note that Claude 4 Sonnet isn’t the strongest model from Anthropic’s Claude series. Claude Opus is their most capable model for coding, but it seemed inappropriate to compare it with GPT-5 because it costs 10 times as much.

Well - I would have been interested in GPT-5 vs. Opus. Claude Code Max is affordable with Opus.

swader999 · 29m ago

You're absolutely right!

macawfish · 22m ago

Claude is just so well rounded and considerate. A lot of this probably comes down to prompt and context engineering, though surely there's something magical about Anthropic's principled training methodologies. They invented constitutional AI and I can only imagine that behind the scenes they're doing really cool stuff. Can't wait to see Claude 5!

anotheryou · 1m ago

I think we need to stop testing models raw.

Claude is trained for Claude Code, and that's how it's used in the field too.

stitched2gethr · 22m ago

This take rings true for me after admittedly only a couple of hours of use of GPT-5. I had an issue I had been working with Claude on, but it was difficult to give it real-time feedback, so it floundered. GPT-5 struggled in the same areas, but after about $2 of tokens it did fix the issue. It was far from a one-shot like I might have expected from the hype, but it did get the job done in about an hour where Claude could not in 3.

For reference my Claude usage was mostly Sonnet, but with consulting from Opus.

0xfaded · 18m ago

Would you be comfortable sharing a brief description of what the issue was?

carterparks · 28m ago

I'm getting an SSL error in Chrome: ERR_SSL_PROTOCOL_ERROR

OJFord · 17m ago

I get 'unable to connect' in Firefox Android for this and many little blogs on HN lately, idk what's going on. Cloudflare blocking me (but not for all sites)? Geo-restriction (UK)?

indigodaddy · 20m ago

What do the 1x and 0.33x mean on the list of models in Copilot? (Never used it but thinking about trying the free tier)

commandar · 8m ago

They're multipliers against your quota of requests. GPT-4.1 is "free" with a Copilot subscription, and the premium models burn credits according to their multiplier. So higher multipliers count more against your monthly quota.

GPT-5, Sonnet 4, and Gemini 2.5 Pro are all 1x. Opus is 10x, for comparison.

https://docs.github.com/en/copilot/reference/ai-models/suppo...

Also worth keeping in mind that Copilot has reduced context windows even for the premium models, which has a very real impact on agentic performance.
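
To make the multiplier accounting concrete, here's a rough sketch of how I understand it, not Copilot's actual billing code. The multipliers are the ones quoted above (with "free" GPT-4.1 treated as 0x); the model keys, the function names, the 300-request monthly quota, and the usage numbers are all made up for illustration.

```python
# Multipliers as quoted in this comment; GPT-4.1 is assumed to be 0x ("free").
# The dictionary keys are illustrative labels, not official model identifiers.
MULTIPLIERS = {
    "gpt-4.1": 0.0,
    "gpt-5": 1.0,
    "sonnet-4": 1.0,
    "gemini-2.5-pro": 1.0,
    "opus": 10.0,
}


def premium_requests_used(usage):
    """Sum the request counts, each weighted by its model's multiplier."""
    return sum(MULTIPLIERS[model] * count for model, count in usage.items())


def remaining_quota(monthly_quota, usage):
    """Quota left after subtracting the weighted usage."""
    return monthly_quota - premium_requests_used(usage)


# Hypothetical month: 200 GPT-4.1 requests, 50 GPT-5 requests, 10 Opus requests.
usage = {"gpt-4.1": 200, "gpt-5": 50, "opus": 10}

print(premium_requests_used(usage))  # 0*200 + 1*50 + 10*10 = 150.0
print(remaining_quota(300, usage))   # 300 is an assumed quota, leaving 150.0
```

So under these assumptions, 10 Opus requests cost twice as much quota as 50 GPT-5 requests.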

SV_BubbleTime · 28m ago

> but when I'd point out the missing implementation, it would give its usual "you're absolutely right" and try to fix it.

I'm really trying not to be annoyed by Claude’s “You’re absolutely right” because I know I cannot control it, but this is an increasingly difficult task.

jpalawaga · 10m ago

I think it's because "you're right!" somehow presupposes it knew the answer and was just testing you.

An intern never says that. They say "oh, I see."

bn-l · 20m ago

GitHub Copilot is utter garbage. The diffing crawls along at a snail’s pace. I think it’s coming up on two years and this most criticised aspect of it still isn’t fixed, even with all the reverse engineering of how Cursor did it. I wish I could find an alternative to Cursor (which has other issues). Honestly, that company just threw away a golden opportunity as the first mover.

sourcecodeplz · 17m ago

Why did they throw it away? Because of the new opaque pricing?

swader999 · 11m ago

They let their moat dry right up.
