All these new tools are so exciting, but running untrusted code that auto-updates itself is what's keeping me from trying them.
I wish for a vetting tool: have an LLM examine the code, then write a spec of what it reads and writes, and you can examine that before running it. If something in the list is suspect... you'll know before you're hosed, not after :)
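A minimal sketch of what such a vetting pass could look like, assuming the anthropic Python SDK; the file name, model id, and prompt wording are placeholders for illustration, not an existing tool:

    # Hedged sketch: ask a model to list what an untrusted script reads and
    # writes, so a human can review that list before running the script.
    import pathlib
    from anthropic import Anthropic

    client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment
    code = pathlib.Path("install.sh").read_text()  # hypothetical file to vet

    prompt = (
        "List every file path, network endpoint, and environment variable "
        "this script reads or writes, one item per line. Flag anything that "
        "touches credentials or sends data off the machine.\n\n" + code
    )

    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id; substitute your own
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(msg.content[0].text)  # review this output before executing install.sh

As the reply below points out, a malicious script can also try to prompt-inject the reviewer model, so this is at best a screening aid, not a guarantee.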
nothrabannosir · 3h ago
Throwing more LLMs at a prompt escaper is like throwing more regexes at an HTML parser.
If the first LLM wasn't enough, the second won't be either. You're in the wrong layer.
Not a professional developer (though Guillermo certainly is) so take this with a huge grain of salt, but I like the idea of an AI "trained" on security vulnerabilities as a second, third and fourth set of eyes!
ffsm8 · 1h ago
I'm not sure how to take that seriously given the current reality, where almost all security findings from LLM tools are false positives.
While I suspect it would work well enough on synthetic examples to trick naive and uninformed people into trusting it... at the very least, current LLMs are unable to provide enough stability for this to be useful.
It might become viable with future models, but there is little value in discussing this approach right now, at least until someone actually builds a PoC that works roughly as designed, without a 50-100% false-positive rate.
You can tolerate some false positives, but the rate has to be low enough that people still listen to it, which currently isn't the case.
adastra22 · 13m ago
Put it in a docker instance with a mounted git worktree?
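A rough sketch of that setup (the worktree path, base image, and command are placeholders, and it assumes Docker and a dedicated git worktree already exist):

    # Rough sketch: run an agent CLI inside a throwaway container so it can
    # only touch a dedicated git worktree, never the rest of the host.
    import pathlib
    import subprocess

    worktree = pathlib.Path.home() / "agent-worktree"  # e.g. from `git worktree add`

    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--network", "none",             # drop network access if the agent can work offline
            "-v", f"{worktree}:/workspace",  # only this directory is visible inside
            "-w", "/workspace",
            "ubuntu:24.04",                  # placeholder image with the agent preinstalled
            "bash",                          # or the agent CLI itself
        ],
        check=True,
    )

When the agent is done, its changes land in the worktree as ordinary commits/diffs you can review before merging.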
troupo · 13m ago
> All these new tools are so exciting,
Most of these tools are not that exciting. They are similar-looking TUIs around third-party models/LLM calls.
What is the difference between this, and https://opencode.ai? Or any of the half a dozen tools that appeared on HN in the past few weeks?
lionkor · 4h ago
That's cool and all, until you get malicious code that includes prompt injections, and code that never runs but looks super legit.
LLMs are NOT THOROUGH. Not even remotely. I don't understand how anyone can use LLMs and not see this instantly. I have yet to see an LLM achieve a failure rate better than around 50% in the real world, with real-world expectations.
Especially with code review, LLMs catch some things, miss a lot of things, and get a lot of things completely and utterly wrong. It takes someone wholly incompetent at code review to look at an LLM review and go "perfect!".
Edit: Feel free to write a comment if you disagree
stpedgwdgfhgdd · 3h ago
My suggestion is to try CC, use a language like Go, and read their blog posts on how they use it internally. They are transparent about what works and what does not.
resonious · 4h ago
If you go in knowing that LLMs are not thorough, you can get your failure rates way lower than 50%. Of course, if you just paste a product spec into an LLM, it will do a bad job.
If you build an intuition for what kinds of asks an LLM (agent, really) can do well, you can choose to only give it those tasks, and that's where the huge speedups come from.
Don't know what to do about prompt injection, really. But "untrusted code" in the broader sense has always been a risk. If I download and use a library, the author already has free rein over my computer - they don't even need to think about messing with my LLM assistant.
nxobject · 1h ago
Unfortunately, I haven’t been able to use this with many of the recent open weight code/instruct models - CC tool use doesn’t work with Qwen3 and Kimi K2 for me.
crocowhile · 4h ago
This is what got me started with Claude Code. I gave it a try using the OpenRouter API and got a bill of $40 for 2-3 hours of work. At that point, subscribing to the Anthropic plan became a no-brainer.
blitzar · 2h ago
What is the secret sauce of Claude Code that makes it, somewhat irrespective of the backend LLM, better than the competition?
Is it just better prompting? Better tooling?
CuriouslyC · 1m ago
The agentic instructions just seem to be better. It does things by default (such as working up a plan of action) that other agents need to be prompted for, and it seems to get stuck in failure sinks less often. The actual Claude model is decent, but Claude Code is probably the best agentic tool out there right now.
ethan_smith · 23m ago
Claude's edge comes from its superior context handling (up to 200K tokens), better tool use capabilities, and constitutional AI training that reduces hallucinations in code generation.
EnPissant · 6h ago
Claude Code with a plan is so much cheaper than any API.
sylware · 2h ago
It is a bit off-topic here, but has anybody tried to use such LLMs for code porting, from C++ (and similar languages) to plain C99+?
With LLMs, we may finally have a much more efficient way to deal with devs doing lock-in via ultra-complex-syntax languages.
I already have some ideas for target C++ code to port to C99+.
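As a crude starting point, a sketch along these lines could produce a first-pass translation to review (again assuming the anthropic SDK; the file names and model id are placeholders, and the output still has to be compiled and checked):

    # Hedged sketch: ask a model for a first-pass C99 port of one C++ file.
    import pathlib
    from anthropic import Anthropic

    client = Anthropic()
    cpp_source = pathlib.Path("widget.cpp").read_text()  # hypothetical input

    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": "Port this C++ to plain C99. Keep behavior identical, "
                       "replace classes with structs and free functions, and "
                       "note anything you could not translate:\n\n" + cpp_source,
        }],
    )
    pathlib.Path("widget.c").write_text(msg.content[0].text)

Doing this file by file, with the compiler and tests in the loop, is probably more realistic than a one-shot translation.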
1: https://aider.chat/
Ofc some might prefer the pure CLI experience, but mentioning that because it also supports a lot of providers.