I currently have a big problem with AI-generated code and some of the junior devs on my team. Our execs keep pushing "vibe-coding" and agentic coding, but IMO these are just tools. And if you don't know how to use the tools effectively, you're still gonna generate bad code. One of the problems is that the devs don't realise why it's bad code.
As an example, I asked one of my devs to implement a batching process to reduce the number of database operations. He presented extremely robust, high-quality code and unit tests. The problem was that it was MASSIVE overkill.
AI generated a new service class, a background worker, several hundred lines of code in the main file. And entire unit test suites.
I rejected the PR and implemented the same functionality by adding two new methods and one extra field.
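To give a rough sense of the size difference: here is a made-up sketch of what the minimal version can look like (not our actual code, and save_batch() just stands in for whatever bulk-write call the DB layer offers). It's essentially one buffer field plus an add and a flush method:

    BATCH_SIZE = 100

    class OrderRepository:
        def __init__(self, db):
            self.db = db
            self._pending = []  # the one extra field: a buffer of unwritten rows

        def add(self, order):
            # queue the row; flush automatically once the buffer is full
            self._pending.append(order)
            if len(self._pending) >= BATCH_SIZE:
                self.flush()

        def flush(self):
            # one bulk write instead of N individual database operations
            if self._pending:
                self.db.save_batch(self._pending)
                self._pending.clear()

No new service class, no background worker; callers only need to call flush() at the end of a request.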
Now I often hear comments that AI can generate exactly what I want if I just use the correct prompts. OK, how do I explain that to a junior dev? How do they distinguish between "good" simple and "bad" simple (or complex)? Furthermore, in my own experience, LLMs tend to pick up on key phrases or technologies, then build their own context about what they think you need (e.g. "Batching", "Kafka", "event-driven", etc). By the time you've refined your questions to the point where the LLM generates something that resembles what you want, you realise that you've basically pseudo-coded the solution in your prompt - if you're lucky. More often than not the LLM responses just start degrading massively to the point where they become useless and you need to start over. This is also something that junior devs don't seem to understand.
I'm still bullish on AI-assisted coding (and AI in general), but I'm not a fan at all of the vibe/agentic coding push by IT execs.
vitaflo · 20h ago
It’s difficult to do the hard work if you haven’t done the easy work 10,000 times. And we tend to get paid for the hard work.
LLMs remove the easy work from the junior devs' task pile. That will make it a lot more difficult for them to do the actual hard work required of a dev. They skip the stepping stones and critical thinking phase of their careers.
Senior devs are senior because they’ve done the easy things so often it’s second nature.
mattgreenrocks · 20h ago
Fantastic insight. Without doing the easy work, we cannot recognize when it is appropriate, which is more often than not, in my experience. There's also a certain humility to doing things as simply as possible. LLMs lack the self-reflective capability to see that maybe an incremental change to use batching for a write-heavy op shouldn't be 200 lines of code.
Tools can't replace human understanding of a problem, and that understanding is the foundation for effective growth and maintenance of code.
mandevil · 20h ago
This is what my wife (a hospital pharmacist) thinks too. She has developed her medical intuition on thousands of patients, through education, rotations, residency, years of working, etc. And most patients are straightforward and easy. But it is those easy cases that build the judgement skills necessary to make her so fast and effective when she gets to the hard cases.
Maybe an AI would be better on the easy cases- slightly faster and cheaper. But it would mean that she would never develop the skills to tackle the problems that AI has no idea how to handle.
tails4e · 21h ago
Exactly this. If a junior dev is never exposed to the task of reasoning about code themselves, they will never know the difference between good and bad code. Code bases will be littered with code that does the job functionally but is not good code, and technical debt will accumulate. Surely this can't be good for the junior devs or the code bases long term?
diggan · 21h ago
To be fair, most startups already trade "easier to add stuff in the future" for "works today", even before LLMs. I'm sure we'll see a much harder turn in the "works today" direction (which the current vibe-coding epidemic already seems to signal we're in), until the effects of that turn really start to be felt (maybe 1-3 years), then we'll finally start to steer to maintainable and simple software again.
Filligree · 21h ago
I’m not so sure. In the short term, yes, we hear about disasters caused by developers choosing “works today” and the AI almost instantly making a mess…
But that’s the point. The feedback loop is faster; AI is much worse at coping with poor code than humans are, so you quickly learn to keep the codebase in top shape so the AI will keep working. Since you saved a lot of time while coding, you’re able to do that.
That doesn’t work for developers who don’t know what good code is, of course.
spicybbq · 20h ago
> then we'll finally start to steer to maintainable and simple software again.
I disagree. I expect that companies will try to overcome AI-generated technical debt by throwing more AI at the problem.
bluefirebrand · 20h ago
Already starting to see this attitude online among Pro-AI people
"If the code doesn't work just throw it away and vibe code new code to replace it"
It's something that is... Sort of possible I guess but it feels so shortsighted to me
Maybe I just need to try and adjust to a shortsighted world
hiq · 21h ago
> OK, how do I explain that to a junior dev?
They could iterate with their LLM and ask it to be more concise, to give alternative solutions, and use their judgement to choose the one they end up sending to you for review. Assuming of course that the LLM can come up with a solution similar to yours.
Still, in this case, it sounds like you were able to tell within 20s that their solution was too verbose. Declining the PR and mentioning this extra field, and leaving it up to them to implement the two functions (or equivalent) that you implemented yourself would have been fine maybe? Meaning that it was not really such a big waste of time? And in the process, your dev might have learned to use this tool better.
These tools are still new and keep evolving such that we don't have best practices yet in how to use them, but I'm sure we'll get there.
code_biologist · 20h ago
They'll improve, but current LLMs (o3, Gemini 2.5 Pro, lesser ones) are also terrible at suggesting non-obvious alternative solutions on their own. You can sometimes squeeze them, adding each obvious but poor approach to a list of "here are things I don't want you to propose" in turn, but often even then they won't get to the answer they should be able to find. The part that is wildly irritating is that once you tell them about the non-obvious simple solution, they act like it's the most natural thing in the world and they knew it the entire time.
> Assuming of course that the LLM can come up with a solution similar to yours.
I have idle speculations as to why these things happen, but I think in many cases they can't actually. They also can't tell the junior devs that such a solution might exist if they just dig further. Both of these seem solvable, but it seems like "more, bigger models, probed more deeply" is the solution, and that's an expensive solution that dings the margins of LLM providers. I think LLM providers will keep their margins, providing models with notable gaps and flaws, and let software companies and junior devs sort it out on their own.
diggan · 21h ago
> Our execs keep pushing "vibe-coding"
Imagine if wat (https://www.destroyallsoftware.com/talks/wat) appeared on the internet, and execs took it seriously and suddenly asked people to explicitly make everything into JS.
This is how it sounds when I hear executives pushing for things like "vibe-coding".
> More often than not the LLM responses just start degrading massively to the point where they become useless and you need to start over
Yeah, this is true. The trick is to never go beyond one response from the LLM. If they get it wrong, start over immediately with a rewritten prompt so they get it right on the first try. I'm treating "the LLM got it wrong" as "I didn't make the initial user/system prompt good enough", not as in "now I'm gonna add extra context to try to steer it right".
bee_rider · 20h ago
It would be kinda cool if we could write pseudocode on a whiteboard or a notebook and have the computer spit out a real program.
Viliam1234 · 18h ago
Easy, just use one AI to write the pseudocode, and another AI to change it into a real program.
And a third AI to review the pseudocode, I guess.
More seriously, I think that this is generally the correct approach: create a script that the AIs can follow one step at a time; update the script when necessary.
h3lp · 16h ago
I see an analogy to the discussions from my youth about compilers vs. assembly language programmers. It is still true that assembly is required to write high performance primitives, and that a competent assembly programmer will always beat a good compiler on a small function---but a compiler will consistently turn out decent and correct code for the entire project.
So, basically, the compilers won, and assembly is relegated to be an important but niche skill.
lurking_swe · 10h ago
the word “consistently” in your anecdote is doing a lot of heavy lifting. Generative AI is anything but consistent. A compiler _is_, hence its immense value. Furthermore, even a junior programmer can compile their program and generate robust assembly code, not the case with generative AI.
It’s hard to predict how this plays out IMO. Especially since this industry (broadly speaking) doesn’t believe in training juniors anymore.
h1fra · 20h ago
wait a couple of years, the junior will still not know how to code and companies will need someone with experience to fix all the mess $$$
cies · 20h ago
I think this is where functional style and strong types come in handy: they make it harder to write bad code that looks innocent.
In part this is because the process of development leans less hard on the discipline of devs (humans). Code becomes more formal.
I regularly have a piece of vibe-coded code in a strongly typed language that does not compile! (Would that count as a hallucination?) I have thought many times: in Python/JS/Ruby this would just run, and only produce a runtime error in some weird case that likely only our customers on production will find...
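A tiny made-up example of the kind of thing I mean (Python here, but any dynamic language behaves the same):

    def apply_discount(price, discount):
        return price - price * discount

    apply_discount(100.0, 0.1)    # the common case runs fine
    apply_discount(100.0, "0.1")  # only fails at runtime (TypeError), on whatever
                                  # rare path happens to pass the value as a string

In a strongly typed language the second call simply doesn't compile; here it ships, and a customer finds it.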
diggan · 20h ago
> I think this is where functional style and strong types come in handy: they make it harder to write bad code that looks innocent.
I'm a proponent of functional programming in general, but I don't think either types (of any "strength") or functional programming makes it easier or harder to write bad code. Sure, types might help avoid easy syntax errors, but they can also give the developer false confidence with "if it compiles it works :shrug:". Instruct the LLM to figure out the solution until it compiles, and you'll get the same false confidence if there is nothing else asserting the correct behavior, not just the syntax.
> in Python/JS/Ruby this would just run
I'm not sure how well versed you are with dynamic languages, especially when writing code for others, but in 99% of cases you'll cover at the very least all the happy paths with unit tests, and if you're planning on putting it in a production environment, you'll also cover the "sad" paths. Using LLMs or not shouldn't change that very basic requirement.
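As a throwaway example of that baseline (not from any real codebase), happy path plus the obvious sad paths:

    import pytest

    def parse_port(value):
        port = int(value)  # raises ValueError for non-numeric input
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        return port

    def test_happy_path():
        assert parse_port("8080") == 8080

    def test_sad_paths():
        with pytest.raises(ValueError):
            parse_port("not-a-number")
        with pytest.raises(ValueError):
            parse_port("70000")

Whether the function body was typed by a human or generated by an LLM, the tests are the part that actually asserts the behavior.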
jmsdnns · 21h ago
> 25% of developers estimate that 1 in 5 AI-generated suggestions contain factual errors or misleading code.
I cannot believe what's said in the report because it doesn't even reflect what my pro-AI coding friends say is true. Every dev I know says AI-generated suggestions are often full of noise, even the pro-AI folks.
bluefirebrand · 21h ago
I think this really highlights the difference between "pro ai" and "anti ai" people
"It's full of noise but I'm confident I can cut through it to get to the good stuff" - Pro AI
"It's full of noise and it takes more effort to cut through than it would take to just build it myself" - Anti AI
I'm pretty Anti myself. I think "I can cut through the noise" is pretty misplaced overconfidence for a lot of devs
diggan · 20h ago
I don't think I would place myself on either sides, I guess I'm in the "AI is OK at some stuff" camp.
But if you're getting a lot of noise, I'd immediately try to adjust my system/user prompt to never get that noise in the first place. I'm currently using a variation of https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... which is basically my personal coding guidelines but "codified" as simple rules for LLMs to understand.
For anything besides the dumb models, I get code that more or less looks exactly like how I would have written it myself. When I find I get code back that I'm not happy with, I adjust the system/user prompt further so this time and the next it returns code like how I would have done it.
bluefirebrand · 19h ago
I feel I should clarify
When it comes to judging the quality of AI output, I do agree with "AI is ok at some stuff"
When I say I tend to fall on the Anti AI side, I am saying "But I still don't think it's worth using much"
I don't really want to lean on tools that are just ok at some stuff.
Viliam1234 · 18h ago
I basically only use AI for tasks that I could also do myself, because those are the tasks where I can find and fix bugs. When used like this, AI can save a lot of typing.
So I guess that puts me into "pro AI" camp, but it's not like we actually disagree.
bluefirebrand · 18h ago
That's fair
I don't really find that typing is my bottleneck mostly. AI saving me time spent typing code also just costs me time spent prompting and re-prompting the AI so... Kinda a wash mostly?
ben_w · 21h ago
Why does:
> 25% of developers estimate that 1 in 5 AI-generated suggestions contain factual errors or misleading code.
Seem incompatible with "often full of noise", to you?
I can't speak for factual errors, but I'd say less than 20% of the code ChatGPT* gives me contains clear errors — more like 10%. Perhaps that just means I can't spot all the subtle bugs.
But even in the best case, there's a lot of "noise" in the answers they give me: Excess comments that don't add anything, a whole class file when I wanted just a function, that kind of thing.
* Other LLMs are different, and I've had one (I think it was Phi-2) start bad then switch both task *and language* mid-way through.
andnand · 20h ago
I'd say for me it depends on the task and the language. I find that asking ChatGPT to generate some code that I copy and paste lines up with your experience. Same with using an agent in a new project. I find the error rate much higher, though, once I start asking it to write code using specific libraries, or when using an agent in an established code base. It's also terrible with DSLs that probably don't have as much training data. Trying to get it to do anything with Azure's KQL is borderline pointless.
jmsdnns · 20h ago
Because it is much higher than 25%
Kiro · 20h ago
Not my experience at all. 25% sounds really high. I can't even remember the last time it gave me an error that wasn't reasonable (e.g. based on incomplete information) and was just pure noise.
jmsdnns · 18h ago
fwiw, i dont mean to suggest AI is pure noise or even that AI isnt worth using. the report just doesnt reconcile with my experiences at all.
my experiences range from helping design penn's new AI degree programs, hearing from friends at algorithmic hedge funds, hearing from friends at startups, and my own development.
andnand · 20h ago
I'm curious what types of tasks you're using it for?
elpocko · 20h ago
I wish LLMs were generally viewed as Eliza on steroids, a thing to generate plausible sounding text with, in places where we used primitive generators based on Markov models before. To implement smarter NPCs in games, and virtual chat partners to talk to, just for fun. They are, after all, really fun to play with. They should be used as smart autocomplete in your IDE, not to generate whole projects from scratch. As an idea generator when you're stuck.
This requirement to be commercially useful and valuable, and to aid all kinds of businesses everywhere, gave a bad reputation to what is otherwise an amazing technological achievement. I am an outspoken AI enthusiast, because it is fun and interesting, but I hate how it is only seen as useful when it can do actual work like a human.
wbharding · 20h ago
It's hard to reconcile how 59% of devs in their survey are "confident" AI is improving their code quality, with prior empirical research that shows a surge in added & copy/pasted lines w/ a corresponding drop in moved (refactored) lines https://www.gitclear.com/ai_assistant_code_quality_2025_rese...
My experience (using a mix of Copilot & Cursor every day) is that AI has become very capable of solving problems of low-to-intermediate complexity. But it requires extreme discipline to vet the code afterward for the FUD and unnecessary artifacts that sneak in alongside the "essential" code. These extra artifacts/FUD are, to my mind, the core of what will make AI-generated code more difficult to maintain than human-authored code in the long term.
hiq · 21h ago
I'd be interested in seeing comparisons between languages. I expect that a terse language with an expressive type system (is that Haskell maybe?) can lead to way better results in terms of usefulness than, say, bash, because I can rely on the type system and the compiler to have gotten rid of some basic mistakes, and I can read the code faster (since it's more concise).
I've mostly used LLMs with python so far and I'm looking forward to using them more with compiled languages where at least I won't have mismatching types a compiler would have detected without my help.
hippari2 · 21h ago
I think what really matters is how much code of that language is on StackOverflow :)
sathomasga · 21h ago
Survey from a company that's in the business of AI coding and thus has a monetary interest in promoting the technology. No details on who conducted the survey (the company itself?) or how the 609 respondents were selected. If limited to the company's own customers, massive selection bias. The results may or may not reflect reality, but this "report" is just marketing bullshit.
esafak · 21h ago
Lots of numbers. I'm interested in seeing the trends over time. I bet with their products they could track this daily.
diggan · 21h ago
> are 2.5x more likely to merge code without reviewing it
What the fuck? Are people taking "vibe coding" as a serious workflow? No wonder people's side projects feel more broken and buggy than before. Don't get me wrong, I "work with" LLMs, but I'd never merge/use any code that I didn't review, none of the models or tooling is mature enough for that.
Really strange how some people took a term that was supposed to be a "lol watch this" and started using it for work...
dartos · 21h ago
> Really strange how some people took a term that was supposed to be a "lol watch this" and started using it for work...
don't forget about the insane amount of marketing around AI code companies and how they put "vibe coding" in front of everyone's face all the time.
You tell someone something enough times and they'll believe it
uludag · 21h ago
Plus, vibe coding is the surest way for these companies to insert themselves as the ultimate middleman in the entire software development process.
As an aside, I really hate how cynical I feel I've been compelled to become at the arrival of such a genuinely innovative technology. Like, with this very article I can't help but think there are ulterior motives behind its production.
InitialLastName · 21h ago
This is key. They've found a way to sell extremely-high-brand-stickiness shovels in a gold rush.
bluefirebrand · 21h ago
> As an aside, I really hate how cynical I feel I've been compelled to become at the arrival of such a genuinely innovative technology
Yeah I'm feeling like this too. This should be so exciting! We're getting close to the Star Trek dream of just telling the computer to do work and it works!
I've been trying to examine why it's not exciting for me, and I'm actually pretty repulsed by it.
I think it's a combination of things
To start, I'm pretty disgusted by the blatant and unapologetic scraping of every single scrap of public data, regardless of license or copyrights.
I'm also really discouraged by how this is turning out to be another tool in the capitalist toolbox to justify layoffs, increase downward pressure on salaries, and once again extract more value per hour worked from employees
I also don't feel like the technology is actually that good or reliable yet. It has transformed my workflow but for the worse. Because my company is very bullish on AI it has resulted in me losing what little control I had to choose the tools that I feel are best for my job, in favor of what they want me to use because of the hype
Ultimately I'm cynical because I don't feel like this is making my life better. It feels like it is enriching other people at my expense and I am very bitter about it
orangebread · 19h ago
Not for nothing, but I did create an entire game in browser using phaser as the engine.
But I'm also an experienced developer and at this point, an experienced "vibe coder". I use that last term loosely because I have a structured set of rules I have AI follow.
To really understand AI's capability you have to have experienced it in a meaningful way with managed expectations. It's not going to nail what you want right away. This is also why I spend a lot of time up front to design my features before implementing.
diggan · 16h ago
> I use that last term loosely because I have a structured set of rules I have AI follow
Right, but what defines whether what you're doing is "vibe-coding" or not is whether you actually view the code it produces at any point in the workflow. You're "vibe-coding" if you're merging/pushing without reviewing the code.
I'm also an experienced developer, and used LLMs a lot, but never pushed/merged anything into production that I haven't read and understood myself.
msgodel · 21h ago
I've done it for "nice to have" features in their own modules that I don't really care about and aren't consumed by anything else (recently an SVG plot generator for a program I wrote). The LLM one-shotted it and I left it alone for a long time. Stuff like that is a great application for literal vibe coding.
I can't imagine doing it for anything serious though.
diggan · 21h ago
Yeah, for one-off, never-to-be-touched again I guess that kind of makes sense.
But this survey seems to span much more than just one-off tiny things, and gives the impression people working as professionals in companies are actually doing "vibe-coding" not as a joke, but as a workflow, for putting software into production.
mattgreenrocks · 20h ago
As awful as it is, it is entirely understandable: it follows naturally from the claims that LLMs can replace programmers entirely.
As capable as the models are, what matters more is how competent they are perceived to be, and how that is socialized. The hype machine is at deafening levels currently.
jasonthorsness · 21h ago
The absolute number who merged without reviewing was only 24% so maybe there is still hope!
simonw · 21h ago
1/4 programmers in a survey merging code without reviewing it is terrifying to me. That number should be as close to 0% as possible.
namanyayg · 21h ago
> "65% of developers using AI for refactoring and ~60% for testing, writing, or reviewing say the assistant “misses relevant context."
> "Among those who feel AI degrades quality, 44% blame missing context; yet even among quality champions, 53% still want context improvements."
Is this even true anymore? Doesn't happen to me with claude 4 + claude code.