AI slows down open source developers. Peter Naur can teach us why

316 jwhiles 196 7/14/2025, 2:32:08 PM johnwhiles.com ↗

Comments (196)

narush · 7h ago
Hey HN -- study author here! (See previous thread on the paper here [1].)

I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.

> This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.

I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).

> If there are no takers then I might try experimenting on myself.

This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!

> AI slows down open source developers. Peter Naur can teach us why

Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly less of a catchy title, but I think getting the qualifications right is really important!

Thanks again for the sweet write-up! I'll hang around in the comments today as well.

[1] https://news.ycombinator.com/item?id=44522772

[2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

seanwilson · 6h ago
If this makes sense, how is the study able to give a reasonable measure of how long an issue/task should have taken, vs how long it took with AI to determine that using AI was slower?

Or it's comparing how long the dev thought it should take with AI vs how long it actually took, which now includes the dev's guess of how AI impacts their productivity?

When it's hard to estimate how difficult an issue should be to complete, how does the study account for this? What percent speed up or slow down would be noise due to estimates being difficult?

I do appreciate that this stuff is very hard to measure.

krona · 4h ago
An easier way to think about it might be if you timed how long it took each ticket in your backlog. You also recorded whether you were drunk or not when you worked on it, and the ticket was selected at random from your backlog. The assumption (null-hypothesis) is that being drunk has no effect on ticket completion time.

Using the magic of statistics, if you have completed enough tickets, we can determine whether the null-hypothesis holds (for a given level of statistical certainty), and if it doesn't, how large the difference is (with a margin of error).

That's not to say there couldn't be other causes for the difference (if there is one), but that's how science proceeds, generally.
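
For concreteness, here is a minimal sketch of that kind of comparison (Python with scipy, entirely made-up ticket times):

```python
from scipy import stats

# Hypothetical completion times in hours for tickets drawn at random from the
# same backlog; "drunk" vs. "sober" plays the role of "with AI" vs. "without AI".
sober = [3.1, 4.5, 2.2, 6.0, 3.8, 5.1, 2.9, 4.0]
drunk = [4.8, 6.2, 3.9, 7.5, 5.0, 6.8, 4.4, 5.6]

# Null hypothesis: the condition has no effect on completion time.
# A two-sample t-test gives a p-value for rejecting that hypothesis.
t_stat, p_value = stats.ttest_ind(drunk, sober)

# The difference in means estimates how large the effect is, if there is one.
diff = sum(drunk) / len(drunk) - sum(sober) / len(sober)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, mean difference = {diff:.2f}h")
```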

jiggawatts · 1h ago
The challenge with “controlled experiments” is that saying to developers to “use AI for all of your tickets for a month” forces a specific tool onto problems that may not benefit from that tool.
msgodel · 1h ago
Most corporate software problems don't need AI at all. They're really coordination/communication/administration problems hiding as technical problems.
jwhiles · 4h ago
Thanks for the response, and apologies for misrepresenting your results somewhat! I'm probably not going to change the title since I am at heart a polemicist and a sloppy thinker, but I'll update the article to call out this misrepresentation.

That said, I think that what I wrote more or less encompasses three of the factors you call out as being likely to contribute: "High developer familiarity with repositories", "Large and complex repositories", and "Implicit repository context".

I thought more about experimenting on myself, and while I hope to do it - I think it will be very hard to create a controlled environment whilst also responding to the demands the job puts on me. I also don't have the luxury of a list of well-scoped tasks that could feasibly be completed in a few hours.

karmakaze · 56m ago
I would expect any change to an optimized workflow (developing own well understood project) to initially be slower. What I'd like to see is how these same developers do 6 months or a year from now after using AI has become the natural workflow on these same projects. The article mentions that these results don't extrapolate to other devs, but it's important to note that it may not extrapolate over time to these same devs.

I myself am just getting started and I can see how so many things can be scripted with AI that would be very difficult to (semi-)automate without. You gotta ask yourself "Is it worth the time?"[0]

[0] https://xkcd.com/1205/

calf · 1h ago
Slowing down isn't necessarily bad, maybe slow programming (literate/Knuth comes to mind as another early argument) encourages better theory formation. Maybe programming today is like fast food, and proper theory and abstraction (and language design) requires a good measure of slow and deliberate work that has not been the norm in industry.
antonvs · 6h ago
> Early-2025 AI slows down experienced open-source developers.

Even that's too general, because it'll depend on what the task is. It's not as if open source developers in general never work on tasks where AI could save time.

narush · 6h ago
We call this over-generalization out specifically in the "We do not provide evidence that:" table in the blog post and paper - I agree there are tasks these developers are likely sped up on with early-2025 tools.
2muchcoffeeman · 3h ago
I think this will be the key. Finding appropriate tasks. Even on code bases I know, I can find tedious things for the AI to do. Sometimes I can find tedious things for it to do that I would never have dreamt of doing in the past. Now, I think “will it do it?”.

Once I got the hang of identifying problems, or being more targeted, I spent less time messing about and got things done quicker.

munificent · 6h ago
> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself, probably applies to many other forms of human endeavour, and explains things as varied as why so many people think that AI has made them 10 times more productive, why I continue to use Vim, why people drive in London etc.

In boating, there's a notion of a "set and drift" which describes how wind and current pushes a boat off course. If a mariner isn't careful, they'll end up far from their destination because of it.

This is because when you're sitting in a boat, your perception of motion is relative and local. You feel the breeze on your face, and you see how the boat cuts through the surrounding water. You interpret that as motion towards your destination, but it can equally consist of wind and current where the medium itself is moving.

I think a similar effect explains all of these. Our perception of "making progress" is mostly a sense of motion and "stuff happening" in our immediate vicinity. It's not based on a perception of the goal getting closer, which is much harder to measure and develop an intuition for.

So people tend to choose strategies that make them feel like they're making progress even if it's not the most effective strategy. I think this is why people often take "shortcuts" when driving that are actually longer. All of the twists and turns keep them busy and make them feel like they're making more progress than zoning out on a boring interstate does.

wrsh07 · 5h ago
Something I noticed early on when using AI tools was that it was great because I didn't get blocked. Somehow, I always wanted to keep going and always felt like I could keep going.

The problem, of course, is that one might thoughtlessly invoke the ai tool when it would be faster to make the one line change directly

Edit

This could make sense with the driving analogy. If the road I was planning to take is closed, gps will happily tell me to try something else. But if that fails too, it might go back to the original suggestion.

thinkingemote · 5h ago
Exactly! Waze, the navigation app, tends to route users onto longer routes that feel faster. When driving, we perceive our journey as fast or slow not by the actual length but by our memories of what happened. Waze knows human drivers are happier driving a route that may be longer in time and distance if they feel like they are making progress with the twists and turns.

AI tools make programming feel easier. That it might actually be less productive is interesting, but we humans prefer the easier shortcut. Our memories of coding with AI tell us that we didn't struggle and therefore we made progress.

tjr · 5h ago
That sounds like a navigation tool that I absolutely do not want! Occasionally I do enjoy meandering around, but usually fastest / shortest path would be preferred.

And I'm not sure about the other either. In my 20+ year career in aerospace software, the most memorable times were solving interesting problems, not days with no struggle just churning out code.

thinkingemote · 4h ago
Indeed it is removing the memorable events of achievement!

Generally memorable things are different than unmemorable things. Work is unmemorable. Driving is unmemorable except when something negative happens. Waze tries to give some positive feelings to the driving route. Waze knows that people want positive experiences sometimes more than efficiency.

Being stuck in a traffic jam is more memorable than not being so. Or we remember the negative feeling more than the fact that our drive actually wasn't inefficient.

AI tools make for a less negative day of work, so we feel like we hit no traffic jams. "I got so much done" really means "I didn't get stuck". But it's removing the positive feelings too!

It's an illusion of progress through our feelings and memories.

Or programming with AI brings different feedback mechanisms and systems and different emotional engagements and different memory behaviours. It's very interesting!

Alex_L_Wood · 5h ago
We all as humans are hardwired to prefer greedy algorithms, basically.
jiggawatts · 1h ago
> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself

Linux/UNIX users are convinced of the superiority of keyboard control and CLI tools, but studies have shown that the mouse is faster for almost all common tasks.

Keyboard input feels faster because there are more actions per second.

mhuffman · 1h ago
>but studies have shown that the mouse is faster for almost all common tasks.

Do you think that daily CLI Linux/UNIX users might have a different list of what they consider "common tasks"?

PicassoCTs · 4h ago
I also think that AI-written code is just not read. People hate code reviews, and actively refuse to read code, because that is hard work: reading into other people's thoughts and ideas.

This is why pushing for new code, rewrites, new frameworks is so popular. https://www.joelonsoftware.com/2000/04/06/things-you-should-...

So a ton of AI-generated code is just that: never read. It's generated, tested against test functions, and that's it. I wouldn't be surprised if some of these devs themselves have only marginal ideas of what's in their codebases and why.

tjr · 4h ago
I have mostly worked in aerospace software, and find this rather horrifying. I suppose, if your tests are in fact good and comprehensive enough, there could be a logical argument for not needing to understand the code, but if we're talking people's safety in the hands of your software, I don't know if there is any number of tests I would accept in exchange for willingly giving up understanding of the code.
asadotzler · 3h ago
You're transferring the need to be really good at coding and understanding code to the need to be really good at testing and understanding tests, which 9/10 times requires being good at coding and understanding code. There are no free lunches.
blake1 · 6h ago
I think a reasonable summary of the study referenced is that: "AI creates the perception of productivity enhancements far beyond the reality."

Even within the study, there were some participants who saw mild improvements to productivity, but most had a significant drop in productivity. This thread is now full of people telling their story about huge productivity gains they made with AI, but none of the comments contend with the central insight of this study: that these productivity gains are illusions. AI is a product designed to make you value the product.

In matters of personal value, perception is reality, no question. Anyone relying heavily on AI should really be worried that it is mostly a tool for warping their self-perception, one that creates dependency and a false sense of accomplishment. After all, it speaks a highly optimized stream of tokens at you, and you really have to wonder what the optimization goal was.

thinkingemote · 3h ago
It's like the difference between being fast and quick. AI tools make the developer feel quick but they may not be fast. It's less cognitive effort in some ways. It's an interesting illusion, one that is based on changing emotions from different feedback loops and the effects of how memory forms.
asadotzler · 3h ago
Quickness is a burst; speed is a flow.

Or, "slow is smooth, and smooth is fast"

BriggyDwiggs42 · 5h ago
I’ve noticed that you can definitely use them to help you learn something, but that your understanding tends to be more abstract and LLM-like that way. You definitely want to mix it up when learning too.
daxfohl · 5h ago
I've also had bad results with hallucinations there. I was trying to learn more about multi-dimensional qubit algorithms, and spent a whole day learning a bunch of stuff that was fascinating but plain wrong. I only figured out it was wrong at the end of the day when I tried to do a simulation and the results weren't consistent.

Early in the chat it substituted a `-1` for an `i`, and everything that followed was garbage. There were also some errors that I spotted real-time and got it to correct itself.

But yeah, IDK, it presents itself so confidently and "knows" so much and is so easy to use, that it's hard not to try to use it as a reference/teacher. But it's also quite dangerous if you're not confirming things; it can send you down incorrect paths and waste a ton of time. I haven't decided whether the cost is worth the benefit or not.

Presumably they'll get better at this over time, so in the long run (probably no more than a year) it'll likely easily exceed the ROI breakeven point, but for now, you do have to remain vigilant.
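
For a sense of how small that kind of substitution is and how badly it propagates, here's a hypothetical single-qubit sketch in numpy (not the original multi-dimensional material, just an illustration of a phase error failing a consistency check in a simulation):

```python
import numpy as np

# Correct phase gate: S = diag(1, i). Applying S twice should give Z = diag(1, -1).
S_correct = np.diag([1, 1j])
# The hallucinated version quietly swaps i for -1, which turns S into Z itself.
S_wrong = np.diag([1, -1])

Z = np.diag([1, -1])

# Consistency check: S @ S should equal Z.
print(np.allclose(S_correct @ S_correct, Z))  # True
print(np.allclose(S_wrong @ S_wrong, Z))      # False: the wrong gate squares to identity
```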

tonyedgecombe · 5h ago
I keep wondering whether the best way to use these tools is to do the work yourself then ask the AI to critique it, to find the bugs, optimisations or missing features.
nico · 7h ago
> They are experienced open source developers, working on their own projects

I just started working on a 3-month old codebase written by someone else, in a framework and architecture I had never used before

Within a couple hours, with the help of Claude Code, I had already created a really nice system to replicate data from staging to local development. Something I had built before in other projects, and I knew that manually it would take me a full day or two, especially without experience in the architecture

That immediately sped up my development even more, as now I had better data to test things locally

Then a couple hours later, I had already pushed my first PR. All code following the proper coding style and practices of the existing project and the framework. That PR would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

So sure, AI won’t speed everyone or everything up. But at least in this one case, it gave me a huge boost

As I keep going, I expect things to slow down a bit, as the complexity of the project grows. However, it’s also given me the chance to get an amazing jumpstart

Vegenoid · 7h ago
I have had similar experiences as you, but this is not the kind of work that the study is talking about:

“When open source developers working in codebases that they are deeply familiar with use AI tools to complete a task, they take longer to complete that task”

I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

Navarr · 6h ago
> I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

If you are unfamiliar with the project, how do you determine that it wasn't leading you astray in the first place? Do you ever revisit what you had done with AI previously to make sure that, once you know your way around, it was doing it the right way?

Vegenoid · 6h ago
In some cases, I have not revisited, as I was happy to simply make a small modification for my use only. In others, I have taken the time to ensure the changes are suitable for upstreaming. In my experience, which I have not methodically recorded in any way, the LLM’s changes at this early stage have been pretty good. This is also partly because the changes I am making at the early stage are generally small, usually not requiring adding new functionality but simply hooking up existing functionality to a new input or output.

What’s most useful about the LLM in the early stages is not the actual code it writes, but its reasoning that helps me learn about the structure of the project. I don’t take the code blind, I am more interested in the reasoning than the code itself. I have found this to be reliably useful.

quantumHazer · 6h ago
no, they just claim that AI coding tools are magic and drink their kool-aid
Gormo · 4h ago
> I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

How does using AI impact the amount of time it takes you to become sufficiently familiar with the project to recognize when you are being led astray?

One of the worries I have with the fast ramp-up is that a lot of that ramp-up time isn't just grunt work to be optimized away, it's active learning, and bypassing too much of it can leave you with an incomplete understanding of the problem domain that slows you down perpetually.

Sometimes, there are real efficiencies to be gained; other times those perceived efficiencies are actually incurring heavy technical debt, and I suspect that overuse of AI is usually the latter.

pragma_x · 5h ago
Not just new code-bases. I recently used an LLM to accelerate my learning of Rust.

Coming from other programming languages, I had a lot of questions that would be tough to nail down in a Google search, or combing through docs and/or tutorials. In retrospect, it's super fast at finding answers to things that _don't exist_ explicitly, or are implied through the lack of documentation, or exist at the intersection of wildly different resources:

- Can I get compile-time type information of Enum values?

- Can I specialize a generic function/type based on Enum values?

- How can I use macros to reflect on struct fields?

- Can I use an enum without its enclosing namespace, as I can in C++?

- Does rust have a 'with' clause?

- How do I avoid declaring lifetimes on my types?

- What is an idiomatic way to implement the Strategy pattern?

- What is an idiomatic way to return a closure from a function?

...and so on. This "conversation" happened here and there over the period of two weeks. Not only was ChatGPT up to the task, but it was able to suggest what technologies would get me close to the mark if Rust wasn't built to do what I had in mind. I'm now much more comfortable and competent in the language, but miles ahead of where I would have been without it.

davidclark · 5h ago
> That PR, would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

What is your accuracy on software development estimates? I always see these productivity claims matched against “It would’ve taken me” timelines.

But, it’s never examined if we’re good at estimating. I know I am not good at estimates.

It’s also never examined if the quality of the PR is the same as it would’ve been. Are you skipping steps and system understanding which let you go faster, but with a higher % chance of bugs? You can do that without AI and get the same speed up.

OptionOfT · 5h ago
Now the question is: did you gain the same knowledge and proficiency in the codebase that you would've gained organically?

I find that when working with an LLM the difference in knowledge is the same as learning a new language. Learning to understand another language is easier than learning to speak another language.

It's like my knowledge of C++. I can read it, and I can make modifications of existing files. But writing something from scratch without a template? That's a lot harder.

nico · 5h ago
Some additional notes given the comments in the thread

* I wasn’t trying to be dismissive of the article or the study, just wanted to present a different context in which AI tools do help a lot

* It’s not just code. It also helps with a lot of tasks. For example, Claude Code figured out how to “manually” connect to the AWS cluster that hosted the source db, tested different commands via docker inside the project containers and overall helped immensely with discovery of the overall structure and infrastructure of the project

* My professional experience as a developer, has been that 80-90% of the time, results trump code quality. That’s just the projects and companies I’ve been personally involved with. Mostly saas products in which business goals are usually considered more important than the specifics of the tech stack used. This doesn’t mean that 80-90% of code is garbage, it just means that most of the time readability, maintainability and shipping are more important than DRY, clever solutions or optimizations

* I don’t know how helpful AI is or could be for things that require super clever algorithms or special data structures, or where code quality is incredibly important

* Having said that, the AI tools I’ve used can write pretty good quality code, as long as they are provided with good examples and references, and the developer is on top of properly managing the context

* Additionally, these tools are improving almost on a weekly or monthly basis. My experience with them has drastically changed even in the last 3 months

At the end of the day, AI is not magic, it’s a tool, and I as the developer, am still accountable for the code and results I’m expected to deliver

PaulDavisThe1st · 6h ago
TFA was specifically about people very familiar with the project and codebase that they are working on. Your anecdote is precisely the opposite of the situation it was about, and it acknowledged the sort of process you describe.
kevmo314 · 7h ago
You've missed the point of the article, which in fact agrees with your anecdote.

> It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.

moogleii · 7h ago
That would be an aside, or a comment, not the point of the article.
antonvs · 7h ago
> You've missed the point of the article

Sadly clickbait headlines like the OP, "AI slows down open source developers," spread this misinformation, ensuring that a majority of people will have the same misapprehension.

raincole · 6h ago
Which is a good thing for people who are currently benefiting from AI, though. The slower other programmers adopt AI, the more edge those who are proficient with it have.

It took me an embarrassingly long time to realize a simple fact: using AI well is a shallow skill that everyone can learn in days or even hours if they want. And then my small advantage of knowing AI tools will disappear. Since that realization I've always been upvoting articles that claim AI makes you less productive (like the OP).

rightbyte · 4h ago
So you bother to push some sort of self-proclaimed false narrative with upvotes, but then you try to counteract it by spelling it out?
samtp · 6h ago
Well that's exactly what it does well at the moment. Boilerplate starter templates, landing pages, throwaway apps, etc. But for projects that need precision, like data pipelines or security, the code it generates has many subtle flaws that can/will cause giant headaches in your project unless you dig through every line produced
quantumHazer · 6h ago
You clearly have not read the study. The problem is developers thought they were 20% faster, but they were actually slower. Anyway, from a quick review of your profile, you have a conflict of interest about vibe coding, so I will definitely take your opinion with a grain of salt.
floren · 6h ago
> Anyway, from a quick review of your profile, you have a conflict of interest about vibe coding

Seems to happen every time, doesn't it?

xoralkindi · 5h ago
How are you confident in the code, coding style, and practices simply because the LLM says so? How do you know it is not hallucinating, since you don't understand the codebase?

bko · 6h ago
When anecdote and data don't align, it's usually the data that's wrong.

Not always the case, but whenever I read about these strained studies or arguments about how AI is actually making people less productive, I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools. I wonder if the same thing happened with higher level programming languages where people argued, you may THINK not managing your own garbage collector will lead to more productivity but actually...

Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something. And I don't need a "study" to tell me that

adrian_b · 6h ago
TFA says clearly that it is likely that AI will make more productive anyone working on an unfamiliar code base, but make less productive those working on a project they understand well, and it gives reasonable arguments for why this is likely to happen.

Moreover, it acknowledges that for programmers working in most companies the first case is much more frequent.

bko · 6h ago
I have written every line of code in the code base I mostly work in and I still find it incredibly valuable. Millions use these tools and a large percentage of them find them useful in their familiar code base.

Again, overwhelming anecdote and millions of users > "study"

almatabata · 5h ago
> Interestingly the developers predict that AI will make them faster, and continue to believe that it did make them faster, even after completing the task slower than they otherwise would!

In this case clearly anecdotes are not enough. If that quote from the article is accurate, it shows that you cannot trust the developers time perception.

I agree, it's only one study and we should not take it as the final answer. It definitely justifies doing a few follow-up evaluations to see if this holds up.

overfeed · 2h ago
> If that quote from the article is accurate, it shows that you cannot trust the developers time perception.

The scientific method goes right out the window when it comes to true believers. It reminds me of weed-smokers who insist getting high makes them deep-thinkers: it feels that way in the moment, but if you've ever been a sober person caught up in a "deep" discussion among people high on THC, oh boy...

bko · 5h ago
Or I cannot trust a contrived laboratory setting with its garden of forking paths.

https://mleverything.substack.com/p/garden-of-forking-paths-...

almatabata · 5h ago
I did not say to trust it. I do not need to trust it.

If I run my own tests on my own codebase I will definitely use some objective time measurement method and a subjective one. I really want to know if there is a big difference.

I really wonder if it's just the individual's bias showing. If you are pro-AI you might overestimate it, and if you are against it you might underestimate it.

bko · 4h ago
That's fair, I agree.
rsynnott · 6h ago
> I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools.

One of the more interesting findings of the study mentioned was that the LLM users, even where use of an LLM had apparently degraded their performance, tended to believe it had enhanced it. Anecdote is a _really_ bad argument against data that shows a _perception_ problem.

> Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something.

I mean, on that basis, so does homeopathy.

Like, it's just one study. It's not the last word. But "my anecdotes disprove it" probably isn't a _terribly_ helpful approach.

ted_bunny · 4h ago
Also, "anecdotes > data" as a general heuristic is a red flag. But like if clowns had a country and their flag were red. That kind.
markstos · 7h ago
I had a similar experience with AI and open source. AI allowed me to implement features in a language and stack I didn't know well. I had wanted these features for months and no one else was volunteering to implement them. I had tried to study the stack directly myself, but found the total picture to be complex and under-documented for people getting started.

Using Warp terminal (which used Claude) I was able to get past those barriers and achieve results that weren't happening at all before.

yomismoaqui · 7h ago
Someone on X said that these agentic AI tools (Claude Code, Amp, Gemini Cli) are to programming like the table saw was to hand-made woodworking.

It can make some things faster and better than a human with a saw, but you have to learn how to use them right (or you will lose some fingers).

I personally find that agentic AI tools make me more ambitious in my projects; I can tackle things I wouldn't have thought about doing before. And I also delegate work that I don't like to them, because they are going to do it better and quicker than me. So my mind is free to think about the real problems, like architecture and the technical debt balance of my code...

Problem is that there is the temptation of letting the AI agent do everything and just commit the result without understanding YOUR code (yes, it was generated by an AI but if you sign the commit YOU are responsible for that code).

So as with any tool try to take the time to understand how to better use it and see if it works for you.

candiddevmike · 6h ago
> to programming like the table saw was to hand-made woodworking

This is a ridiculous comparison because the table saw is a precision tool (compared to manual woodworking) when agentic AI is anything but IMO.

marcellus23 · 5h ago
The nature of the comparison is in the second paragraph. It's nothing to do with how precise it is.
bgwalter · 7h ago
"You are using it wrong!"

This is insulting to all pre-2023 open source developers, who produced the entire stack that the "AI" robber barons use in their companies.

It is even more insulting because no actual software of value has been demonstrably produced using "AI".

yomismoaqui · 6h ago
> It is even more insulting because no actual software of value has been demonstrably produced using "AI".

Claude Code and Amp (equivalent from Sourcegraph) are created by humans using these same tools to add new features and fix bugs.

Having used both tools for some weeks I can tell you that they provide a great value to me, enough that I see paying $100 monthly as a bargain related to that value.

Edit: typo

jdiff · 5h ago
GP is pointing out the distinct lack of AI driven development in the wild. At this point, agents should be visibly maintaining at least a few popular codebases across this world wide web. The fact that there aren't raises some eyebrows for the claims that are regularly made by proponents. Not just the breathless proponents, either. Even taking claims very conservatively, FOSS maintainer burnout should be a thing of the past, but the only noted interaction with AI seems to be amplifying it.
yomismoaqui · 4h ago
It's disingenuous to expect that tools that are publicly available for less than a year have a massive adoption in the wild.

Think that these were internal tools that provided value to engineers on Anthropic, OpenAI, Google & others and now are starting to be adopted by the general public.

Some people are overhyped and some seem hurt because I don't know, maybe they define themselves by their ability to write code by hand.

I have no horse in this race and I can only tell you about my experience and I can tell you that the change is coming.

Also if you don't trust a random HN nickname go read about the experiences of people like Armin Ronacher (Flask creator), Steve Yegge or Thomas H. Ptacek.

- https://lucumr.pocoo.org/2025/6/4/changes/
- https://sourcegraph.com/blog/the-brute-squad
- https://fly.io/blog/youre-all-nuts/

jdiff · 7m ago
I'm not asking for massive adoption. I'm asking for public facing evidence of what many claim privately, that they have evolved their job into managing agents and reviewing vs writing anything themselves.

Again, not massive adoption, just one codebase that's used in production with this property. If it's such a productivity boost, there has to be at least one public facing project that's done the same as random HN nicknames and nonrandom named individuals.

asadotzler · 3h ago
>It's disingenuous to expect that tools that are publicly available for less than a year have a massive adoption in the wild.

Github got massive adoption in a year, probably 100K developers and tens of thousands of projects including big names like Ruby on Rails.

I'm sure if I spent more than 2 minutes on this I'd have even more examples but this one is enough to neuter your claims.

tomasz_fm · 7h ago
Only one developer in this study had more than 50h of Cursor experience, including time spent using Cursor during the study. That one developer saw a 25% speed improvement.

Everyone else was an absolute Cursor beginner with barely any Cursor experience. I don't find it surprising that using tools they're unfamiliar with slows software engineers down.

I don't think this study can be used to reach any sort of conclusion on use of AI and development speed.

narush · 7h ago
Hey, thanks for digging into the details here! Copying a relevant comment (https://news.ycombinator.com/item?id=44523638) from the other thread on the paper, in case it's help on this point.

1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, the only concern most external reviewers had about experience was about prompting -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (not because AI got better, but because their performance without AI got much worse). In other words, we're sorta in between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

brulard · 1h ago
1. That does not support these results in any way.

2. Having experience prompting is quite a small part of being able to use agentic IDE tools. It's like relating cutting an onion to being a good cook.

I think we should all focus on how the effectiveness is going to change in the long term. We all know AI tooling is not going to disappear, only become better and better. I wouldn't be afraid to lose some productivity for months if it meant acquiring new skills for the future.

WhyNotHugo · 4h ago
An interesting little detail. Any seasoned developer is likely going to take substantially longer if they have to use any IDE except their everyday one.

I've been using Vim/Neovim for over a decade. I'm sure if I wanted to use something like Cursor, it would take me at least a month before I could be productive at even a fraction of my usual rate.

Art9681 · 7h ago
This is exactly my same take. Any tool an engineer is inexperienced with will slow them down. AI is no different.
bluefirebrand · 7h ago
This runs counter to the starry eyed promises of AI letting people with no experience accomplish things
TeMPOraL · 6h ago
That promise is true, though, and the two claims are not opposite. The devil is in the details, specifically in what you mean by "people" and "accomplish things".

If by "people" you mean "general public", and by "accomplish things" you mean solving some immediate problems, that may or may not involve authoring a script or even a small app - then yes, this is already happening, and is a big reason behind the AI hype as it is.

If by "people" you mean "experienced software engineers", and by "accomplish things" you mean meaningful contributions to a large software product, measured by high internal code and process quality standards, then no - AI tools may not help with that directly, though chances are greater when you have enough experience with those tools to reliably give them right context and steer away from failure modes.

Still, solving one-off problems != incremental improvements to a large system.

bluefirebrand · 6h ago
> If by "people" you mean "experienced software engineers",

My post is a single sentence and I literally wrote "people with no experience"

helloplanets · 5h ago
He addressed your point in the paragraph before that. The paragraph from which you quoted was meant to show the difference between your point and the fact that the original research was indeed measuring software engineers.
bluefirebrand · 5h ago
My point is that I was very clear about what people I was referring to.

No need for all the "if by people you mean" rigamarole

ben_w · 5h ago
Then your previous point is false, because "X helps Y" doesn't run counter to any promise that "X helps Z".

You said the second. You responded to the first.

Y = [experts]

Z = [noobs]

{Y, Z} ⊆ [all humans]

jonfw · 5h ago
AI lets people with no experience accomplish things. People who have experience can create those things without AI. Those experienced folks will likely outperform novices, even when novices leverage AI.

None of these statements are controversial. What we have to establish is- Does the experienced AI builder outperform the experienced manual coder?

antimora · 7h ago
I'm one of the regular code reviewers for Burn (a deep learning framework in Rust). I recently had to close a PR because the submitter's bug fix was clearly written entirely by an AI agent. The "fix" simply muted an error instead of addressing the root cause. This is exactly what AI tends to do when it can't identify the actual problem. The code was unnecessarily verbose and even included tests for muting the error. Based on the person's profile, I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.
dawnerd · 6h ago
That's what I love about LLMs. You can spot it doesn't know the answer, tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

It scares me how much code is being produced by people without enough experience to spot issues or people that just gave up caring. We're going to be in for wild ride when all the exploits start flowing.

cogman10 · 6h ago
My favorite LLM moment. I wrote some code, asked the LLM "Find any bugs or problems with this code" and of course what it did was hyperfocus on an out-of-date comment (that I didn't write). Since the problem identified in the comment no longer existed, the LLM just spat out like 100 lines of garbage to refactor the code.
rectang · 5h ago
> "You're absolutely right."

I admit a tendency to anthropomorphize the LLM and get irritated by this quirk of language, although it's not bad enough to prevent me from leveraging the LLM to its fullest.

The key when acknowledging fault is to show your sincerity through actual effort. For technical problems, that means demonstrating that you have worked to analyze the issue, take corrective action, and verify the solution.

But of course current LLMs are weak at understanding, so they can't pull that off. I wish that the LLM could say, "I don't know", but apparently the current tech can't know that it doesn't know.

And so, as the LLM flails over and over, it shamelessly kisses ass and bullshits you about the work it's doing.

I figure that this quirk of LLMs will be minimized in the near future by tweaking the language to be slightly less obsequious. Improved modeling and acknowledging uncertainty will be a heavier lift.

daxfohl · 5h ago
It'd be nice if github had a feature that updated the issue with this context automatically too, so that if this agent gives up and closes the PR, the next agent doesn't go and do the exact same thing.
candiddevmike · 6h ago
> tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

...and then it still doesn't actually fix it

mlyle · 6h ago
So, I recently have done my first couple heavily AI augmented tasks for hobby projects.

I wrote a TON of LVGL code. The result wasn’t perfect for placement, but when I iterated a couple of times, it fixed almost all of the issues. The result is a little hacked together but a bit better than my typical first pass writing UI code. I think this saved me a factor of 10 in time. Next I am going to see how much of the cleanup and factoring of the pile of code it can do.

Next I had it write a bunch of low level code to init hardware. It saved me a little time compared to reading the reference manual, and was more pleasant, but it wasn’t perfectly correct. If I did not have domain expertise I would not have been able to complete the task with the LLM.

la_fayette · 5h ago
When you argue that it saved you time by a factor of 10, have you actually measured that properly? I initially also had the feeling that LLMs save me time, but in the end they didn't. I roughly compared my performance to past performance by the number of stories done, and LLMs made me slower even though I thought I was saving time...

From several months of deep work with LLMs I think they are amazing pattern matchers, but not problem solvers. They suggest a solution pattern based on their trained weights. This can even result in real solutions, e.g., when programming Tetris or so, but not when working on somewhat unique problems...

mlyle · 5h ago
I am pretty confident. Last similar LVGL thing I did took me 10-12 hours, and I had a quicker iteration time (running locally instead of the test hardware). Here I spent a little more than an hour, testing on real hardware, and the last 20 minutes was nitpicking.

Writing front-end display code and instantiating components to look right is very much playing to the model’s strength, though. A carefully written sentence plus context would become 40 lines of detail-dense but formulaic code.

(I have also had a lot of luck asking it to make a first pass at typesetting things in Tex, too, for similar reasons)

delusional · 4h ago
There was a recent study that found that LLM users in general tend to feel like they were more productive with AI while actually being less productive.
asadotzler · 3h ago
presumably the study this very HN discussion responds to.
delusional · 1h ago
Heh, yep. Guess I sometimes forget to read the content before commenting too.
stavros · 5h ago
> If I did not have domain expertise I would not have been able to complete the task with the LLM.

This kind of sums up my experience with LLMs too. They save me a lot of time reading documentation, but I need to review a lot of what they write, or it will just become too brittle and verbose.

Retr0id · 5h ago
I was trying out Copilot recently for something trivial. It made the change as requested, but also added a comment that stated something obvious.

I asked it to remove the comment, which it enthusiastically agreed to, and then... didn't. I couldn't tell if it was the LLM being dense or just a bug in Copilot's implementation.

seunosewa · 6h ago
Some prompts can help:

"Find the root cause of this problem and explain it"

"Explain why the previous fix didn't work."

Often, it's best to undo the action and provide more context/tips.

Often, switching to Gemini 2.5 Pro when Claude is stumped helps a lot.

brazzy · 5h ago
My favourite recent experience was the LLM switching multiple times between using a library function and rolling its own implementation, each time claiming that it was "simplifying" the code and making it "more reliable".
colechristensen · 6h ago
Sometimes it does... sometimes.

I recently had a nice conversation looking for some reading suggestions from an LLM. The first round of suggestions were superb, some of them I'd already read, some were entirely new and turned out great. Maybe a dozen or so great suggestions. Then it was like squeezing blood from a stone but I did get a few more. After that it was like talking to a babbling idiot. Repeating the same suggestions over and over, failing to listen to instructions, and generally just being useless.

LLMs are great on the first pass but the further you get away from that they degrade into uselessness.

aquariusDue · 6h ago
Yeah, when I first heard about "one-shot"ing it felt more like a trick instead of a useful heuristic but with time my experience mimics yours, nowadays I try to one-shot small-ish changes instead of going back and forth.
daxfohl · 5h ago
I've had some luck in these cases prompting "your context seems to be getting too bloated. summarize this conversation into a prompt that I can feed into a new chat with a fresh context. make sure to include <...>".

Sometimes it works well the first time, and sometimes it spits out a summary where you can see what it is confused about, and you can guide it to create a better summary. Sometimes just having that summary in its context gets it over the hump and you can just say "actually I'm going to continue with you; please reference this summary going forward", and sometimes you actually do have to restart the LLM with the new context. And of course sometimes there's nothing that works at all.

dawnerd · 5h ago
I’ve had really good luck with having gpt generate a todo list that’s very, very detailed. Then having Claude use it to check items off. Still far from perfect but since doing that haven’t run into context issues since I can just start a new chat and feed it the todo (the todo also contains project info).
colechristensen · 6h ago
I also get things like this from very experienced engineers working outside their area of expertise. It's obviously less of a completely boneheaded suggestion, but still exactly the wrong thing suggested by AI, requiring a person to step in and correct it.
Macha · 5h ago
I recently reviewed a MR from a coworker. There was a test that was clearly written by AI, except I guess however he prompted it, it gave some rather poor variable names like "thing1", "thing2", etc. in test cases. Basically, these were multiple permutations of data that all needed to be represented in the result set. So I asked for them to be named distinctively, maybe by what makes them special.

It's clear he just took that feedback and asked the AI to make the change, and it came up with a change that gave them all very long, very unique names, that just listed all the unique properties in the test case. But to the extent that they sort of became noise.

It's clear writing the PR was very fast for that developer, I'm sure they felt they were X times faster than writing it themselves. But this isn't a good outcome for the tool either. And I'm sure if they'd reviewed it to the extent I did, a lot of that gained time would have dissipated.
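
A hypothetical sketch of the two naming extremes described above (made-up fixture, not the actual code from that MR):

```python
# Hypothetical test fixture, purely for illustration.
def make_order(status, priority):
    return {"status": status, "priority": priority}

# First pass: the names carry no information about what each case covers.
thing1 = make_order(status="shipped", priority="high")
thing2 = make_order(status="shipped", priority="low")

# After "fix the names": every property crammed in, unique but reading as noise.
shipped_order_with_high_priority_and_default_currency_and_no_discount = thing1

# What the review was actually asking for: name the property that makes each case special.
high_priority_order = make_order(status="shipped", priority="high")
low_priority_order = make_order(status="shipped", priority="low")
```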

meindnoch · 7h ago
>a deep learning framework in Rust [...] This is becoming a troubling trend with AI tools.

The serpent is devouring its own tail.

TeMPOraL · 7h ago
OTOH when they'll start getting good AI contributions, then... it'll be too late for us all.
LoganDark · 7h ago
Deep learning can be incredibly cool and not just used for AI slop.
jampa · 6h ago
> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

It has been for a while, AI just makes SPAM more effective:

https://news.ycombinator.com/item?id=24643894

pennomi · 6h ago
This is the most frustrating thing LLMs do. They put wide try:catch structures around the code making it impossible to actually track down the source of a problem. I want my code to fail fast and HARD during development so I can solve every problem immediately.
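
A minimal sketch of the two styles, in Python for concreteness (the config example is made up):

```python
import json

# The pattern being complained about: a wide try/except swallows the original
# exception, so the real cause and traceback are gone and the failure surfaces
# somewhere far away from where it happened.
def load_config(path):
    try:
        with open(path) as f:
            return json.loads(f.read())
    except Exception:
        print("config error")  # error is muted; the caller silently gets an empty dict
        return {}

# Fail-fast version: let the exception propagate during development, so the
# problem blows up immediately at its source with the full traceback intact.
def load_config_strict(path):
    with open(path) as f:
        return json.loads(f.read())
```
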
daxfohl · 5h ago
Seems like there's a need for github to create a separate flow for AI-created PRs. Project maintainers should be able to stipulate rules like this in English, and an AI "pre-reviewer" would check that the AI has followed all these rules before the PR is created, and chat with the AI submitter to resolve any violations. For exceptional cases, a human submitter is required.

Granted, the compute required is probably more expensive than github would offer for free, and IDK whether it'd be within budget for many open-source projects.

Also granted, something like this may be useful for human-sourced PRs as well, though perhaps post-submission so that maintainers can see and provide some manual assistance if desired. (And also granted, in some cases maybe maintainers would want to provide manual assistance to AI submissions, but I expect the initial triaging based on whether it's a human or AI would be what makes sense in most cases).

kfajdsl · 3h ago
This is my number one complaint with LLM produced code too. The worst thing is when it swallows an error to print its own error message with far less info and no traceback.

In my rules I tell it that try catches are completely banned unless I explicitly ask for one (an okay tradeoff, since usually my error boundaries are pretty wide and I know where I want them). I know the context length is getting too long when it starts ignoring that.

0xbadcafebee · 5h ago
> The "fix" simply muted an error instead of addressing the root cause.

FWIW, I have seen human developers do this countless times. In fact there are many people in engineering that will argue for these kinds of "fixes" by default. Usually it's in closed-source projects where the shittiness is hidden from the world, but trust me, it's common.

> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

There was already a problem (pre-AI) with shitty PRs on GitHub made to try to game a system. Regardless of how they made the change, the underlying problem is a policy one: how to deal with people making shitty changes for ulterior motives. I expect the solution is actually more AI to detect shitty changes from suspicious submitters.

Another solution (that I know nobody's going to go for): stop using GitHub. Back in the "olden times", we just had CVS, mailing lists and patches. You had to perform some effort in order to get to the point of getting the change done and merged, and it was not necessarily obvious afterward that you had contributed. This would probably stop 99% of people who are hoping for a quick change to boost their profile.

nerdjon · 5h ago
I will never forget being in a code review for an upcoming release: there was a method that was... different. Like massively different, with no good reason why it was changed as much as it was for such a small addition.

We asked the person why they made the change, and "silence". They had no reason. It became painfully clear that all they did was copy and paste the method into an LLM and say "add this thing" and it spit out a completely redone method.

So now we had a change that no one in the company actually knew just because the developer took a shortcut. (this change was rejected and reverted).

The scariest thing to me is no one actually knowing what code is running anymore with these models having a tendency to make change for the sake of making change (and likely not actually addressing the root thing but a shortcut like you mentioned)

tomrod · 6h ago
As a side question: I work in AI, but mostly python and theory work. How can I best jump into Burn? Rust has been intriguing to me for a long time
lvl155 · 6h ago
This is a real problem that’s only going to get worse. With the major model providers basically keeping all the data themselves, I frankly don’t like this trend long term.
doug_durham · 6h ago
You should be rejecting the PR because the fix was insufficient, not because it was AI agent written. Bad code is bad code regardless of the source. I think the fixation on how the code was generated is not productive.
glitchc · 5h ago
No, that's not how code review works. Getting inside the mind of the developer, understanding how they thought about the fix, is critical to the review process.

If an actual developer wrote this code and submitted it willingly, it would either constitute malice, an attempt to sabotage the codebase or inject a trojan, or stupidity, for failing to understand the purpose of the error message. With an LLM we mostly have stupidity. Flagging it as such reveals the source of the stupidity, as LLMs do not actually understand anything.

RobinL · 6h ago
The problem is that code often takes as long to review as to write, and AI potentially reduces the quality bar to pull requests. So maintainers have a problem of lots of low quality PRs that take time to reject
rustyminnow · 5h ago
> You should be rejecting the PR because the fix was insufficient

I mean they probly could've articulated it your way, but I think that's basically what they did... they point out the insufficient "fix" later, but the root cause of the "fix" was blind trust in AI output, so that's the part of the story they lead with.

andix · 6h ago
What I noticed: AI development constantly breaks my flow. It makes me more tired, and I work for shorter time periods on coding.

It's a myth that you can code a whole day long. I usually do intervals of 1-3 hours for coding, with some breaks in between. Procrastination can even happen on work related things, like reading other project members code/changes for an hour. It has a benefit to some extent, but during this time I don't get my work done.

Agentic AI works the best for me. Small refactoring tasks on a selected code snippet can be helpful, but aren't a huge time saver. The worst are AI code completions (first-version Copilot style); they are much more noise than help.

rightbyte · 3h ago
It would be interesting to record what one does in a day at the desk. Probably quite depressing to watch.

Like, I think 1h would be stretching it for mature codebases.

andix · 3h ago
The 1h I'm talking about is not all the time I might spend reading on code. It's the time I might procrastinate on my tasks with reading unrelated code.

Like doom scrolling on social media: Let's see what the fancy new guy got done this week. I need to feel better, I'm just going to look at the commits of the guy in the other team that always breaks production. Let's see how close he got to that recently, ...

lsy · 6h ago
Typically, debugging (e.g., a tricky race condition in an unfamiliar code base) would require adding logging, refactoring library calls, inspecting existing logs, and even rewriting parts of your program to be more modular or understandable. This is part of the theory-building.
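To make that concrete, here is a minimal, purely illustrative sketch (not from the comment itself) of the kind of instrumentation involved: logging reads and writes of shared state so a suspected race becomes visible. The counter example is hypothetical, and a given run may or may not actually lose updates.

```python
# Illustrative only: instrument shared state with logging to hunt a suspected race.
import logging
import threading

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(threadName)s %(message)s",
)
log = logging.getLogger("race-hunt")

counter = 0

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        snapshot = counter        # read shared state
        counter = snapshot + 1    # write it back; the read/write pair is not atomic
    log.info("thread finished; counter currently %d", counter)

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# If the final value is below 400000, some updates were lost: evidence of the race.
log.info("final counter = %d (expected 400000)", counter)
```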

When you have an AI that says "here is the race condition and here is the code change to make to fix it", that might be "faster" in the immediate sense, but it means you aren't understanding the program better or making it easier for anyone else to understand. There is also the question of whether this process is sustainable: does an AI-edited program eventually fall so far outside what is "normal" for a program that the AI becomes unable to model correct responses?

sodapopcan · 6h ago
This is always my thought whenever I hear "AI let me build a feature in a codebase I didn't know, in a language I didn't know" (which is often; there is at least one in these comments). Great, but what have you learned? This is fine for small contributions, I guess, but I don't hear a lot of stories of long-term maintenance. Unpopular opinion, though, I know.
threetonesun · 6h ago
I guess it's a question of how anyone learns. There's some value in typing code, I suppose, but with tab complete that's been gone for a long time. Letting AI write something and then reading it seems as good as copying and pasting from some other source.
sodapopcan · 6h ago
I'm not super qualified to answer as I haven't gone deep into AI at all. But from my limited observations I'd say yes and no. You generally aren't copy/pasting entire features, just snippets that you yourself have to string together in a sensible way. Of course there are lots of people who still do this, and that's why I find most people in this industry infuriating to work with. It's all good when it's boilerplate, and that's actually my primary use of "AI"—it's essentially been a snippets replacement (and is quite good at that).
trey-jones · 4h ago
Doing my own post-mortem of a recent project (the first that I've leaned on "AI" tools to any extent), my feeling was the following:

1. It did not make me faster. I don't know that I expected it to.

2. It's very possible that it made me slower.

3. The quality of my work was better.

Slower and better are related here, because I used these tools more to either check ideas that I had for soundness, or to get some fresh ideas if I didn't have a good one. In many cases the workflow would be: "I don't like that idea, what else do you have for me?"

There were also instances of being led by my tools into a rabbit hole that I eventually just abandoned, so that also contributes to the slowness. This might happen in instances where I'm using "AI" to help cover areas that I'm less of an expert in (and these were great learning experiences). In my areas of expertise, it was much more likely that I would refine my ideas, or the "AI" tool's ideas into something that I was ultimately very pleased with, hence the improved quality.

Now, some people might think that speed is the only metric that matters, and certainly it's harder to quantify quality - but it definitely felt worth it to me.

jpc0 · 4h ago
I do this a lot and absolutely think it might even improve it, and this is why I like the current crop of AIs that are more likely to be argumentative and not just capitulate.

I will ask the AI for an idea and then start blowing holes in its idea, or will ask it to do the same for my idea.

And I might end up not going with its idea regardless, but it got me thinking about things I wouldn't have thought about.

Effectively it's like chatting to a coworker who has a reasonable idea about the domain and can bounce ideas around.

trey-jones · 4h ago
I'm on record saying it's "like the smartest coworker I've ever had" (no offense).
piker · 7h ago
My main two attempts at using an “agentic” coding workflow were trying to incorporate an Outlook COM interface into my rust code base and to streamline an existing abstract windows API interaction to avoid copying memory a couple of times. Both wasted tremendous amounts of time and were ultimately abandoned leaving me only slightly more educated about windows development. They make great autocompletion engines but I just cannot see them being useful in my project otherwise.
jdiff · 6h ago
They make great autocompletion engines, most of the time. It's nice when it can recognize that I'm replicating a specific math formula and expands out the next dozen lines for me. It's less nice when it predicts code that's not even syntactically valid for the language or the correct API for the library I'm using. Those times, for whatever reason, seem to be popping up a lot in the last few weeks so I find myself disabling those suggestions more often than not.
crinkly · 7h ago
This is typically what I see when I’ve seen it applied. And as always trying to hammer nails in with a banana.

Rather than fit two generally disparate things together it’s probably better to just use VSTO and C# (hammer and nails) rather than some unholy combination no one else has tried or suffered through. When it goes wrong there’s more info to get you unstuck.

piker · 4h ago
To be fair though, unsafe rust (where the COM lives) is basically just C, so I totally expected it to be tractable in the same way it has been tractable for the last 20ish years? But it isn’t.

Why is interacting with the OS’ API in a compiled language the wrong approach in 2025? Why must I use this managed Frankenstein’s monster of dotnet? I didn’t want to ship or expect a whole runtime for what should be a tiny convenience DLL. Insane

charcircuit · 6h ago
I had the opposite experience. Gemini was able to work with COM and accomplish what I needed despite me never using COM before.
tonyedgecombe · 5h ago
I've done a lot of work with COM over the years and that is the last technology I would trust to an AI. It's very easy to write COM code that appears to work but contains subtle bugs.
piker · 4h ago
That was my issue. Integration works, Outlook itself not so much, afterwards. (I.e. memory error.)
piker · 4h ago
Actually hadn’t tried Gemini with it yet. Perhaps worth taking a look.
doc_manhat · 7h ago
I directionally disagree with this:

``` It's common for engineers to end up working on projects which they don't have an accurate mental model of. Projects built by people who have long since left the company for pastures new. It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work. ```

Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the systems unwritten assumptions you may not catch it.

Having said that it also depends on how important it is to be writing bug free code in the given domain I guess.

I like AI particularly for greenfield stuff and one-off scripts, as it lets you go faster there. Basically you build up the mental model as you're coding with the AI.

Not sure about whether this breaks down at a certain codebase size though.

horsawlarway · 7h ago
Just anecdotally - I think your reason for disagreeing is a valid statement, but not a valid counterpoint to the argument being made.

So

> Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the systems unwritten assumptions you may not catch it.

This is completely correct. It's a very fair statement. The problem is that a developer coming into a large legacy project is in this spot regardless of the existence of AI.

I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.

I want to see where it tries to make changes, what files it wants to touch, what libraries and patterns it uses, etc.

It's a poor man's proxy for having a subject matter expert in the code give you pointers. But it doesn't take anyone else's time, and as long as you're not just trying to dump output into a PR can actually be a pretty good resource.

The key is not letting it dump out a lot of code, in favor of directional signaling.

ex: Prompts like "Which files should I edit to implement a feature which does [detailed description of feature]?" Or "Where is [specific functionality] implemented in this codebase?" Have been real timesavers for me.

The actual code generation has probably been a net time loss.

Roscius · 7h ago
> I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.

This. Leveraging the AI to start to develop the mental model is an advantage. But, using the AI is a non-trivial skill set that needs to be learned. Skepticism of what it's saying is important. AI can be really useful just like a 747 can be useful, but you don't want someone picked off the street at random flying it.

bluefirebrand · 6h ago
> This. Leveraging the AI to start to develop the mental model is an advantage

Is there any evidence that AI helps you build the mental model of an unfamiliar codebase more quickly?

In my experience trying to use AI for this it often leads me into the weeds

doc_manhat · 7h ago
Yeah fair points particularly for larger codebases I could see this being a huge time saver.
uludag · 7h ago
Great article, and I was having very similar thoughts with regards to this productivity study and the "Programming as Theory Building" paper. I'm starting to be convinced that if you are the original author of a program and still have the program's context in your head, you are the asymptote that any and all AI systems will approach but never surpass: maybe not in terms of raw coding speed, but in terms of understanding the program, its vision of development, its deficiencies and hacks, its context, its users and what they want, the broader culture the program exists in, etc.

I really like how the author then brought up the point that for most daily work we don't have the theory built, even a small fraction of it, and that this may or may not change the equation.

conartist6 · 5h ago
Thanks, <3
neuroelectron · 7h ago
Good article, and it makes sense. I wish that at some point in my career I had worked on a codebase that was possible to understand without 10 years of experience. Instead most of my development time was spent tracing execution paths through tangles of abstractions in nested objects in 10M LOC legacy codebases. My buddy who introduced me to the job is still doing it today and now uses AI, and this has given him the free time to start working on his own side projects. So there are certain types of jobs where AI will certainly speed up your development.
stevekrouse · 3h ago
Such a great essay! Peter Naur's thesis is also the central point in my talk about vibe coding from last month: https://www.youtube.com/watch?v=1WC8dxMC4Xw

I'm spending an inordinate amount of time turning that video into an essay, but I feel like I'm being scooped already, so here's my current draft in case anyone wants to get a sneak preview: https://valdottown--89ed76076a6544019f981f7d4397d736.web.val...

Feedback appreciated :)

omnicognate · 7h ago
All these studies that show "AI makes developers x% more/less productive" are predicated on the idea that developer "productivity" can be usefully captured in a single objectively measurable number.

Just one problem with that...

narush · 7h ago
Thanks for the feedback! I strongly agree this is not the only measure of developer productivity -- but it's certainly one of them. I think this measure speaks very directly to how _many_ developers (myself included) currently understand the impact of AI tools on their own work (e.g. just speeding up implementation speed).

(The SPACE [1] framework is a pretty good overview of considerations here; I agree with a lot of it, although I'll note that METR [2] has different motivations for studying developer productivity than Microsoft does.)

[1] https://dl.acm.org/doi/10.1145/3454122.3454124

[2] https://metr.org/about

charcircuit · 6h ago
As long as the true productivity is correlated with that number it should be fine.
joshmarlow · 7h ago
I've gotten some pretty cool things working with LLMs doing most of the heavy lifting using the following approaches:

* Spec out project goals and relevant context in a README and spec out all components; have the AI build out each component and compose them. I understand the high level but don't necessarily know all of the low-level details. This is particularly helpful when I'm not deeply familiar with some of the underlying technologies/libraries.

* Have the AI write tests for code that I've verified is working. As we all know, testing is tedious, so of course I want to automate it. And well-written tests (for well-written code) can be pretty easy to review (see the sketch below).
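As a purely hypothetical illustration of that second point: the kind of small, easy-to-review test one might ask an AI to draft for code that has already been verified by hand. The `slugify` function below is invented for this example, not taken from the commenter's project.

```python
# Hypothetical example: a tiny, hand-verified function plus AI-drafted tests to review.
import re
import unittest

def slugify(title: str) -> str:
    """Lowercase, strip punctuation, and join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

class TestSlugify(unittest.TestCase):
    def test_basic_title(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_collapses_whitespace_and_symbols(self):
        self.assertEqual(slugify("  AI --- slows   devs?  "), "ai-slows-devs")

    def test_empty_string(self):
        self.assertEqual(slugify(""), "")

if __name__ == "__main__":
    unittest.main()
```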

remorses · 3h ago
Using AI agents productively requires setting up a repository for collaboration: it means writing docs and making the build process easy and fast.

Like any other tool, AI is slow to adopt but has huge gains later on.

Kim_Bruning · 6h ago
I think different people use these tools differently. I've got mine set up to start in "rubber duck" mode, where I do rubber duck programming, before asking the AI to help me with certain tasks (if at all). Low impact utility scripts? The AI gets let off the leash. Critical core logic? I might do most of the work myself (though having a rubber duck can still be good!)
i_love_retros · 5h ago
The AI hype will die off just like blockchain and web3. LLMs are a solution in search of a problem.

All the VCs are gonna lose a ton of money! OpenAI will be NopenAI, relegated to the dustbin of history.

We never asked for this, nobody wants it.

Companies using AI and promoting it in their products will be seen as tacky and cheap. Just like developers and artists that use it.

ringeryless · 6h ago
Not to mention the annoyance of AI-assisted issues being opened, many times incorrectly due to hallucinations. These tickets hammer human teams with nonsense and suck resources away from real issues.
wellpast · 6h ago
The fact that the devs thought the AI saved them time is no surprise to me… at least at this point in my career.

Developers (people?) in general for some reason just simply cannot see time. It’s why so many people don’t believe in estimation.

What I don’t understand is why. Is this like a general human brain limitation (like not being able to visualize four dimensions, or how some folks don’t have an internal monologue)?

Or is this more psychodynamic or emotional?

It’s been super clear and interesting to me how developers I work with want to believe AI (code generation) is saving them time when it clearly, obviously is not.

Is it just the hope that one day it will? Is it fetishization of AI?

Why in an industry that so requires clarity of thinking and expression (computer processors don’t like ambiguity), can we be so bad at talking about, thinking about… time?

Don’t get me started on the static type enthusiasts who think their strong type system (another seeming fetish) is saving them time.

diamond559 · 5h ago
Measure twice cut once. Not cut 100 times and hope it does it right once.
hartator · 6h ago
I am not super sure how quickly writing one-shot benchmark scripts slows anyone down, but okay.
xyst · 7h ago
Not surprising. Use of LLM has only been helpful in initial exploration of unknown code bases or languages for me.

Using it beyond that is just more work. First parse the broken response, remove any useless junk, then have it reprocess with an updated query.

It’s a nice tool to have (just as search engines gave us easy access to multiple sources/forums), but its limitations are well known. Trying to use it 100% as intended is a massive waste of time and resources (energy use…)

afro88 · 6h ago
I said this when the linked paper was shared and got downvotes: it's based on early-2025 data. My point isn't that it should be completely up to date, but that we need to consider it in that context. This is pre Claude 4, Claude Code. Pre Gemini 2.5 even. These models are such a big step up from what came previously.

Just like we put a (2023) on articles here so they are considered in the right context, this paper should be treated the same way. Blanket "AI tools slow down development" statements with a "look, this rigorous paper says so!" ignore a key variable: the rate of effectiveness improvement. If said paper were run against current models, the picture would be different. Also in 3 months' time. AI tools aren't a static thing that either works or doesn't indefinitely.

tonyedgecombe · 5h ago
> This is pre Claude 4, Claude Code. Pre Gemini 2.5 even.

The most interesting point from the article wasn't about how well the AIs worked; rather, it was the gap between people's perception and their actual results.

methuselah_in · 7h ago
Current-generation students who have access to AI might become slower over time. When things are not readily available, they have to struggle and work harder, and in that process I think humans pick up a lot of secondary things. Now everything is easily available, especially knowledge, without ever learning how to struggle with the basics. It will eventually make kids dumber. But it could go the opposite way too. Eventually even I become slower as I keep using ChatGPT or Gemini.
gjsman-1000 · 7h ago
What I thought was fascinating, and should be a warning sign to everyone here:

Before beginning the study, the average developer expected about a 20% productivity boost.

After ending the study, the average developer (potentially: you) believed they actually were 20% more productive.

In reality, they were 0% more productive at best, and 40% less productive at worst.

Think about what it would be like to be that developer; off by 60% about your own output.

If you can't even gauge your own output without being 40% off on average, 60% off at worst; be cautious about strong opinions on anything in life. Especially politically.

Edit 1: Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight.

Edit 2: I suppose this goes to show, that even on Hacker News, where there are relatively high-IQ and self-aware individuals present... 95% of the crowd can still possibly be wildly delusional. Stick to your gut, regardless of the crowd, and regardless of who is in it.

bluefirebrand · 6h ago
> Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight

Yeah, this is me at my job right now. Every time I express even the mildest skepticism about the value of our Cursor subscription, I'm getting follow up conversations basically telling me to shut up about it

It's been very demoralizing. You're not allowed to question the Emperor's new clothes

quantumHazer · 6h ago
This should really be the top comment. The problem is these tools can really give us some value in certain areas, but they are not what they are marketed as.
pphysch · 7h ago
Given how deadlines/timelines tend to (not) work in SWE, this is not surprising.
gjsman-1000 · 7h ago
Perhaps; but this is a developer's own output with an AI tool, compared against their own historical output when they didn't use it. Apparently, the average developer (read: quite possibly most people here) can't even hit the broadside of a barn in estimating their own productivity.
dragonwriter · 7h ago
This is generally a problem, and it was established as such before software development existed (the big thing people usually point to is RAND Corporation work from the 1940s). It's the whole motivation for the Wideband Delphi estimation methods invented shortly afterwards (of which agile "planning poker" is simply a more recent realization) for forward estimation, and it's why lean methods center on using a plan-do-check-act cycle for process improvements rather than seat-of-the-pants subjective feel.

But despite the popularity of some of this (planning poker, particularly; PDCA for process improvements is sadly less popular) as ritual, those elements have become part of a cargo cult where almost no one remembers why we do it.

freedomben · 7h ago
But this is still regarding forward estimating of future work, whereas GP is talking about gauging actual, past work done. The problems with forward estimation are indeed widely known, but I doubt most people realize that they are so bad at even knowing how productive they were.
sureglymop · 7h ago
That doesn't surprise me at all. Isn't software engineering in essence about being constantly confronted with new problems and having to come up with a sufficient solution on the fly? It seems very hard to estimate this, even if you know yourself well.
lupire · 7h ago
They underestimated by 20% how long it took them to do a 1-8 hr task that they had just completed.

It's like Tog's study that people think Keyboard is faster than the mouse even when they are faster with the mouse. Because they are measuring how they feel, not what is actually happening.

https://www.asktog.com/TOI/toi06KeyboardVMouse1.html

marcosdumay · 6h ago
That is a very weird set of findings.

This one in particular:

> It takes two seconds to decide upon which special-function key to press.

seems to indicate the study was done on people with no familiarity at all with the software they were testing.

Either way, I don't think there is any evidence out there supporting that either of keyboard-only or mouse-only is faster or equivalent to keyboard+mouse for well known GUIs.

alganet · 6h ago
This idea that some developers have some "mental model" and others not is an extraordinary claim, and I don't see extraordinary evidence.

It sounds like a good thing, right? "Wow, mental model. I want that, I want to be good and have big brain", which encourages you to believe the bullshit.

The truth is, this paper is irrelevant and a waste of time. It only serves the purpose of creating discussion around the subject. It's not science, it's a cupholder for marketing.

imiric · 5h ago
You couldn't be more wrong. If you've ever programmed, or worked with programmers, that is not an extraordinary claim at all, but a widely accepted fact.

A mental model of the software is what allows a programmer to intuitively know why the software is behaving a certain way, or what the most optimal design for a feature would be. In the vast majority of cases these intuitions are correct, and other programmers should pay attention to them. This ability is what separates those with a mental model and those without.

On the other hand, LLMs are unable to do this, and are usually not used in ways that help build a mental model. At best, they can summarize the design of a system or answer questions about its behavior, which can be helpful, but a mental model is an abstract model of the software, not a textual summary of its design or behavior. Those neural pathways can only be activated by natural learning and manual programming.

alganet · 5h ago
> You couldn't be more wrong.

Explanation missing.

> If you've ever programmed, or worked with programmers, that is not an extraordinary claim at all.

One step ahead of you. I already said this is engineered to encourage belief: "I want to be good, big brain, and open source is good, I want to be good big brain".

It's marketing.

> A mental model of the software is what allows a programmer [yadda yadda]

I'm not saying it doesn't exist, I'm saying the paper doesn't provide any relevant information regarding the phenomenon.

> Those neural pathways can only be activated by natural learning and manual programming.

Again, probably true. But the paper doesn't provide any relevant information regarding this phenomenon.

---

Your answer seems to disagree with me, but displays a disjointed understanding of what I'm really addressing.

---

As a lighthearted fun analogy, I present:

https://isotropic.org/papers/chicken.pdf

The paper does not prove the existence of chickens. It says chicken a lot, but never addresses the phenomenon of chickens existing.

imiric · 5h ago
I'm confused by what your point is, then. You want evidence of an abstraction that exists in the minds of experienced developers? That's like asking for evidence of humor or love. We accept these things as real because of shared experiences, not because of concrete evidence.
alganet · 5h ago
My point is that the paper has no point, the article on the paper is a stretch, and none of this is relevant in any way except creating chatter.

It's useless from the research perspective. But it is a cup-holder for marketing something.

I already laid this out very clearly in my first comment.

cratermoon · 7h ago
bunderbunder · 7h ago
> It's a really fabulous study...

Ehhhh... not so much. It had serious design flaws in both the protocol and the analysis. This blog post is a fairly approachable explanation of what's wrong with it: https://www.argmin.net/p/are-developers-finally-out-of-a-job

narush · 7h ago
Hey, thanks for linking this! I'm a study author, and I greatly appreciate that this author dug into the appendix and provided feedback so that other folks can read it as well.

A few notes if it's helpful:

1. This post is primarily worried about ordering considerations -- I think this is a valid concern. We explicitly call this out in the paper [1] as a factor we can't rule out -- see "Bias from issue completion order (C.2.4)". We have no evidence this occurred, but we also don't have evidence it didn't.

2. "I mean, rather than boring us with these robustness checks, METR could just release a CSV with three columns (developer ID, task condition, time)." Seconded :) We're planning on open-sourcing pretty much this data (and some core analysis code) later this week here: https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-D... - star if you want to dig in when it comes out.

3. As I said in my comment on the post, the takeaway at the end of the post is that "What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring." I think this is a reasonable takeaway from the study overall. As we say in the "We do not provide evidence that:" section of the paper (Page 17), we don't provide evidence across all developers (or even most developers) -- and ofc, this is just a point-in-time measurement that could totally be different by now (from tooling and model improvements in the past month alone).

Thanks again for linking, and to the original author for their detailed review. It's greatly appreciated!

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

bunderbunder · 6h ago
Thanks for the response, you make some very good points. Sorry, I had missed your response on the original post. I don't know if it wasn't there yet, or if it's because for some reason their blog is configured to only show the first two comments by default. :/ Either way, my bad.

I think my bias as someone who spends too much time looking at social science papers is that the protocol allows for spillover effects that, to me, imply that the results must be interpreted much more cautiously than a lot of people are doing. (And then on top of that I'm trying to be hyper-cautious and skeptical when I see a paper whose conclusions align with my biases on this topic.)

Granted, that sort of thing is my complaint about basically every study on developer productivity when using LLMs that I've seen so far. So I appreciate how difficult this is to study in practice.

d00mB0t · 7h ago
Blasphemy! How dare you say our Emperor has no clothes! AI is becoming a cult and I'm not here for it.
gr8beehive · 7h ago
Mirror neurons got people drinking the same stupid kool aid without realizing it.
whatever1 · 7h ago
They didn’t use the latest model that was released yesterday night. Follow my paid course to learn how to vibe code/s
rosspackard · 7h ago
One mediocre paper/study (it should not even be called that with all the bias and sample size issues) and now we have to put up with stories re-hashing and dissecting it. I really hope these don't get upvoted more in the future.

16 devs. And they weren't allowed to pick which tasks they used the AI on. Ridiculous. Also using it on "old and >1 million line" codebases and then extrapolating that to software engineering in general.

Writers like this then theorize about why AI isn't helpful, then those "theories" get repeated until it feels less like a theory and more like a fact, and it all proliferates into an echo chamber of "AI isn't a useful tool." There have been too many anecdotes, and too much of my own personal experience, for me to accept that it isn't useful.

It is a tool and you have to learn it to be successful with it.

davidcbc · 7h ago
> And they weren't allowed to pick which tasks they used the AI on.

They were allowed to pick whether or not to use AI on a subset of tasks. They weren't forced to use AI on tasks that don't make sense for AI

throwaway284927 · 7h ago
That is not true, usage of AI was decided randomly. From the paper:

"To directly measure the impact of AI tools on developer productivity, we conduct a randomized controlled trial by having 16 developers complete 246 tasks (2.0 hours on average) on well-known open-source repositories (23,000 stars on average) they regularly contribute to. Each task is randomly assigned to allow or disallow AI usage, and we measure how long it takes developers to complete tasks in each condition."

davidcbc · 7h ago
Directly from the paper:

> If AI is allowed, developers can use any AI tools or models they choose, including no AI tooling if they expect it to not be helpful. If AI is not allowed, no generative AI tooling can be used.

AI is allowed not required

throwaway284927 · 6h ago
True, my bad, I didn't read you correctly. What you said was true.

I do believe, however, that it's important to emphasize the fact that they didn't get to choose in general, which I think your wording (even though it is correct) does not make evident.

rosspackard · 7h ago
Half the tasks they were not allowed to use AI.
davidcbc · 6h ago
Yes, and the other half they had the option to use AI. That's why I said they were allowed to pick whether or not to use AI on a subset of tasks. On the other subset they were not allowed to use AI.
RamblingCTO · 7h ago
It's just the same with all the anecdotal evidence of some hype guys on twitter claiming 10x performance on coding ... Same same but different
steveklabnik · 7h ago
> and then extrapolating that to software engineering in general.

To the credit of the paper authors, they were very clear that they were not making a claim against software engineering in general. But everyone wants to reinforce their biases, so...

rosspackard · 7h ago
Great for the authors. But everyone else seems to be extrapolating. Authors have a responsibility and should recognize how their work will be used.

Metr may overall have an ok mission, but their motivation is questionable. They published something like this to get attention. Mission accomplished on that but they had to have known how this would be twisted.

jplusequalt · 7h ago
>One mediocre paper/study (it should not even be called that with all the bias and sample size issues)

Can you bring up any specific issues with the METR study? Alternatively, can you cite a journal that critiques it?

rosspackard · 7h ago
It was just published. Too new for someone to conduct a direct study to critique and journals don't just publish critiques anyway. It would have to be a study that disputes the results.

They used 16 developers. The confidence intervals are wide and a few atypical issues per dev could swing the headline figure.

Veteran maintainers on projects they know inside-out. This is a bias.

Devs supplied the issue list (then randomized) which still leads to subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge—exactly where AI probably has least marginal value.

Time was not independently logged and was self-reported.

No direct quality metric is possible. Could the AI code be better?

The Hawthorne effect: knowing they are observed (and paid) may make devs over-document, over-prompt, or simply take their time.

Many of the devs were new to Cursor.

Bias in forecasting.

mkagenius · 8h ago
AI tends to slow us down because we don't really know what it's good at. Can it write a proper Nginx config? I don't know—let's try. And then we end up wasting 30 minutes on it.

Fully autonomous coding tools like v0, a0, or Aider work well as long as the context is small. But once the context grows—usually due to mistakes made in earlier steps—they just can’t keep up. There's no real benefit to the "try again" loop yet.

For now, I think simple VSCode extensions are the most useful. You get focused assistance on small files or snippets you’re working on, and that’s usually all you need.

ethan_smith · 7h ago
The context switching cost between coding and AI interaction is substantial and rarely measured in these studies. Each prompt/review cycle breaks flow state, which is particularly damaging for complex programming tasks where deep concentration yields the greatest productivity.
bluefirebrand · 6h ago
This has been my experience too

Ever since my company made switching to Cursor mandatory, I have not been able to hit any kind of flow. I know my own productivity has plummeted, and I suspect the same is true for many others, but no one is saying anything.

I have spoken up once or twice and only been smacked down for my troubles, so I am not surprised everyone else is clammed up