"Despite this plethora of negative experiences, executives are aggressively mandating the use of AI6. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.
Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.
But, at least in that scenario, the thing ultimately doesn’t work, so there’s a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down."
The counterpoint to this is that _SOME_ people are able to achieve force multiplication (even at the highest levels of skill, it's not just a juniors-only phenomenon), and _THAT_ is what is driving management adoption mandates. They see that 2-4x increases in productivity are possible under the correct circumstances, and they're basically passing down mandates for the rank and file to get with the program and figure out how to reproduce those results, or find another job.
jmsdnns · 21h ago
> even at the highest levels of skill, it's not just a juniors-only phenomenon
AI has the most effect for people with less experience or low performance. It has less of an effect for people on the high end. It is indeed closing the skill gap and it does so by elevating the lower side of it.
This is important to know because it helps explain why people react as they do. Those who feel the most lift will be vocal about AI being good while those that don't are confused by anyone thinking AI is helpful at all.
It is not common for people on the high skill side to experience a big lift except for when they use AI for the tedious stuff that they don't really want to do. This is a sweet spot because all of the competence is there, but the willingness to do the work is not.
I have heard Dr Lilach Mollick, dir of Pedagogy at Wharton, say this has been shown numerous times. People who follow her husband, Ethan, are probably aware already.
mrweasel · 21h ago
> It is indeed closing the skill gap and it does so by elevating the lower side of it.
That's my "criticism", it's not closing the skill gap. Your skills haven't change, your output has. If you're using AI conservatively I'd say you're right, it can remove all the tedious work, which is great, but you'll still need to check that it's correct.
I'm more and more coming to the idea that for e.g. some coding jobs, CoPilot, Claude, whatever can be pretty helpful. I don't need to write a generic API call, handle all the error codes and hook up messages to the user, the robot can do that. I'll check and validate the code anyway.
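To be concrete, the kind of boilerplate I'm happy to hand off looks something like this (a made-up Python sketch using the requests library; the endpoint and messages are invented):

    import requests

    ERROR_MESSAGES = {
        401: "You are not logged in.",
        403: "You don't have permission to do that.",
        404: "That record no longer exists.",
        429: "Too many requests, please wait a moment.",
    }

    def fetch_invoice(invoice_id: str) -> dict:
        # Tedious but necessary: call the endpoint, time out sensibly,
        # and map error codes to messages a user can actually act on.
        try:
            resp = requests.get(
                f"https://api.example.com/invoices/{invoice_id}", timeout=10
            )
        except requests.RequestException:
            raise RuntimeError("The service is unreachable, please try again later.")
        if resp.status_code == 200:
            return resp.json()
        raise RuntimeError(ERROR_MESSAGES.get(resp.status_code, "Something went wrong."))

None of this is hard, it's just the tedious plumbing I'd rather not type out by hand.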
Where I'm still not convinced is for communicating with other humans. Writing is hard, communication is worse. If you still make basic spelling errors after decades of using a spellchecker, I doubt that your ability to communicate clearly will change even with an LLM helping you.
Same with arts. If you can't draw, no amount of prompting is going to change that. Which is fine, if you only care about the output, but you still don't have the skills.
My concern is the uncritical application of LLMs to all aspects of people's daily lives. If you can use an LLM to do your job faster, fine. If you can't do it without an LLM, you shouldn't be doing it with one.
jaredklewis · 20h ago
> I have heard Dr Lilach Mollick, dir of Pedagogy at Wharton, say this has been shown numerous times. People who follow her husband, Ethan, are probably aware already.
I'd be curious to see the sources.
Basically every study I have ever read making some claim about programming (efficacy of IDEs, TDD, static typing, pair programming, formal CS education, ai assistants, etc...) has been a house of cards that falls apart with even modest prodding. They are usually premised on one or more inherently flawed metrics like number of github issues, LoC or whatever. That would be somewhat forgivable since there are not really any good metrics to go on, but then the studies invariably make only a perfunctory effort to disentangle even the most obvious of confounding variables, making all the results kind of pointless.
Would be happy if anyone here knew of good papers that would change my mind on this point.
nyarlathotep_ · 16h ago
> Basically every study I have ever read making some claim about programming (efficacy of IDEs, TDD, static typing, pair programming, formal CS education, ai assistants, etc...) has been a house of cards that falls apart with even modest prodding.
Isn't this true about most things in software? I mean is there anything quantifiable about "microservices" vs "monolith"? Test-driven development, containers, whatever?
I mean all of these things are in some way good, in some contexts, but it seems impossible to quantify benefits of any of them. I'm a believer that most decisions made in software are somewhat arbitrary, driven by trends and popularity and it seems like little effort is expended to come to overarching, data-backed results. When they are, they're rare and, like you said, fall apart under investigation or scrutiny.
I've always found this strange about software in general. Even every time there's a "we rewrote $THING in $NEW_LANG" and it improved memory use/speed/latency whatever, there's a chorus of (totally legitimate) criticism and inquiry about how things were measured, what attempts were made to optimize the original solutions, if changes were made along the way outside of the language choice that impacted performance etc etc.
jaredklewis · 15h ago
I think we agree.
To be clear I am not arguing that tools and practices like TDD, microservices, ai assistants, and so on have no effect. They almost certainly have an effect (good or bad).
It’s just the unfortunate reality that quantitatively measuring these effects in a meaningful way seems to basically be impossible (or at least I’ve never seen it done). With enough resources I can’t think of any reason it shouldn’t be possible, but apparently those resources are not available because there are no good studies on these topics.
Thus my skepticism of the “studies” referenced earlier in the thread.
surgical_fire · 20h ago
> AI has the most effect for people with less experience or low performance. It has less of an effect for people on the high end.
I actually think that it benefits high performance workers as AI can do a lot of heavy lifting that frees them to focus on things where their skills make a difference.
Also, for less skilled or less experienced developers, they will have a harder time spotting the mistakes and inconsistencies generated by AI. This can actually become a productivity sink.
windows2020 · 20h ago
Reading other people's code is often more challenging than writing it yourself.
mewpmewp2 · 18h ago
With AI code I find it very easy to understand as I prompted it, I know what to expect, I know what it will likely do. Far easier than other people's code.
jessoteric · 20h ago
the main issue is that you end up looking down the barrel of begging claude, for the fifth time this session, to do it right - or just doing it yourself in half the total time you've wasted so far.
at least, this is what i typically end up with.
surgical_fire · 17h ago
Typically, I've been asking it to do "heavy lifting" for me.
It generally generates defective code, but it doesn't really matter all that much, it is still useful that it is mostly right, and I only need to make a few adjustments. It saves me a lot of typing.
Would I pay for it? Probably not. But it is included in my IntelliJ subscription, so why not? It is there already.
CuriouslyC · 21h ago
So, basically, you think all the pro-AI folks are "bad," and defensive because they feel like anti-AI folks are attacking the thing that makes them not bad? Hard to want to engage with that.
I'll bite anyhow.
AI is very, very good at working at short length-scales. It tends to be worse at working at longer length-scales (Gemini is a bit of an outlier here but even so, it holds). People who are hyper-competent/elite-skill in their domain and who achieve force multiplication with gen-AI understand this, and know how to decompose challenging long length-scale problems into a number of smaller short length-scale problems efficiently. This isomorphic transform allows AI to tackle the original problem in a way that it's maximally efficient at, thus side-stepping its inherent weaknesses.
You can think of this sort of like mathematical transformations that make data analysis easier.
johnnyanmac · 15h ago
>So, basically, you think all the pro-AI folks are "bad," and defensive because they feel like anti-AI folks are attacking the thing that makes them not bad?
I break the pro-AI crowd into 3 main categories and 2 sub categories:
1. those who don't really know how to code, but AI lets them output something more than what they could do on their own. This seems to be what the GP is focused on
2. The ones who can code but are financially invested to hype up the bubble. Fairly self explanatory; the market is rough and if you're getting paid the big bucks to evangelize, it's clear where the interests lie.
3. Executives and product teams that have no actual engagement with AI, but know bringing it up excites investors. A hybrid of 1 and 2, but they aren't necessarily pretending they use it themselves. It's the latest means to an end (the end being money).
and then the smaller sects:
1. those who genuinely feel AI is the future and are simply prepping for it, trying to adapt their workflow and knowledge base around it. They may feel it can already replace people, or may feel it's a while out but progressing that way. These are probably the most honest party, but I personally feel they miss a critical aspect: what is used currently as the backbone for AI may radically change by the time it is truly viable.
2. those who are across the spectrum of AI, but see it as a means to properly address the issue of copyright. If AI wins, they get their true goal of being able to yoink many more properties without regulations to worry about.
>People who are hyper-competent/elite-skill in their domain who achieve force multiplication with gen-AI understand this,
are there real examples of this? The main issue I see is that people seem to judge "competency" based on speed and output, but not on the quality, maintainability, or conciseness of such output. If we just needed engineers to slap together something that "works", we could be "more productive".
CuriouslyC · 5h ago
Well, just to give you context on my position, because I don't feel I fit into any of those molds:
I was already a very high performer before AI, leading teams, aligning product vision and technical capabilities, architecting systems and implementing at top-of-stack velocity. I have been involved in engineering around AI/ML since 2008, so I have a pretty good understanding of the complexities/inconsistencies of model behavior. When I observed the ability of GPT3.5 to often generate working (if poorly written, in general) code, I knew this was a powerful tool that would eventually totally reshape development once it matured, but that I had to understand its capabilities and non-uniform expertise boundary to take advantage of its strengths without having to suffer its weaknesses. I basically threw myself fully into mastering the "art" of using LLMs, both in terms of prompting and knowing when/how to use them, and while I saw immediate gains, it wasn't until Gemini Pro 2.5 that I saw the capabilities in place for a fully agentic workflow. I've been actively polishing my agentic workflow since Gemini 2.5's release, and now I'm at the point where I write less than 10% of my own code. Overall my hand-written code is still significantly "neater/tighter" than that produced by LLMs, but I'm OK with the LLM nailing the high level patterns I outline and being "good enough" (which I encourage via detailed system prompts and enforce via code review, though I often have AI rewrite its own code given my feedback rather than manually edit it).
I liken it to assembly devs who could crush the compiler in performance (not as much of a thing now in general, but it used to be), who still choose to write most of the system in C/C++ and only implement the really hot loops in assembly because that's just the most efficient way to work.
latchup · 3h ago
> I was already a very high performer before AI, leading teams, aligning product vision and technical capabilities, architecting systems and implementing at top-of-stack velocity
Indeed, he did not list "out-of-touch suit-at-heart tech leads that failed upwards and have misplaced managerial ambitions" as a category, but that category certainly exists, and it drives me insane.
CuriouslyC · 3h ago
You're making a lot of baseless assumptions with your implication there, and sadly it says more about you than me.
You might find your professional development would stop being retarded if you got the chip off your shoulder and focused on delivering maximum value to the organizations that employ you in any way possible.
mewpmewp2 · 18h ago
Yeah, exactly, shorter files, good context for the AI agents, good rules for the AI agents how to navigate around the codebase, reuse functions, components, keep everything always perfectly organized, with no effort from the person themselves. It is truly amazing to witness. No people can match that ability to reuse and organize.
seadan83 · 21h ago
Burning the furniture in your home is a force multiplier for your home's heating system. Get with the program, or find a new home.
CuriouslyC · 21h ago
If your argument is that agentic coding workflows will retard the skill development of junior and midrange engineers I'm not entirely in disagreement with you, that's a problem that has to be solved in relation to how we use gen AI across every domain.
seadan83 · 20h ago
Skill retardation is actually beside the point. I'm largely raising a counterexample for why the following (rough paraphrasing) is not sound: "SOME people figured out how to use these tools to go 2x to 4x faster, you do the same, or you're fired!".
Let's say "n" is the sum complexity of a system. While some developers can take an approach that yields a development output of: (1.5 * log n), the AI tools might have a development output of: (4 * log n)^4/n. That is, initially more & faster, but eventually a lot less and slower.
The parable of the soviet beef farmer comes to mind: In this parable, the USSR mandated its beef farmers increase beef output YoY by 20%, every year. The first year, the heroic farmer improved the health of their livestock, bought a few extra cows and hit their target. The next year, to achieve 20% YoY, the farmer cuts every corner and maximizes every efficiency, they even exchange all their possessions to buy some black market cows. The third year, the farmer can't make the 20% increase, they slaughter almost all of their herd. The fourth year, the farmer has essentially no herd, they can't come close to their last years output - let alone increase it. So far short of quota, the heroic beef farmer then shot himself.
(side-note: this is also analogous to people not raising their skill levels, but that's not my main point - I'm more thinking about how development slows down relative to the complexity and size of a software system. The 'not-increasing skills' angle is arguably there too. The main point is short-term trade-offs to achieve goals rather than selecting long-term and sustainable targets, and the relationship of those decisions to a blind demand to increase output)
So, instead of working on the insulation of the home, instead of upgrading the heating system, to heat the home faster we burn the furniture. It works.. to a point. Like, what happens when you run out of furniture, or the house catches fire? Seemingly that will be a problem for Q2 of next year, for now, we are moving faster!!
I think this ties into the programming industry quite heavily from the perspective where managers often want things to work just long enough for them to be promoted. Doesn't have to work well for years, doesn't have to have the support tools needed for that, nope - just long enough that they can get the quarterly reward and then move on, not worrying about the support mess left behind. To boot, the feedback cycle for whether something was a good idea in software or not is slow, oftentimes years. AI tools have not been out for a long time, just a couple of years, and it'll be another few before we see what happens when a system is grown to 5M lines through mostly AI tooling and the codebase itself is 10 years old - will that system be too brittle to update?
FWIW, I'm of the point of view that quality, time and cost are not an iron triangle - it is not a choose-two situation. Instead, quality is a requirement for low cost and low time. You cannot move quickly when quality is low (from my experience, the slowdown of low quality can manifest quickly too - on the order of hours. A shortcut taken now can reduce velocity even later that same day).
Thus, mandates from management to move 2x to 4x faster, when it's not clear that AI tools actually deliver 2x to 4x benefits over the longer term (perhaps not even in the shorter term), feels a lot like the soviet beef farmer parable, or burning furniture to stay warm.
CuriouslyC · 20h ago
If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization. This is part of the learning curve of the tools IMO.
seadan83 · 19h ago
Second angle:
> If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
I find one of the biggest differences between junior engineers and seniors is they think differently about how complexity scales. Juniors don't think about it as much and do very well in small codebases where everything is quick. They do less well when the complexity grows and sometimes the codebase just simply falls over.
It's like billiards. A junior just tries to make the most straightforward shot and get a ball in the pocket. A senior does the same, but they think about where they will leave the cue ball for the next shot, and they take the shot that leaves them in a good position to make the next one.
I don't see AI as being able to possess the skills that a senior would have to say "no, this previous pattern is no longer the way we do things because it has stopped scaling well. We need to move all of these hardcoded values to a database now and then approach the problem that way." AFAIK, AI is not capable of that at all; it's not capable of a key skill of a senior engineer. Thus, it can't build a system that scales well with respect to complexity because it is not forward-thinking.
I'll posit as well that knowing how to change a system so that it scales better is an emergent property. It's impossible to do that architecture up front, therefore an AI must be able to say "gee, this is not going well anymore - we need to switch this up from hardcoded variables to a database, NOW, before we implement anything else." I don't know of any AI that is capable of that. I could agree that once that point is reached, and a human starts prompting on how to refactor the system (which is a sign the complexity was not managed well), it's possible to reduce the interest cost of outsized complexity by using an AI to manage the AI-induced complexity...
johnnyanmac · 15h ago
>as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
You're assuming organizations are operating with the goal of quality and velocity in mind. We saw that WFH made people more productive and gave them a higher quality of life. Companies are still trying to enforce RTO as we speak. The productivity was deemed not worth it compared to other factors like real estate, management ego, and punishing the few who abused the privilege.
We're in weird times and sadly many companies have mature tech by now. They can afford to lose productivity if it helps make number go up.
seadan83 · 19h ago
> If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
All things being equal, I would agree. Things are not equal though. The slowdown can manifest as: needing more developers for the same productivity, lots of new projects to do things like "break the AI monolith into microservices", all the things that a company needs to do when growing from 50 employees to 200 employees. Having a magically different architecture is kinda just a different reality - too much chaos to always say that one approach would really be different. One thing though: it does often take 2 to 5 years before knowing whether the chosen approach was 'bad' or not (and why).
Companies that are trying to scale - almost no two are alike. So it'll be difficult to do a peer-to-peer comparison, it won't be apples to apples (and if so, the sample size is absurdly small). Did architecture kill a company, or bad team cohesion? Did good team cohesion save the company despite bad architecture? Did AI slop wind up slowing things down so much that the company couldn't grow revenue? Very hard to make peer-to-peer comparisons when the problem space is so complex and chaotic.
It's also amazing what people and companies can do with just sheer stubbornness. Facebook has (I hear) 1000+ engineers just for their mobile app.
> My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization
I fear this is the start of a no-true-scotsman argument. That aside, what is the largest code base size you have reached so far? Would you mind providing some/any insight into the architecture differences for an AI-first codebase? Are there any articles or blog posts that I could read? I'm very interested to learn more where certain good architectures are not good for AI tooling.
CuriouslyC · 16h ago
Regarding architecture considerations:
AI likes modular function grammars with consistent syntax and interfaces. In practice this means you want a monolithic service architecture or a thin function-as-a-service architecture with a monolithic imported function library. Microservices should be avoided if at all possible.
The goal there is to enable straightforward static analysis and dependency extraction. With all relevant functions and dependencies defined in a single codebase or importable module, you can reliably parse the code and determine exactly which parts need to be included in context for reasoning or code generation. LLMs are bad at reasoning across service boundaries, and even if you have OpenAPI definitions the language shift tends to confuse them (and I think they're just less well trained on OpenAPI specs than other common languages).
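As a rough illustration of the kind of dependency extraction I mean (a minimal Python sketch, not the actual tooling I use):

    import ast

    def extract_dependencies(path: str) -> dict:
        # Given one module in the monolith, list what it imports and which
        # functions it calls, so only those definitions get pulled into the
        # model's context.
        tree = ast.parse(open(path).read())
        imports, calls = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                imports.add(node.module or "")
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                calls.add(node.func.id)
        return {"imports": sorted(imports), "calls": sorted(calls)}

With everything in one importable codebase that lookup is trivial; the moment a call crosses a service boundary, it isn't.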
Additionally, to use LLMs for debugging you want to have access to a single logging stream, where they can see the original sources of the logging statements in context. If engineers have to collect logs from multiple locations and load them into context manually, and go repo hopping to find the places in the code emitting those logging statements, it kills iteration speed.
Finally, LLMs _LOVE_ good documentation even more than humans, because the humans usually have the advantage of having business/domain context from real world interactions and can use that to sort of contextually fumble their way through to an understanding of code, but AI doesn't have that, so that stuff needs to be made as explicit in the code as possible.
The largest individual repo under my purview currently is around 250k LoC, my experience (with Gemini at least) is that you can load up to about 10k LoC functionally into a model at a time, which should _USUALLY_ be enough to let you work even in huge repos, as long as you pre-summarize the various folders across the repo (I like to put a README.md in every non-trivial folder in a repo for this purpose). If you're writing pure, functional code as much as possible you can use signatures and summary docs for large swathes of the repo, combined with parsed code dependencies for stuff actively being worked on, and instruct the model to request to get full source for modules as needed, and it's actually pretty good about it.
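The pre-summarization step can be as dumb as walking the repo and collecting READMEs plus function signatures; something like this hypothetical sketch:

    import ast, os

    def summarize_repo(root: str) -> str:
        # Collect each folder's README.md plus the signatures of its Python
        # functions, as a cheap map the model can use to decide which full
        # sources to request.
        parts = []
        for dirpath, _, filenames in os.walk(root):
            if "README.md" in filenames:
                parts.append(open(os.path.join(dirpath, "README.md")).read())
            for name in sorted(filenames):
                if not name.endswith(".py"):
                    continue
                tree = ast.parse(open(os.path.join(dirpath, name)).read())
                for fn in tree.body:
                    if isinstance(fn, ast.FunctionDef):
                        args = ", ".join(a.arg for a in fn.args.args)
                        parts.append(f"{os.path.join(dirpath, name)}: def {fn.name}({args})")
        return "\n".join(parts)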
seadan83 · 13h ago
The note on LLMs loving documentation is a solid gem, amongst others - thank you for the response.
kamaal · 11h ago
>>will retard the skill development
I worked at an Indian IT services firm which even until the mid-2010s didn't give internet access to people at work. Their argument was that use of the internet would make the developers dumb.
The argument always was: assume you had an internet outage for days and had to code, you would value your skills then. Well guess what, it's been decades now, and I don't think that situation has ever come to pass; heck, not even something close to it has come to pass.
Sometimes how you do things changes. During the peak of the Perl craze, my team lead often told me people who didn't use C++ weren't as smart, and that people who used Perl would eventually have their thinking skills atrophy when Perl wasn't around anymore. That doomsday scenario hasn't happened either. People have said similar things about Java, IDEs, package managers, docker, etc.
Businesses don't even care about these things. A real estate developer wants to sell homes; their job is not to build web sites. So as long as a working site is available, they don't care who builds it, you or AI. Make of this whatever you will.
ritz_labringue · 21h ago
I very much agree, and I think people who are in denial about the usefulness of these tools are in for a bad time.
I've seen this firsthand multiple times: people who really don't want it to work will (unconsciously or not) sabotage themselves by writing vague prompts or withholding context/tips they'd naturally give a human colleague.
Then when the LLM inevitably fails, they get their "gotcha!" moment.
dingnuts · 20h ago
I think the people who are in denial about the uselessness of these tools are in for a bad time.
I've been playing with language models for seven years now. I've even trained them from scratch. I'm playing with aider and I use the chats.
I give them lots of context and ask specific questions about things I know. They always get things wrong in subtle ways that make me not trust them for things I don't know. Sometimes they can point me to real documentation.
gemma3:4b on my laptop with aider can merge a diff in about twenty minutes of 4070 GPU time. incredible technology. truly groundbreaking.
call me in ten years if they figure out how to scale these things without just adding 10x compute for each 1x improvement.
I mean hell, the big improvements over the last year aren't even to do with learning. Agents are just systems code. RAG is better prompting. System prompts are just added context. call me when GPT 5 drops, and isn't an incremental improvement
Fraterkes · 21h ago
If these force-multipliers actually exist, where is the deluge of amazing, useful programs created by these superhuman programmers?
ryandrake · 21h ago
I would love to see even a semi-scientific study where two companies are funded with the same amount of capital to build the same product, one extensively using AI tools and the other not. Then after some set amount of time, the resulting products are compared across measures of quality, profitability, customer satisfaction, and so on.
Seb-C · 7h ago
Hopefully the market forces will tell us that sooner or later. It might just take a while until the gold rush of VC money stops and reality kicks back in.
yubblegum · 15h ago
> useful programs
Code =/= Product should be kept in mind. That said, I do not have a hard position on the topic, though I am certain about detrimental and generational skill atrophy.
Groxx · 21h ago
Short term increases have pretty much always been possible - give up quality.
A lot of the reluctance to bulk adoption is that it seems to drag quality down. CEOs don't usually see that until it's far too late though.
jaredcwhite · 16h ago
"force multiplication"
This is a buzzword, this isn't a thing. Just like the 10x developer was never a thing.
CuriouslyC · 16h ago
10x developers are a thing, but it's very context dependent. Put John Carmack in a room with your average FAANG principal and ask them both to build a 3D game engine with some features above and beyond what you can get out of the box with OpenGL, and I would be very surprised if John didn't 10x (or more) his competition.
AI as a force multiplier is also a thing, with well structured codebases one high level dev can guide 2-3 agents through implementing features simultaneously, and each of those agents is going to be outputting code faster than your average human. The human just needs to provide high level guidance on how the features are implemented, and coach the AI on how to get unstuck when they inevitably run into things they're unable to handle.
The mental model you want for force multiplication in this context is a human lead "managing" a team of AI developers.
player1234 · 19m ago
Show us the numbers, what company? What gains? How did you measure it? If it is just fan fiction, simply fu.
antithesizer · 21h ago
>or find another job.
Exactly. And if you consider AI to be the inevitable source of unprecedented productivity gains then this filtering of employees by success with/enthusiasm for AI makes sense.
the_mitsuhiko · 23h ago
I left this comment originally on lobsters:
While I understand the fear, I don’t really share it. And if I were to go to the root of it, I think what I most take issue with is this:
> My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?
My experience is astonishingly positive. I would not have imagined how much of a help these tools have become. Deep research and similar tools alone have helped me navigate complex legal matters recently for my incorporation; they have uncovered really useful information that I just would not have found that quickly. First Cursor, now Claude Code have really changed how I work. Especially since for the last month or so, I feel myself more and more in the position where I can do things while the machine works. It’s truly liberating and it gives me a lot of joy. So it’s not “neutral-to-positive” to me, it’s exhilarating.
And that extends particularly to this part:
> Despite this plethora of negative experiences, executives are aggressively mandating the use of AI. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.
When I was at Sentry the adoption of AI happened by ICs before the company even put money behind it. In fact my memory is that, if anything, only at the point where an increasing number of AI invoices showed up in IC expenses did we realize how widespread adoption had been. This was grounds-up. For my non-techy friends it’s even trickier because some of them work in companies that outright try to prevent the adoption of AI, but they are paying for it themselves to help them with their work. Some of them pay for the expensive ChatGPT package even! None of this should be disregarded, but it stands in crass contrast to what this post says.
That said, I understand where Glyph comes from and I appreciate that point. There is probably a lot of truth in the tail risks of all of that, and I share those concerns. It just does not take away from my enjoyment and optimism at all.
bluefirebrand · 23h ago
> I can do things while the machine works
I find this workflow makes my life extremely tedious, because reviewing and fixing whatever the machine produces is a slog for me
I suppose it would be exhilarating if I just trusted the output but somehow I just can't bring myself to do that
How do you reconcile this with your own work? Maybe you just skim the output? Or do you run it and test that way?
Please don't tell me you just trust the AI to write automated tests or something...
osigurdson · 22h ago
In my experience, the machine works for 10-30s. I think the only productive use of time during those gaps is meditation. Still however, flow state cannot be achieved. Vibe coding, for me is pretty horrible for anything that might approximate production code. It is useful for quick prototyping or used as a learning tool. I assume it will get better but a far more satisfying experience would be for it to work offline, creating PRs for incremental improvements that are easy to review. I suppose this is something like the "Devin" model but I haven't used this due to the price and have heard bad things about it.
nojs · 18h ago
> I think the only productive use of time during those gaps is meditation
Check out several copies of the repo and work on different branches concurrently.
osigurdson · 15h ago
I suspect the context switches would kill productivity. This is not where the puck is going in my opinion.
tuckerman · 22h ago
Not GP but
> I find this workflow makes my life extremely tedious, because reviewing and fixing whatever the machine produces is a slog for me
This is a matter of personal preference, but reviewing code (from humans) was already a huge chunk of my job and one that I enjoy. Now I can have a similar workflow for side projects, especially for things that I find less enjoyment in coding myself.
> Please don't tell me you just trust the AI to write automated tests or something...
Like most tools, the more I use LLMs for coding the more I get a feel for what it's good at, what it's okay at, and what it will just make a mess of. I find for changes like extending existing test coverage before refactoring, I can often just skim the output. For writing tests from whole cloth it requires a lot more attention (depending on the area). For non-test code, it very much depends on how novel the work is and how much context/how many examples it has to go on.
CuriouslyC · 22h ago
Create stubs to ensure code structure roughly aligns with expectations, then tell the AI to generate code and paired tests and look at code test coverage and manually inspect the tests rather than inspecting the code directly.
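As a trivial, hypothetical example of what I mean by stubs with paired tests (names invented; the test fails until the agent fills in the body, and the test plus coverage is what I actually review):

    def normalize_email(raw: str) -> str:
        """Lowercase, strip whitespace, raise ValueError if there's no '@'."""
        raise NotImplementedError  # stub: the agent fills this in

    def test_normalize_email():
        # Paired test generated alongside the implementation; reviewing this
        # is cheaper than reading the body line by line.
        assert normalize_email("  Bob@Example.COM ") == "bob@example.com"
        try:
            normalize_email("not-an-email")
            assert False, "expected ValueError"
        except ValueError:
            pass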
the_mitsuhiko · 23h ago
> I find this workflow makes my life extremely tedious, because reviewing and fixing whatever the machine produces is a slog for me
I don't actually mind the code review process; it's what I'm used to with open source contributions. Not infrequently I would look at a PR that people left me and decide to rewrite it entirely anyway because I did not quite appreciate the method they chose. That however did not mean the PR was useless; it might have provided the necessary input that made me go down that path ultimately.
I also think that what can make code review enjoyable is not necessarily that the person on the other side learns something from it. I actually tend to think that this is something that does not necessarily happen in PR review, but in 1:1 conversations in person or through more synchronous collaboration.
So really my answer is that I do not mind code review. That said there is an important part here:
> How do you reconcile this with your own work? Maybe you just skim the output? Or do you run it and test that way?
What makes agentic workflows work is good tooling and fast iteration cycles. That is also exactly what a human needs for a PR review to be actually thorough. I cannot accept a change when I do not know that it exhibits the intended behavior, and to validate that behavior I usually need a test case. Do I need to run the test case myself? Not necessarily, because the agent / CI already does for me. However I do sometimes run it with verbose output to also see the intermediate outputs that might otherwise be hidden from me, to give me more confidence.
It really is hard to describe, but when you put the necessary effort in to make an agentic workflow work, it's really enjoyable. But it does require some upfront investment for it to work.
> Please don't tell me you just trust the AI to write automated tests or something...
I do in fact trust the AI a lot to write tests. They have become incredibly good at taking certain requirements definitions and deriving something that can be mostly trusted. It does not always work, but you get a pretty good feeling for whether it will or not. At the end of the day many of the tasks we do are basic CRUD or data transformation tasks.
starkparker · 22h ago
> It really is hard to describe, but when you put the necessary effort in to make an agentic workflow work, it's really enjoyable.
I wish it wasn't hard to describe because every attempt I've made to reach that point with agents/tools-in-a-loop has ended with more frustration, more errors, and more broken code than when I started, even with common and small, tedious tasks. I'd very much like to understand what proponents are doing and seeing that works.
NoOn3 · 22h ago
It would be interesting to watch a video with an example of effective work. To see why we are so unlucky with results :)
solarwindy · 18h ago
All too often with AI proponents I get the you’re holding it wrong vibe. I can’t help but develop the suspicion these people are putting out bad work and just never notice. Or to be a little less scathing, fine, there’s a very narrow range of tasks where these models can usefully alleviate drudgery—quite often that comes at the cost of code bloat, since a generative model inherently makes it easy to puke out code with total indifference to its burden over time.
ookblah · 17h ago
there was literally a post a few days ago where someone from cloudflare or fly (can't remember) released a new feature built with AI (albeit heavily supervised) and answered q's along the way about it.
it's funny because what i'd also like to see is people who are skeptics make a video as well since sometimes i also have the opposite suspicion. i get a lot of the criticism, but i don't get the "it produces pure garbage" type ones.
solarwindy · 16h ago
My comment was admittedly a little inflammatory, so I’ll try to elaborate on what I find so exasperating about the whole experience of using LLMs for coding (in any form, chat, agent, etc.): at root, it’s their total lack of a model of computation.
This leads to mistakes that are difficult to catch, because while I know the kinds of mistakes another human might make, having been through the process of learning not to make them myself, LLMs produce whole classes of bizarre mistakes that I have no interest in learning to catch. There is no discernible flawed mental model behind these errors—which with a human could be discussed and corrected—just an opaque stochastic process which I can tediously try to set on a better course with ‘incantations’, attempting to dial in to a better part of the training data that avoids the relevant class of error.
It’s honestly amusing how much of ‘prompt engineering’ (if it can be dignified with that term) amounts to a modern-day kind of mysticism. What better can we really do though when these models’ structure and operation is utterly opaque, on the one hand through deliberate, commercially-oriented obfuscation, and on the other because we still just cannot explain how a multi-billion parameter model works.
It’s rewarding to work with human juniors because people actually learn and improve. My learning how to coax these models into producing better than trash just is not, especially on anything either remotely novel, or in a legacy codebase that requires genuine understanding of how an existing system functions at runtime.
Once those two ends of the spectrum are ruled out, I find there’s little left that an LLM can accomplish, without necessitating a loop of prompt refinement that leaves me feeling like a worse developer for not just having thought through the problem myself, and resentful of the time wasted.
it's really interesting to me because i feel like legacy code is where it actually excels very well. i have a large mental map in my head, so i usually just feed it some vague notion of what i want it do and use existing files as a reference ("i want to build this feature, look at X, Y, Z for reference. approach it this way" if i have some notion of how i want it go).
i usually don't let it go run off on its own unless it's a very defined task that i can review quickly later, i just review and approve every change and it takes big cognitive load off for me. at some point maybe this doesn't feel like "programming", but then i'll just tweak something else or modify it and then go onto the next review. i find i can't have it produce the entire thing and then review it since i have no idea how it got to where it did or takes just as much time to understand. but doing it this way it's faster + i gain understanding.
the prompts aren't overly complex or take a lot of time, certainly way less than speccing something out for a junior. all i have a is a base file for style and structure and then i describe the general problem and reference files ad-hoc. where i find it actually fails a lot is in novel code because it has nothing to ground it and starts exploring random stuff. i only use it for novel exploration to see what approaches it comes up with.
still trying to understand why there's this huge chasm between the two viewpoints. like a lot of the things you just said i can't resonate at all with. like maybe the 20% i feel like i'm "fighting" the LLM i just stop and go in myself. does that suck? sort of, but it's certainly way less tedious than directing some other person to do it or the time saved had i not used it at all.
edit: but to your point, yeah it really is just like magic with no way to like actually direct it in a way you would where a human would learn. maybe over a year ago i tried and wrote anything AI off beyond basic co-pilot completions (same issues, "fighting" the AI, having to specify a tons of exceptions in some god awful file). the new agents changed everything for me, esp claude code. i think it will only get better, so it's best to pick it up.
my only fears are
1) no juniors being trained, thus no future seniors. part of the power is that you have experienced people using it to enhance their context or understanding. for those with no experience and no drive to "improve" (honestly, think of the avg dev at big co) or straight up "vibe coding" i shudder at the output.
my hypothesis we are now going to enter a period where a LOT of shitty code is going to be created. it's already happening in education with people just cheating w/o learning. i already had issues trying to hire people who were using AI to get past initial exercises but failing on complex issues because they were just probably copy and pasting everything. best time to be a nimble startup.
2) top-down mandates to use this stuff. you should only use it when you want and if it helps you. i think there's this element of companies buying into the hype 110% and that puts a bad taste in everyone's mouth. "all devs replaced in 1 year!" type stuff.
solarwindy · 7h ago
Hmm, I should say that in the legacy case, it’s only been with legacy I’m already (painfully) familiar with, and knowing how difficult that code has been before to disentangle or to carefully stack another card on the already teetering house, I know upfront it’ll be a waste of time to let the LLM try.
In legacy that I don’t yet understand, maybe I am missing something in using a model with all that code in its context as an aid to build my understanding. I just cannot imagine handing off responsibility of modifying that code to the mystery machine. Tedious as it might be, I’m firmly of the view that I’d do myself and my team a disservice if I introduce changes I don’t fully comprehend, in fear that it’d only bite us later.
That fear comes from a couple of cases of being bitten by changes made by other supposed seniors where I had my suspicions they let the LLM do the work for them, and went against my judgement in accepting their assurances that they'd tested that everything worked.
Although, OK, for the legacy case, perhaps I should loosen my embargo in legacy frontend code where there’s a tight enough ‘blast radius’—meaning, the damage of a bad change is constrained to the bit of UI in question not working. Especially if that’s back-office frontend code where I have responsive users who I know will let me know if I broke something (because they surely know the warts of the system inside out, unlike me). In mission-critical backend code? Not a chance.
On the question of ‘fighting’ the LLM, maybe I do need to loosen up in how I want something done. In fairness I’m much more tolerant of that in a human, because a) there’s just often more than one way to do things; and b) with some devs I know there would or could be a fight that just isn’t worth it.
Which does come to an interesting point about ego and ownership, that if I regard the LLM code less as ‘my own’ perhaps I’d be more forgiving of its contributions. Would honestly make a difference if it’s not got my name on the commit.
Also comes back to one of the cases where I got burned. If the other dev hadn’t put their name on code which I’m sure was not theirs, then we could have had an honest discussion about it, and I could have better helped figure out whether what seemed to work was really trustworthy. Instead I had to weigh challenging them on it, and the awkward implication that I didn’t think they could have come up with the code themself.
To your two fears, totally agree, and undoubtedly some variant on these two cases has put a very bad taste in my mouth. Witnessing a junior essentially cheat their way through their final project for school, making a mockery of the piece of paper they got at the end. Being a victim not quite of a top-down mandate—somehow worse, with an exec head-over-heels bought into the hype, thinking they could lose a chunk of expensive headcount to no ill effect; not firing people, just making the situation miserable enough that people quit.
jmsdnns · 21h ago
I'm curious to understand the nature of what you're working on better.
Whether or not AI performs well is influenced both by the work you're doing and how experienced you are in it. AI performs better when the work is closer to mainstream work than novel work. It also performs better with lower level instructions, e.g. being more specific. As for experience, there are two things: people with less experience get a bigger lift than those with lots of experience, and people with lots of experience get a lift by having AI do the work they don't want to do, which is often unit tests and comments, or writing the bobdyllionth API endpoint.
I read your post the other day and appreciated the optimism, but I wasn't able to work out what kind of work you've been doing since leaving Sentry.
cdavid · 21h ago
The issue about executive mandate is likely coming from the context of large corporations. It creates fatigue, even though the underlying technology can be used very effectively to do "real work". It becomes hard for people to really see where the tech is valuable (reduce cost to test ideas, accelerate getting into a new area, etc.) vs where it is just a BS generator.
Those are typical dysfunctions in larger companies w/ weak leadership. They are magnified by a few factors: AI is indistinguishable from magic for non-tech leadership, demos can be slapped together quickly but don't actually work w/o actual investment (which was what leadership wanted to avoid in the first place), and ofc the promise to reduce costs.
This happens in parallel to people using it to do their own work in a more bottom up manner. My anecdotal observation is that it is overused for low-value/high visibility work. E.g. it replaces "writing as proof of work" by reducing the cost to write bland, low information, documents used by middle management, which increases bureaucratic load.
vouaobrasil · 23h ago
> Deep research and similar tools alone have helped me navigate complex legal matters recently for my incorporation,
On the flipside, it will make it even easier for corporations to use the legal system in their favour and corporations will more easily and effectively use GenAI against individuals even if both have it due to the nature of corporations and their ability to invest in better tools than the individual.
So it's just an arms race/prisoner's dilemma and while it provides incremental advantages, it makes everything worse because the system becomes more complex and more oppressive.
dyauspitr · 23h ago
I don’t understand these people that say they are getting no value from LLMs. I’m getting value from them probably every other hour.
recursive · 20h ago
Sometimes I try to do stuff with AI. Results are mixed. Sometimes it's somewhat helpful. Sometimes it's a complete waste of time. Maybe that means I'm holding it wrong. In the end, it's probably a wash for me. Maybe slightly positive.
I don't understand these people that are saying they are getting huge value from LLMs. I haven't put a ton of effort into figuring out how to "hold it right" because stuff is changing so fast still. I've had bad experiences with rabbit holes like this where the true believers will tell me the pot of gold is always right around the next corner. If and when I can get some positive value out it using whatever's easily available on the default, then I might investigate deeper.
My experience mostly consists of GPT and Copilot, as provided by my employer. I'm part of a pilot program to evaluate Copilot. My feedback is that it's slightly positive in aggregate, but individual results are very mixed. It's not worth much to me.
dingnuts · 20h ago
Simple, I bother to subtract all the time LLMs have wasted by giving me fake leads and lying to me from the time it saves when it successfully generates a few lines of code
jessoteric · 18h ago
This right here. Every time I talk to someone claiming 2-4x multipliers, after a few rounds of conversation they eventually admit "yeah you have to learn when to stop when it's no longer productive, you get a feel for when it's veering into wasting your time".
It can't be both. It can't be 2-4x multiplier _and_ be wasting your time to the extent that you have to "get a feel for when it has been wasting your time".
solarwindy · 18h ago
Having intermittently given in to the hype and tried whatever is supposedly latest and greatest, it without fail goes like this. I’ve even on occasion deceived myself into thinking I’ve finally hit on the right way to use these tools, only to realize in retrospect that under a complete accounting, the time spent to reach that productive mode of working in no way generalizes, so must be paid effectively every time.
oulipo · 23h ago
Basically his article says: look at the documented issues with the climate, education, and the destruction of trust in society - and your answer is "yes, but it makes my nice cuddly programming job better, I like it"... don't you understand the issue?
the_mitsuhiko · 23h ago
I responded to quite specific quotes which I took issues with, and I intentionally did not talk about the rest of it. I did not make a statement about the rest because it's too complex of a topic to which I do not have enough data to contribute meaningfully to the conversation.
haswell · 21h ago
> I did not make a statement about the rest because it's too complex of a topic to which I do not have enough data to contribute meaningfully to the conversation.
> There is probably a lot of truth in the tail risks of all of that, and I share these. It just does not at all take away from my enjoyment and optimism at all.
On the one hand, I get not expanding on things you don't feel you can contribute to meaningfully and respect that.
On the other, isn't this a problem?
While I agree with you and cannot fully understand the author's conclusion that it's all bad (I've also found value), I wrestle with the downsides on an ongoing basis and have found myself shying away from using these tools as a result, even though I know they are quite capable of some things. I think that if I only responded "Well, I personally find the tools valuable", that this is only a part of the picture and is a net negative to the overall conversation because the "real" conversation that needs to be had IMO is about those harms, not haggling over whether or not the thing can actually do stuff.
There's quite a lot of pro-AI content and opinion floating around right now that amounts to: "Yeah, this thing is dangerous, threatens climate goals, is accelerating an education crisis, only exists because of mass theft, creates a massive category of new problems we don't yet know how to solve...but I'm getting value out of it". And the thing that keeps coming to mind is "Don't Look Up".
oulipo · 8h ago
> There is probably a lot of truth in the tail risks of all of that, and I share these. It just does not at all take away from my enjoyment and optimism at all.
...is just coded "white rich guy" speak to say "I won't suffer the consequences, so I might as well not bother"
shameful
oulipo · 8h ago
But DON'T YOU UNDERSTAND THIS IS THE ISSUE RIGHT HERE??
The fact that you're so sheltered as a western white rich programmer that you can AFFORD to be misinformed about all the way AI is bad for the rest of us (spoiler: you won't have to suffer it because you're sheltered)
That's exactly the issue: rich white guy pushing for "more, more, more" of their toys, without looking to the bad consequences of them because "oh no, I'm not interested and I'm not an expert"
Then you might as well keep taking the airplane every weekend, etc, and just "not get interested" or "not get enough data" about why it's bad...
grafelic · 21h ago
For me the problem with GenAI is that it erodes away what I enjoy in my job and in my free time.
I enjoy learning and eventually understanding methods to solve problems. I enjoy applying these methods in my mind, before trying them on the computer.
I am not interested in getting the solutions to problems served to me without understanding why it works.
I feel, yes it is an emotion, that GenAI is unhealthy for the progress of mankind, since it encourages a lazy mind, stifles curiosity and will lead to inbred information filling our digital spaces.
I am looking forward to my future career as a gardener (although with a smidge of sadness) when AI has sucked out all creativity, ingenuity and enjoyment from my field of work.
ednite · 23h ago
Hats off to the author for expressing this so candidly, I really get where they’re coming from.
While I tend to lean more toward the side of AI progression - it’s clearly an inevitable part of society now - I do worry about the long-term impact. The constant pressure to keep up, adapt, and integrate AI into everything also feels like it’s already contributing to fatigue and burnout.
Thoughtful post.
dingnuts · 20h ago
> it’s clearly an inevitable part of society now.
this won't be true until the financials are squared up. Right now there is a lot of funky math propping up these companies. If the bet doesn't pay off with real productivity gains the whole AI industry will disappear the way the first generation of AI assistants did ten years ago.
Remember Echo?
mholm · 19h ago
There's funky math propping up the training of new AI. That much is certainly true. But just running frontier models on hardware is very doable, and is already capable of absolutely changing the software landscape in the next decade, even if nobody ever trains a model again.
ednite · 19h ago
You make a solid point, the financials will ultimately decide how far this goes, as with most things in business.
That said, I don’t think it quite echoes Echo (that pun might be intentional) or maybe it does, financially speaking.
For me, this wave feels different: deeper integration, broader scale, and real-time utility. I agree, if the returns don’t hold up, a major correction isn’t off the table.
It reminds me a bit of the smartphone era: rapid adoption, strong dependence. I guess the difference is that phones had clear monetization paths. With AI, we're still betting.
The bigger question for me is, if this all collapses, what happens to the workflows and investments already built around it?
lsy · 21h ago
Part of the reason the debate over usefulness is so exhausting is because there is an element of randomness, anecdote, and special pleading built into the use of the tool. One person has a good experience in their context, one person has a bad experience in their context. One person sees a bad result and throws up their hands and says it sucks, another tweaks their prompt until they get a good result and says it works.
In addition there's an element of personality: One person might think it's more meaningful to compose a party invitation to friends personally, another asks ChatGPT to write it to make sure they don't miss anything.
As someone who is on the skeptic side, I of course don't appreciate the "holding it wrong" discourse, which I think is rude and dismissive of people and their right to choose how they do things. But at the end of the day it's just a counterweight to the corresponding discourse on the skeptic side which is essentially "gross, you used AI". What's worse though is that institutions are forcing us to use this type of tool without consent, either through mandates or through integrations without opt-outs. To me it doesn't meet the bar for a revolutionary technology if it has to be propagated forcibly or with social scolding.
starkparker · 23h ago
> The process of coding with an “agentic” LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits.
This has been my experience as well, especially since the only space I have to work with agents are on C++ projects where they flat-out spiral into an increasingly dire loop of creating memory leaks, identifying the memory leaks they created, and then creating more memory leaks to fix them.
There are probably some fields or languages where these agents are more effective; I've had more luck with small tasks in JS and Python for sure. But I've burned a full work week trying and failing to get Claude to add some missing destructors to a working but leaky C++ project.
At one point I let it run fully unattended on the repo in a VM for four hours with the goal of adding a destructor to a class. It added nearly 2k broken LOC that couldn't compile because its first fix added an undeclared destructor, and its response to that failure was to do the same thing to _every class in the project_. Every time saying "I now know what the problem is" as it then created a new problem.
If LLMs could just lack confidence in themselves for even a moment and state that they don't know what the problem is but are willing to throw spaghetti at the wall to find out, I could respect that more than them marching through with absolute confidence that their garbage code did anything but break the project.
bcrosby95 · 23h ago
I tried using them for a game server, written in C, as a test case and I came to the same conclusions. It did pretty well at making something that was buggy but worked, yet it increasingly fell apart around the 3k-line mark, which really isn't much code.
It's not like these are esoteric languages, so I'm not sure why they (in my experience) have such a big problem with them.
My current guess is that C and C++ aren't so mono-domain. The languages it's good at seem to be the ones used predominantly in the web world, and most of the open code out there is for the web.
I'm curious if anyone has tried to use it for C# in Unity or a bespoke game in Rust and how well it does there.
starkparker · 23h ago
I'm OK with an LLM not knowing how to do something.
The part of working with it that I can't get past is its full-throated confidence in its actions. It'll break down, step by step, why its actions would solve the problem, then attempt to build it and fail to recognize how or why its changes made the problem worse.
crq-yml · 20h ago
I did spend a little while evaluating what GPT could output for a fairly intensive gaming task: implement a particular novel physical dynamics model. (Yes, there are newer tools now; I haven't used them and don't want to pay to use them.) And what it did was impressive in a certain sense, in that it wrote up a large body of code with believable elements, but closer examination demonstrated that, of course, it wasn't actually going to function as-is: it made up some functions and left others stubbed. I would never trust it to manage memory or other system resources correctly, because AFAIK nobody has implemented that kind of "simulate the program lifecycle" bookkeeping into the generated output.
There's a domain that I do find some application for it in, which is "configure ffmpeg". That and tasks within game engines that are similar in nature, where the problem to be solved involves API boilerplate, do benefit.
What also works, to some degree, is to ask it to generate code that follows a certain routine spec, like a state machine. Again, very boilerplate-driven stuff, and I expect an 80% solution and am pleasantly surprised if it gets everything.
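To give a sense of the shape I mean (a hand-written toy here, with made-up states and events, not anything a model produced for me), the kind of routine, table-driven structure that tends to go fine looks like:

    # Toy table-driven state machine; states and events are hypothetical.
    TRANSITIONS = {
        ("idle", "start"): "running",
        ("running", "pause"): "paused",
        ("paused", "start"): "running",
        ("running", "stop"): "idle",
    }

    def step(state, event):
        # Unknown events leave the state unchanged.
        return TRANSITIONS.get((state, event), state)

    state = "idle"
    for event in ["start", "pause", "start", "stop"]:
        state = step(state, event)
        print(event, "->", state)

It's when the spec drifts away from that kind of lookup-table regularity that the 80% solution starts to show.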
And it is very willing to let me "define a DSL" on the fly and then act as a stochastic compiler, generating output patterns that are largely in expectation with the input. That is something I know I could explore further.
And I'm OK with this degree of utility. I think I can refine my use of it, and the tools themselves can also be refined. But there's a ceiling on it as a directly impactful thing, where the full story will play out over time as a communication tech, not a "generate code" tech, with the potential to better address Conway's law development issues.
deburo · 23h ago
It seems quite obvious: there is surely far less public C and C++ code out there than the millions of lines of JS, C#, Java, Python and others available on GitHub, GitLab and elsewhere.
So users of those languages might still find LLMs lacking. It's also understandable that you'd get more errors when generating C++ than, say, C#, since it's a harder language to use.
It reminds me that I recently changed companies in part because at my previous job, I didn't have as many opportunities to work with LLMs (doing the things they are good at) as my current one.
supriyo-biswas · 23h ago
> JS, C#, Java, Python
Not a comment on your primary points, but most LLMs out there also suck at enterprise Java projects (Spring Boot, etc.) unless it's for the most mundane things, such as writing simple controllers.
groby_b · 22h ago
> add some missing destructors to a working but leaky C++ project.
That's amazing, given C++ provides default destructors. They don't "miss" unless you or the AI omitted them deliberately. So I'll guess you meant custom destructors that clean up manual resource management.
Which likely means you ran into the classic failure mode of AI, you underspecified constraints. It's C++. Humans or AI, as soon as there's more than one, be prepared to discuss ownership semantics in absolutely nauseating detail.
You don't fix broken ownership semantics by telling a junior "try to get it to compile" and walking away for a couple of hours, either.
And LLMs are interns, juniors at best. That means for a solo C++ project, yeah, you're probably better off not using them. The core issue being that they're incapable of actually learning, not the lack of skills per se - if you're willing to spec the hell out of everything every single time, they still do OK. It's soul-deadening, but it works.
> If LLMs could just lack confidence in themselves for even a moment and state that they don't know what the problem is but are willing to throw spaghetti at the wall to find out, I could respect that more than them marching through with absolute confidence that their garbage code did anything but break the project.
As I said, juniors. This is a pretty common failure mode for fresh grads, too.
solarwindy · 18h ago
Juniors do learn though, by contrast (as you said yourself). So these models are, at best, perpetual juniors.
Except, they make bizarre mistakes that human juniors would not, and which are hard for even experienced developers to spot, because while it becomes possible with experience to preempt the kinds of mistakes that stem from the flawed or incomplete mental models of human juniors, these models do not themselves have a model of the computation underlying the code they produce—which in my experience makes the whole process of working with them endlessly frustrating.
Noumenon72 · 21h ago
> I found the experience of reading this commit history and imagining myself using such a tool — without exaggeration — nauseating.
The commit history[1] looks like a totally normal commit history on page 1, but I clicked farther down and found commits like
> Ask Claude to fix bug with token exchange callback when reusing refresh token.
> As explained in the readme, when we rotate the refresh token, we still allow the previous refresh token to be used again, in case the new token is lost due to an error. However, in the previous commit, we hadn't properly updated the wrapped key to handle this case.
> Claude also found that the behavior when `grantProps` was returend without `tokenProps` was confusing -- you'd expect the access token to have the new props, not the old. I agreed and had Claude update it.
It seems no different than "run prettier on the code" or "re-run code gen". I don't think it's fair to pick on this specific repo if your objection is just "I don't like reviewing code written by a tool that can't improve, regardless of what the commits look like". I'd call that "misanthropomorphizing".
The commentary so far seems to revolve around code. It really helps with brainstorming, and I have used it to
- Plan my garden
- Identify plant issues
- Help with planting tips
- General research/better google
- Systems Engineering
- etc
Maybe the code generation isn't great, but it is really good in a lot of areas.
asdff · 21h ago
There are a couple of ideas that emerge. One is Gell-Mann amnesia being the source of a lot of the complaints about genAI code gen: people see its flaws in their own domain, yet fall into the logical fallacy of trusting a misleading source of information on subjects they don't know much about, even though, as domain experts elsewhere, they would themselves demand higher-quality information before concluding anything in that domain.
The other question is, would you have fared any worse without gen ai? Personally I find old school sources more comprehensive. I once picked up a book for free at my garden center, it contained all information about tomatoes including color photos of most every possible deficiency, pest, and disease. If something goes wrong with my tomatoes in a 30 second trip to my bookshelf I have access to high quality information. Meanwhile, what is the genAI trained on? Is my book in the training set? Maybe it is. But maybe its also polluted with the blogspam slop of the last 15 years on the internet, and that isn't easy to parse out without having a book like mine on hand, or better yet the horticultural training the authors behind that book I have on hand have had. Trusting my book is therefore trusting the highest quality output from the corpus of all humanity.
And it is like that with most other things. For most things you might encounter in life, there is a high quality source of information or book for low cost or free available. Why reach into the dark and pull out words from the sack when you can get the real deal so easily?
knowaveragejoe · 21h ago
I love examples of LLM utility outside of normal tasks people associate it with.
A recent example I had was fixing a toilet that broke. I don't know the name of the components within the tank, but I can understand what they do. I took a picture of the broken component and asked what it was - chatGPT was able to tell me, I was able to get a replacement, and that was that.
I have to remember I'm relatively spoiled in the sense that I know how to get that information out of Google. But for a less technically inclined person, I don't know how they'd even begin to do this themselves. The best I can think of is to take a picture and ask a plumber (if you know one) or someone at a hardware store (if they themselves are even knowledgeable about this).
1. the hype is really insane so I get the fatigue, but the tech is insane as well, and the promise for the (long term) future is insane..
2. People mostly hate new things (human nature)
3. it's fun watching nerds complain about AI after they spent the past 50 years automating other people's careers.. turnabout is fair play? All the nerdsniping around GenAI and tech taking away from programming is funny.
4. Remember it's a tool, not a replacement (yet), get used to it.. we're not supposed to be luddites. Use it and optimize - it's our job to convince managers and executives that we are masters over the tool and not vice versa
kentonv · 20h ago
The author goes into great detail about how he looked at my commit log[0] where I used AI, and he found it "nauseating" and concluded he'd never want to work that way.
I'm certainly not going to tell anyone that they're wrong if they try AI and don't like it! But this guy... did not try it? He looked at a commit log, tried to imagine what my experience was like, and then decided he didn't like that? And then he wrote about it?
Folks, it's really not that hard to actually try it. There is no learning curve. You just run the terminal app in your repo and you ask it to do things. Please, I beg you, before you go write walls of text about how much you hate the thing, actually try it, so that you actually have some idea what you're talking about.
Six months ago, I myself imagined that I would hate AI-assisted coding! Then I tried it. I found out a lot of things that surprised me, and it turns out I don't hate it as much as I thought.
He explained why he hasn't tried it in the section "The Challenge". Also see footnote 21, which says that what he called his "quirky constraints" in the main text are really bedrock ethical concerns. I respect him a great deal for not compromising on these.
Edit to add: Full disclosure: I've been a fan of his writing and work for a long time anyway, and am a supporter on Patreon.
kentonv · 19h ago
Yes, the ethical debate is fair and worth having, and I don't think someone necessarily has to try it to be able to debate the ethics. I obviously don't quite agree with his opinions here but I respect them.
(Though I would say, before I really tried it, I had believed that AI does a lot more "plagiarism" -- lifting of entire passages or snippets. Actually using it has made it clearer to me that it really is more like someone who has learned by reading, and then applies those learnings, at least most of the time.)
I just don't like it when people argue that it doesn't work or doesn't do what they want without having actually tried it... There's too much echo chamber of people who don't want AI to work, and so convince each other that it doesn't, and repeat each other's arguments, all without ever taking a step outside their comfort zones. (Again... this was me, 6 months ago.)
glyph · 17h ago
I haven't personally tried the specific tool that you have, but I have tried a variety of other tools and have had pretty negative experiences with them. I have received a lot of feedback telling me that if I tried out an agentic tool (or a different model, or etc etc etc, as I covered in the post, the goal posts are endlessly moving) I would like it, because the workflow is different.
I was deliberately vague about my direct experiences because I didn't want anyone to do… well, basically this exact reply, "but you didn't try my preferred XYZ workflow, if you did, you'd like it".
What I saw reflected in your repo history was the same unpleasantness that I'd experienced previously, scaled up into a production workflow to be even more unpleasant than I would have predicted. I'd assumed that the "agentic" stuff I keep hearing about would have reduced this sort of "no you screwed up" back-and-forth. It was made particularly jarring by the fact that it came from someone for whom I have a lot of respect (I was a BIG fan of Sandstorm, and really appreciated the design aesthetic of Cap'n Proto, although I've never used it).
As a brutally ironic coda about the capacity of these tools for automated self-delusion at scale, I believed the line "Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.", and in the post, I accepted the premise that it worked. You're not a novice here, you're on a short list of folks with world-class appsec chops that I would select for a dream team in that area. And yet, as others pointed out to me post-publication, CVE-2025-4143 and CVE-2025-4144 call into question the efficacy of "thorough review" as a mechanism to spot the sort of common errors likely to be generated by this sort of workflow, that 0xabad1dea called out 4 years ago now: https://gist.github.com/0xabad1dea/be18e11beb2e12433d93475d7...
Having hand-crafted a few embarrassing CVEs myself with no help from an LLM, I want to be sure to contextualize the degree to which this is a "gotcha" that proves anything. The main thrust of the post is that it is grindingly tedious to conclusively prove anything at all in this field right now. And even experts make dumb mistakes, this is why the CVE system exists. But it does very little to disprove my general model of the likely consequences of scaled-up LLM use for coding, either.
kentonv · 16h ago
I do feel that the agentic thing is what made all the difference to me. The stuff I tried before that seemed pretty lame. Sorry, I know you were trying to avoid that exact comment, but it is true in my case. To be clear, I am not saying that I think you will like it. Many people don't, and that's fine. I am just saying that I didn't think I would like it, and I turned out wrong. So it might be worth trying.
The CVE is indeed embarrassing, particularly because the specific bug was on my list of things to check for... and somehow I didn't. I don't know what happened. And now it's undermining the whole story. Sigh.
glyph · 16h ago
I appreciate your commitment to being open to the possibility of being surprised. And I do wish I _could_ find a context in which I could be comfortable doing this type of personal experiment. But, I do remain confident in my own particular course of action chosen in the face of incomplete information.
Again, it's tough to talk about this while constantly emphasizing that the CVE is at best a tiny little data point, not anywhere close to a confirmation bullseye, but my model of this process would account for it. And the way it accounts for it is in what I guess I need to coin a term for, "vigilance decay". Sort of like alert fatigue, except there are no alerts, or hedonic adaptation, for when you're not actually happy. You need to keep doing the same kinds of checks, over and over, at the same level of intensity forever to use one of these tools, and humans are super bad at that; so, at some point in your list, you developed the learned behavior "hey, this thing is actually getting most of this stuff right, I am going to be a little less careful". Resisting this is nigh impossible. The reason it's less of a problem with human code review is that as the human seems to be getting better at not making the mistakes you've spotted before, they actually are getting better at not making those mistakes, so your relaxed vigilance is warranted.
wagwang · 21h ago
Its hard to tell if genai is just hopelessly complicated or if there are people with really accurate takes that are drowned out by the noise.
energy123 · 22h ago
> I have woefully little experience with these tools. I’ve tried them out a little bit, and almost every single time the result has been a disaster that has not made me curious to push further. Yet, I keep hearing from all over the industry that I should.
Well this is the issue. The author assumed that these tools require no skill to use properly.
empath75 · 23h ago
There is a split between people who think of labor as an end in itself and people who think of labor as a means of achieving some other end, and people in the former category are always going to dislike any kind of automation.
Even though I've written a bunch of code, I don't really enjoy writing code. I enjoy building systems and solving problems. If I never have to write another line of code to do that, I couldn't be happier.
Other people have invested a lot of their time and their self image in becoming a good "computer programmer", and they are going to have a really hard time interacting with AI programmers or really any other kind of automation that removes the part of the job that they actually enjoy doing.
Really, there's not much difference between that and musicians who got mad at DJs for just playing other people's music, and who then got mad at DJs that use Ableton Live instead of learning to beatmatch. Do you think the _process_ of making music is the important part, or the final sounds that are produced?
Just like DJ's and collage artists can be inventive and express ideas using work created by other people, people who use AI to code can still express creative ideas through how they use it and combine it and manipulate it.
jdefr89 · 23h ago
The issue is that though ChatGPT and the likes definitely speed up development in one respect, they create a bottleneck in various other places. "Vibe coders" have essentially no idea what the code their LLM cranked out even does and are unaware that the code they generated is often riddled with bugs yet still looks plausible. Not only that, they are often trying to develop things that, for the most part, you could have just googled. Of course an LLM can write your CRUD app to a point, but then try to add a novel aspect and you see it pretending to reason and make progress when in reality you're just spinning your wheels. I do research at MIT; my day-to-day involves developing systems/solutions/technologies that did not previously exist. When the bulk of your work lies in novel things such as R&D, the limits of LLMs become very apparent and they offer no further help.
Every single day friends of mine (many of whom aren't programmers, just wannabe CEOs of the next trend) tell me about these "projects" of theirs. At the beginning it amazed them. They got a GUI up in Python displaying some numbers with just a few queries! So cool! To them this probably looks like black magic. As the project grows to anything beyond the trivial, they cannot push it further along. They can't look at the code themselves to fix the minor issue any human would immediately see, because they have no clue/haven't read the bulk of the generated code (often trash quality, riddled with vulnerabilities)... They often resort to trying a million different tools and platforms, spinning in circles, praying another model can fix their abomination... You get the idea. Sorry for poor grammar, typing on a small phone on the bus.
ofjcihen · 21h ago
I come from the same space and I agree wholeheartedly. My current work also has me working with APIs that aren’t public so no help there either.
This has me questioning the level of complexity most people on HN are working at day to day. For so many to be amazed at the ability to spontaneously generate CRUD apps that already exist in a thousand different iterations I’m guessing it’s pretty low.
tuckerman · 22h ago
I know they are very visible on Twitter and the like but how many people are truly "vibe coding" versus just being a SWE that uses an LLM to summarize documentation, scaffold some test cases, implement some tedious data transformations, and still implement the truly interesting bits on their own? I'm not sure how to actually figure that out but it would be very interesting to know.
I know none of the former IRL, but I am the latter myself and know plenty of engineers who are too. It very well could be that I'm the one with the biased sample, but my experience is that I only hear about vibe coding from "influencers" and from articles/comments decrying it.
throw-vibec0d3 · 21h ago
I made a quick throwaway to share my anecdote.
I'm at a YC company. We have a dev team < 15 engineers. We all expressed concerns about going full vibe coding, so the technical co-founder hired a bunch of vibe coders and has them on a private "SEAL team" building vibe coded monstrosities without any regard to safety, integrating with our main product, hosting consistency, or anything else.
I don't know if this is anyone else's experience. But I'm WAY more concerned about vibe coding than others seem to be just due to my proximity to the nonsense.
tuckerman · 21h ago
Sorry you have to deal with that and I hope they see the ramifications of their decision soon, but if this is your technical co-founder I'd recommend looking for a new role.
Can you share more about what the workflow of this new team looks like? I consider myself a strong proponent and heavy user of AI tools for development but it's still hard for me to fathom how true "vibe coding" at this scale is possible.
throw-vibec0d3 · 20h ago
My founders are certainly already in this thread, so I don't want to say a whole lot more. The more specific I get, the increased likelihood I get called in for a chat today for being "unwilling to adapt." But it's far worse than you're probably expecting from a company with good investors and revenue. I expect things will go terribly wrong very soon.
The PM just decided they didn't want to queue up tickets or designs anymore, so they're on this vibe coding team. We aren't even included in interviews for devs going onto that team, though you can hardly call them devs. (Maybe that's elitist of me, but that's how I'm feeling.)
And sorry this is turning into a therapy session, but all of us devs actually like AI tooling. We're getting decent to good value out of claude code and the like. We've implemented a fairly neat little MCP for our CS team to use as a walled-garden admin tool. And yet, since we criticize things, we're branded as "non-believers" and the implication is that we are impeding the company's growth.
Suffice it to say, all us career devs are already interviewing and on our way out here.
Nikolay_p · 21h ago
If the founders think they can do it without any real developers, they will fail.
But you have a bigger problem, the tech co-founder (CTO) is an idiot, that company is not going anywhere.
kubb · 22h ago
> When the bulk of your work lies in novel things such as R&D the limits of LLMs become very apparent and they offer no further help.
Yes, but most people here do CRUD websites, so they're ecstatic.
dmm · 2h ago
> Do you think the _process_ of making music is the important part, or the final sounds that are produced?
Pharmaceuticals are regulated not just by testing the end product but by regulating the entire production process[1]. How this came to be has an interesting story but I believe that it's essentially the correct way to do things.
Saying that process is irrelevant, that only the end product matters is a very persuasive idea but the problem comes when it's time to validate the end product. Any test you can perform on a black box can only show the presence or absence of a specific problem, never the absence of problems.
“Program testing can be used to show the presence of bugs, but never to show their absence!” - Dijkstra
[1] If you're interested in how this regulatory structure has failed to regulate international drug manufacturers, check out "Bottle of Lies" by Katherine Eban.
seadan83 · 23h ago
> Do you think the _process_ of making music is the important part, or the final sounds that are produced?
It is both, for different reasons. Not caring for the process is the path of the mid-wit.
thoroughburro · 20h ago
Ultimately, you’re arguing for generalism over specialism. Any specialist must be ignoring part of the larger process.
“All specialists are midwits” is obviously false.
Alternatively, even “most expert musicians are their own luthiers”… maybe more obviously false?
seadan83 · 16h ago
I'm interested for you to elaborate a bit more! I'd like to see more clearly the connection of my statements and the argument for generalist over specialist.
Overall, I think I'm arguing that you cannot become an expert without continually improving the process; and to do that you need to continually understand and study the process at progressively deeper levels. If a person cares nothing for the process, just the outcome - they therefore cannot achieve mastery (I'm like 85% convinced this is true).
If we assume a specialist is also an expert, then by definition specialists are not midwits - I agree.
An expert musician I would suspect should know how the different qualities of the instrument they play affect the outcome. Whether they can actually construct an instrument is a different question, but they ought to know the differences. Can you also be an expert musician and not know, nor care about the differences in instrument construction quality and material? Can you be an expert on a piano and not know what makes for a good piano?
I think the most exact analogy would be a musician who does not care about how an instrument is played - they don't study it, they don't care, just do something that makes for a nice sound, and that is all they desire. I don't think a person can achieve mastery in that manner, to achieve mastery - the 'how' does matter; the master should know everything about the different 'hows' and the trade-offs; and that is what makes them a master.
asdff · 21h ago
The main issue with AI, I think, is that it is exposing slop in processes people have handwaved away for some time. If you give an AI a prompt and it produces a function that doesn't work as expected, and this breaks your workflow, then the issue is your workflow. A junior dev or anyone else really could write a function that doesn't work as expected. Your workflow shouldn't go down to its knees in response; it should have tests and other debugging surrounding the putative function call to ensure that its output is as expected and it is not causing downstream issues in the stack.
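As a toy sketch of what I mean (the function and values here are hypothetical), the generated code gets treated as untrusted until the surrounding checks say otherwise:

    # Hypothetical function standing in for whatever the AI (or a junior)
    # just handed you; the point is the checks wrapped around it.
    def normalize_price(raw):
        return float(raw.strip().lstrip("$"))

    def test_normalize_price():
        # Happy path: the output is what the rest of the stack expects.
        assert normalize_price(" $19.99 ") == 19.99
        # Failure path: garbage input fails loudly instead of propagating.
        try:
            normalize_price("not a price")
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError on bad input")

    if __name__ == "__main__":
        test_normalize_price()
        print("ok")

If the function is wrong, the workflow catches it here, not in production over the weekend.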
This is why a Stack Overflow copy-paster and a genAI crutch wielder are both the same kind of bad. Neither understands the context of the code, hence the grasping for "please solve this for me, someone/something" tooling instead of the underlying documentation that explains everything for someone with the knowledge to apply it. Both require the same care: either don't hire them, or disempower them so that one stupid bug pushed on Friday isn't going to bring down the stack over the weekend, and so that their work can be reviewed by people with an understanding of what is happening.
Game_Ender · 23h ago
I am not sure there is much value in this article for the typical Hacker News conversation on LLM-based tooling. Here we generally focus on whether the tooling is effective, and whether it can be used to make software quicker or more cheaply. The problem is that the author is opposed to using the cutting-edge models on privacy and ethics grounds. So they say:
> I have woefully little experience with these tools.
> I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I’d like to use a local model. But there is obviously not a nicely composed way to use local models like this.
> The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn’t really be addressing my opponents’ views.
Then without having any real practical experience with the cutting edge tooling they predict:
> As I have written about before, I believe the mania will end. There will then be a crash, and a “winter”. But, as I may not have stressed sufficiently, this crash will be the biggest of its kind — so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero.
I think a more accurate take is that this will be like self-driving: huge investments, many more losers than winners, and it will take longer than all the boosters think. But in the end we did get actual self-driving cars, and this time, with LLMs, it is something anyone can use by clicking a link vs. waiting for lots of cars to be built and deployed.
recursive · 23h ago
> Here we generally focus on if the tooling is effective
That's one of the things we sometimes focus on.
> But in the end we did get actual self driving cars
Kind of. Barely. You still can't buy a car without a steering wheel. If they were self-driving, that would be a waste of space and resources.
malwrar · 19h ago
I’m near term AI-skeptical, but I think most of the capability & cost-related problems will be solved by improvements to current model architectures. My personal canary for the start of this will be when we start seeing successful architectures beyond (roughly) “tokens -> transformer blocks -> linear transformation to probability distribution”. I wanted to engage with a few of the author’s specific points of skepticism though (hope you don’t mind, if you’re reading this) since I’m not sure if I agree with them.
> Energy Usage
This has always seemed like a fake problem to me, and mostly just a way to rally environmentalists around protesting genai (and bitcoin and…). The costs (time & capital) associated with nuclear power are mostly regulatory; in the past decade China has routinely built numerous modern reactors while the west has been actively scaling nuclear power down. I have yet to see a good case for not simply scaling up power generation to meet demand. This is assuming that AI power demands remain constant and even require gigawatt-scale power sources to train & operate. For all we know, one paper from one researcher anywhere in the world could show us that current model architectures are orders of magnitude inefficient and make current-standard frontier model training doable in a basement.
> The Educational Impact
A few years ago I came across some cool computer vision videos on youtube and became inspired to try reading the papers and understanding more. The papers were surprisingly short (only a few dozen pages usually), but contained jargon and special symbols that I found impossible to google. I eventually read more papers and even some books, and eventually discovered that if I had simply read e.g. Hartley & Zisserman’s Multiple View Geometry (commonly just referenced as “MVG”, another mystery I had to figure out) I would have had the hidden knowledge necessary to understand much of the math. If I had simply read earlier papers, I’d be familiar with the fact that much of the content is just slight additions on previous works rather than fully greenfield ideas. I’m not rich enough or successful enough to access the formal education that would have probably provided me the resources to learn this. Youtube videos from practitioners in the field felt more like insecure nerds gatekeeping their field with obtuse lessons. It took me about a year to produce anything useful without just using opencv. When I tried asking the same questions I had as a beginner, chatgpt was able to answer everything I was confused on, answer my dumb followup questions, and even produce demo code so I could explore the concepts. That experience has made it an irreplaceable educational tool for me. Im sure more people will cheat, but college is already an overvalued competence signal anyways.
> The Invasion of Privacy
Temporary problem; someone will produce a usable hardware equivalent that makes it similarly easy to use genai w/o the privacy implications. This is a segment begging for someone to provide a solution. I think most people with the resources to solve it currently just want to be the AI gatekeeper, and thus the service model for this tech.
> The Stealing
We won’t find common ground, I dont believe in ownership of information and think copyright is largely immoral.
> The Fatigue
This section seems to largely revolve around there being no clear, obvious tooling route to simply using the tech (complete empathy there, I'm currently building my own tooling around it), along with not having a clear mental model of how these models work. I had the same anxiety when chatgpt first blew my mind and just decided to learn how they work. They're dirt fucking simple; shitty non-learned computer vision systems are orders of magnitude more complex. You seem like a cool person, hmu my profile name @ gmail if you want to talk about it! I like 3b1b's videos if you want a shorter version; karpathy is overhyped imo but also cool if you want to build a toy model line by line.
vouaobrasil · 23h ago
> the model could be created with ethically, legally, voluntarily sourced training data
There is no such thing as ethical AI, because "voluntary" usually means voluntary without the participants really understanding what they are making, which is just another tool in the arms race of increasingly sophisticated AI models – which will largely be needed just to "one up" the other guy.
"Ethical" AI is like forced pit-fighting where we ask if we can find willing volunteers to fight for the death for a chance for their freedom. It's sickening.
mjburgess · 21h ago
I can at least say why offering a "theory of AI" in the detective sense is very difficult: AI development is an anti-inductive process.
Anti-inductive processes are those, like e.g., fraud, which change because you have measured them. Once a certain sort of fraud is made illegal and difficult, most move on to a different kind.
AI models are at-base memorisers, with a little structured generalisation around the region of memory. They are not structure learners who happen to memorize. This makes creating a theory of an AI model kinda impossible, because the "representations" they learn are so-called "entangled", another way of saying: garbage. The AI model's representation of language does not match language structure; rather, each representation is a mixture of what we can evidently see as several competing structural features. But the model has no access to this structure, because it's literally not a property of the data but of the data generating process.
Now this seems like "a good theory" of AI in the monkish sense, but a problem arises: for each fragile boundary that benchmarking and testing shows, the AI companies collect data on these failures and retrain the model. This is, in a sense, anti-inductive fraud: companies seek to find out how they are found out, and extend the model to cover those regions.
But models never actually gain the structural capabilities their behaviour is taken to imply. So, of course, this would drive any theorist insane. By the time you've figured out gpt3.5's fakery, all the openai chatbot data they've collected has made gpt4 -- and all the edges have been papered over, until your fingers are cut next time on the now-dark fragile boundary of openai's deception.
I can give very good reasons, independent of any particular AI model, why: the manifold hypothesis is wrong; AI models are memorizers + structure; they don't find latent structure but combine statistical heuristics; their representations are not "entangled" in a way that can be "disentangled", but are necessarily model-unknowable blends of expert-known scientific joints -- and so on.
But none of this knowledge can become useful in the face of a billion-dollar training company competing against me, anti-inductively, to apply any of this to any given model.
Perhaps then, I suppose, we can look at the behaviour these systems induce in their users and its downstream effects. This is what the OP does here: throws up their hands at a theory and says only, however this is working, it cannot be good.
This is, of course, what we do with fraud, cults, and many other systems of deception. We say: we're not going to argue with the conspiracy theorist; they are specialists at coming up with ever more elaborate adaptations to the self-deception. Instead, we observe by proxy that they behave pathologically -- that they are broken in other ways.
I think that's a fair approach, and at least one that allows the author to proceed without a theory.
triceratops · 22h ago
GenAI may not be done thinking about you though.
(Sorry for the low-effort response. But it's kinda true)
I'd be curious to see the sources.
Basically every study I have ever read making some claim about programming (efficacy of IDEs, TDD, static typing, pair programming, formal CS education, ai assistants, etc...) has been a house of cards that falls apart with even modest prodding. They are usually premised on one or more inherently flawed metrics like number of github issues, LoC or whatever. That would be somewhat forgivable since there are not really any good metrics to go on, but then the studies invariably make only a perfunctory effort to disentangle even the most obvious of confounding variables, making all the results kind of pointless.
Would be happy if anyone here knew of good papers that would change my mind on this point.
Isn't this true about most things in software? I mean is there anything quantifiable about "microservices" vs "monolith"? Test-driven development, containers, whatever?
I mean all of these things are in some way good, in some contexts, but it seems impossible to quantify benefits of any of them. I'm a believer that most decisions made in software are somewhat arbitrary, driven by trends and popularity and it seems like little effort is expended to come to overarching, data-backed results. When they are, they're rare and, like you said, fall apart under investigation or scrutiny.
I've always found this strange about software in general. Even every time there's a "we rewrote $THING in $NEW_LANG" and it improved memory use/speed/latency whatever, there's a chorus of (totally legitimate) criticism and inquiry about how things were measured, what attempts were made to optimize the original solutions, if changes were made along the way outside of the language choice that impacted performance etc etc.
To be clear I am not arguing that tools and practices like TDD, microservices, ai assistants, and so on have no effect. They almost certainly have an effect (good or bad).
It’s just the unfortunate reality that quantitatively measuring these effects in a meaningful way seems to basically be impossible (or at least I’ve never see it done). With enough resources I can’t think of any reason it shouldn’t be possible, but apparently those resources are not available because there are no good studies on these topics.
Thus my skepticism of the “studies” referenced earlier in the thread.
I actually think that it benefits high performance workers as AI can do a lot of heavy lifting that frees them to focus on things where their skills make a difference.
Also, for less skilled or less experienced developers, they will have a harder time spotting the mistakes and inconsistencies generated by AI. This can actually become a productivity sink.
At least, this is what I typically end up with.
It generally generates defective code, but it doesn't really matter all that much, it is still useful that it is mostly right, and I only need to make a few adjustments. It saves me a lot of typing.
Would I pay for it? Probably not. But it is included in my IntelliJ subscription, so why not? It is there already.
I'll bite anyhow.
AI is very, very good at working at short length-scales. It tends to be worse at working at longer length-scales (Gemini is a bit of an outlier here, but even so, it holds). People who are hyper-competent/elite-skill in their domain and who achieve force multiplication with gen-AI understand this, and know how to decompose challenging long length-scale problems into a number of smaller short length-scale problems efficiently. This isomorphic transform allows AI to tackle the original problem in a way that it's maximally efficient at, thus side-stepping its inherent weaknesses.
You can think of this sort of like mathematical transformations that make data analysis easier.
I break the pro-AI crowd into 3 main categories and 2 sub categories:
1. those who don't really know how to code, but AI lets them output something more than what they could do on their own. This seems to be what the GP is focused on
2. The ones who can code but are financially invested to hype up the bubble. Fairly self explanatory; the market is rough and if you're getting paid the big bucks to evangelize, it's clear where the interests lie.
3. Executives and product teams that have no actual engagement with AI, but know bringing it up excites investors. a hybrid of 1 and 2, but they aren't necessarily pretending they use it themselves. It's the latest means to an end (the end being money).
and then the smaller sects:
1. those who genuinely feel AI is the future and are simply prepping for it, trying to adapt their workflow and knowledge base around it. They may feel it can already replace people, or may feel it's a while out but progressing that way. These are probably the most honest party, but I personally feel they miss a critical aspect: what is used currently as the backbone for AI may radically change by the time it is truly viable.
2. those who are across the spectrum of AI, but see it as a means to properly address the issue of copyright. If AI wins, they get their true goal of being able to yoink many more properties without regulations to worry about.
>People who are hyper-competent/elite-skill in their domain who achieve force multiplication with gen-AI understand this,
Are there real examples of this? The main issue I see is that people seem to judge "competency" based on speed and output, but not on the quality, maintainability, nor conciseness of such output. If we just needed engineers to slap together something that "works", we could be "more productive".
I was already a very high performer before AI, leading teams, aligning product vision and technical capabilities, architecting systems and implementing at top-of-stack velocity. I have been involved in engineering around AI/ML since 2008, so I have pretty good understanding of the complexities/inconsistencies of model behavior. When I observed the ability of GPT3.5 to often generate working (if poorly written, in general) code, I knew this was a powerful tool that would eventually totally reshape development once it matured, but that I had to understand its capabilities and non-uniform expertise boundary to take advantage of its strengths without having to suffer its weaknesses. I basically threw myself fully into mastering the "art" of using LLMs, both in terms of prompting and knowing when/how to use them, and while I saw immediate gains, it wasn't until Gemini Pro 2.5 that I saw the capabilities in place for a fully agentic workflow. I've been actively polishing my agentic workflow since Gemini 2.5's release, and now I'm at the point where I write less than 10% of my own code. Overall my hand written code is still significantly "neater/tighter" than that produced by LLMs, but I'm ok with the LLM nailing the high level patterns I outline and being "good enough" (which I encourage via detailed system prompts and enforce via code review, though I often have AI rewrite its own code given my feedback rather than manually edit it).
I liken it to assembly devs who could crush the compiler in performance (not as much of a thing now in general, but it used to be), who still choose to write most of the system in c/c++ and only implement the really hot loops in assembly because that's just the most efficient way to work.
Indeed, he did not list "out-of-touch suit-at-heart tech leads that failed upwards and have misplaced managerial ambitions" as a category, but that category certainly exists, and it drives me insane.
You might find your professional development would stop being retarded if you got the chip off your shoulder and focused on delivering maximum value to the organizations that employ you in any way possible.
Let's say "n" is the sum complexity of a system. While some developers can take an approach that yields a development output of: (1.5 * log n), the AI tools might have a development output of: (4 * log n)^4/n. That is, initially more & faster, but eventually a lot less and slower.
The parable of the soviet beef farmer comes to mind: In this parable, the USSR mandated its beef farmers increase beef output YoY by 20%, every year. The first year, the heroic farmer improved the health of their livestock, bought a few extra cows and hit their target. The next year, to achieve 20% YoY, the farmer cuts every corner and maximizes every efficiency, they even exchange all their possessions to buy some black market cows. The third year, the farmer can't make the 20% increase, they slaughter almost all of their herd. The fourth year, the farmer has essentially no herd, they can't come close to their last years output - let alone increase it. So far short of quota, the heroic beef farmer then shot himself.
(side-note: Which is also analogous to people not raising their skill levels too, but that is not my main point - I'm more thinking about how development slows down relative to the complexity and size of a software system. The 'not-increasing skills' angle is arguably there too. The main point is short-term trade-offs to achieve goals rather than selecting long-term and sustainable targets, and the relationship of those decisions to a blind demand to increase output.)
So, instead of working on the insulation of the home, instead of upgrading the heating system, to heat the home faster we burn the furniture. It works.. to a point. Like, what happens when you run out of furniture, or the house catches fire? Seemingly that will be a problem for Q2 of next year, for now, we are moving faster!!
I think this ties into the programming industry quite heavily, since managers often want things to work just long enough for them to be promoted. It doesn't have to work well for years, doesn't have to have the support tools needed for that - nope, just long enough that they can get the quarterly reward and then move on, not worrying about the support mess left behind. To boot, the feedback cycle for whether something was a good idea in software or not is slow, oftentimes years. AI tools have not been out for a long time, just a couple of years, and it'll be another few before we see what happens when a system is grown to 5M lines through mostly AI tooling and the codebase itself is 10 years old - will that system be too brittle to update?
FWIW, I'm of the point of view that quality, time and cost are not an iron triangle - it is not a choose-two situation. Instead, quality is a requirement for low cost and low time. You cannot move quickly when quality is low (from my experience, the slowdown of low quality can manifest quickly too - on the order of hours. A shortcut taken now can reduce velocity even later that same day).
Thus, mandates from management to move 2x to 4x faster, when it's not clear that AI tools actually deliver 2x to 4x benefits over the longer term (perhaps not even in the shorter term), feels a lot like the soviet beef farmer parable, or burning furniture to stay warm.
My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization. This is part of the learning curve of the tools IMO.
> If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
I find one of the biggest differences between junior engineers and seniors is they think differently about how complexity scales. Juniors don't think about it as much and do very well in small codebases where everything is quick. They do less well when the complexity grows and sometimes the codebase just simply falls over.
It's like billiards. A junior just tries to make the most straight forward shot and get a ball in the hole. A senior does the same, but they think about where they will leave the cue ball for the next shot, and they take the shot that leaves them in a good position to make the next one.
I don't see AI as being able to possess the skills that a senior would have to say "no, this previous pattern is no longer the way we do things because it has stopped scaling well. We need to move all of these hardcoded values to a database now and then approach the problem that way." AFAIK, AI is not capable of that at all; it's not capable of a key skill of a senior engineer. Thus, it can't build a system that scales well with respect to complexity, because it is not forward thinking.
I'll posit as well that knowing how to change a system so that it scales better is an emergent property. It's impossible to do that architecture up front, therefore an AI must be able to say "gee, this is not going well anymore- we need to switch this up from hardcoded variables to a database - NOW; before we implement anything else." I don't know of any AI that is capable of that. I could agree that when that point is reached, and a human starts prompting on how to refactor the system (which is a sign the complexity was not managed well) - then it's possible to reduce the interest cost of outsized complexity by then using an AI to start managing the AI induced complexity...
You're assuming organizations are operating with the goal of quality and velocity in mind. We saw that WFH made people more productive and gave them a higher quality of life; companies are still trying to enforce RTO as we speak. The productivity was deemed not worth it compared to other factors like real estate, management ego, and punishing the few who abused the privilege.
We're in weird times and sadly many companies have mature tech by now. They can afford to lose productivity if it helps make number go up.
All things being equal, I would agree. Things are not equal though. The slowdown can manifest as needing more developers for the same productivity, lots of new projects to do things like "break the AI monolith into microservices", all the things a company needs to do when growing from 50 employees to 200 employees. Positing a magically different architecture is kind of positing a different reality - there's too much chaos to say that one approach alone would really have made the difference. One thing though: it does often take 2 to 5 years before knowing whether the chosen approach was 'bad' or not (and why).
Companies that are trying to scale - almost no two are alike. So it'll be difficult to do a peer-to-peer comparison, it won't be apples to apples (and if so, the sample size is absurdly small). Did architecture kill a company, or bad team cohesion? Did good team cohesion save the company despite bad architecture? Did AI slop wind up slowing things down so much that the company couldn't grow revenue? Very hard to make peer-to-peer comparisons when the problem space is so complex and chaotic.
It's also amazing what people and companies can do with sheer stubbornness. Facebook has (I hear) 1000+ engineers just for their mobile app.
> My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization
I fear this is the start of a no-true-Scotsman argument. That aside, what is the largest codebase size you have reached so far? Would you mind providing some/any insight into the architecture differences for an AI-first codebase? Are there any articles or blog posts that I could read? I'm very interested to learn more about where certain good architectures are not good for AI tooling.
AI likes modular function grammars with consistent syntax and interfaces. In practice this means you want a monolithic service architecture or a thin function-as-a-service architecture with a monolithic imported function library. Microservices should be avoided if at all possible.
The goal there is to enable straightforward static analysis and dependency extraction. With all relevant functions and dependencies defined in a single codebase or importable module, you can reliably parse the code and determine exactly which parts need to be included in context for reasoning or code generation. LLMs are bad at reasoning across service boundaries, and even if you have OpenAPI definitions the language shift tends to confuse them (and I think they're just less well trained on OpenAPI specs than other common languages).
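To make the static-analysis point concrete, here is a minimal sketch, assuming a single-repo Python monolith; the helper names are invented for illustration and are not taken from any tool mentioned above. It pulls a target function plus the functions it directly calls, so only the relevant slice of the codebase goes into the model's context.

```python
# Minimal sketch of in-repo dependency extraction for LLM context
# assembly, assuming a single Python monolith. Helper names are
# illustrative, not part of any particular tool.
import ast
from pathlib import Path

def collect_functions(repo_root: str) -> dict[str, ast.FunctionDef]:
    """Map "module.function" names to their AST nodes."""
    functions = {}
    for path in Path(repo_root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                functions[f"{path.stem}.{node.name}"] = node
    return functions

def called_names(node: ast.FunctionDef) -> set[str]:
    """Names of functions called directly (by bare name) in one body."""
    return {
        child.func.id
        for child in ast.walk(node)
        if isinstance(child, ast.Call) and isinstance(child.func, ast.Name)
    }

def context_for(target: str, functions: dict[str, ast.FunctionDef]) -> str:
    """Source of the target function plus its direct in-repo callees."""
    wanted = called_names(functions[target])
    chunks = [ast.unparse(functions[target])]
    for name, node in functions.items():
        if name != target and name.split(".")[-1] in wanted:
            chunks.append(ast.unparse(node))
    return "\n\n".join(chunks)
```

In a microservice setup the callees live behind network boundaries, so there is nothing for this kind of traversal to find - which is one way to read the "avoid microservices" advice above.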
Additionally, to use LLMs for debugging you want to have access to a single logging stream, where they can see the original sources of the logging statements in context. If engineers have to collect logs from multiple locations and load them into context manually, and go repo hopping to find the places in the code emitting those logging statements, it kills iteration speed.
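A tiny illustration of the single-logging-stream point, under the assumption of a Python service; the format string is one reasonable choice, not something prescribed by the comment. Every record carries the file, line, and function that emitted it, so a model reading the log can jump straight to the source.

```python
# One handler, one format: each log line points back to the exact
# source location that emitted it (assumed format, for illustration).
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(pathname)s:%(lineno)d %(funcName)s - %(message)s",
)

log = logging.getLogger(__name__)
log.info("refresh token rotated")  # record carries pathname, lineno, funcName
```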
Finally, LLMs _LOVE_ good documentation even more than humans, because the humans usually have the advantage of having business/domain context from real world interactions and can use that to sort of contextually fumble their way through to an understanding of code, but AI doesn't have that, so that stuff needs to be made as explicit in the code as possible.
The largest individual repo under my purview is currently around 250k LoC. My experience (with Gemini at least) is that you can effectively load up to about 10k LoC into a model at a time, which should _USUALLY_ be enough to let you work even in huge repos, as long as you pre-summarize the various folders across the repo (I like to put a README.md in every non-trivial folder for this purpose). If you're writing pure, functional code as much as possible, you can use signatures and summary docs for large swathes of the repo, combined with parsed code dependencies for the stuff actively being worked on, and instruct the model to request full source for modules as needed - it's actually pretty good about it.
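A rough sketch of the folder pre-summarization idea, again assuming a Python repo; the output format and helper name are invented for illustration. It walks the tree, picks up each folder's README.md, and emits bare function signatures so the model can request full source only where it needs it.

```python
# Hedged sketch: build a compact digest of a repo from per-folder
# README.md files plus top-level function signatures.
import ast
from pathlib import Path

def folder_summaries(repo_root: str) -> str:
    sections = []
    root = Path(repo_root)
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        readme = folder / "README.md"
        if not readme.exists():
            continue  # only summarize folders that carry a README.md
        signatures = []
        for py in sorted(folder.glob("*.py")):
            tree = ast.parse(py.read_text(), filename=str(py))
            for node in tree.body:  # top-level defs only
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    # Positional parameters are enough for a signature digest.
                    params = ", ".join(a.arg for a in node.args.args)
                    signatures.append(f"{py.name}: def {node.name}({params}): ...")
        sections.append(
            f"## {folder.relative_to(root)}\n"
            + readme.read_text().strip()
            + ("\n" + "\n".join(signatures) if signatures else "")
        )
    return "\n\n".join(sections)
```

The resulting digest can then sit in context ahead of the parsed dependencies for whatever is actively being worked on.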
I worked at an Indian IT services firm which, even until the mid-2010s, didn't give people internet access at work. Their argument was that use of the internet would make the developers dumb.
The argument always was: assume you had an internet outage for days and had to code - you would value your skills then. Well, guess what, it's been decades now, and I don't think that situation has ever come to pass; heck, not even something close to it has come to pass.
Sometimes how you do things changes. During the peak of the Perl craze, my team lead often told me that people who didn't use C++ weren't as smart, and that people who used Perl would eventually have their thinking skills atrophy when Perl wasn't around anymore. That doomsday scenario hasn't happened either. People have said similar things about Java, IDEs, package managers, Docker, etc.
Businesses don't even care about these things. A real estate developer wants to sell homes; their job is not to build web sites. So as long as a working site is available, they don't care who builds it, you or AI. Make of this what you will.
I've seen this firsthand multiple times: people who really don't want it to work will (unconsciously or not) sabotage themselves by writing vague prompts or withholding context/tips they'd naturally give a human colleague.
Then when the LLM inevitably fails, they get their "gotcha!" moment.
I've been playing with language models for seven years now. I've even trained them from scratch. I'm playing with aider and I use the chats.
I give them lots of context and ask specific questions about things I know. They always get things wrong in subtle ways that make me not trust them for things I don't know. Sometimes they can point me to real documentation.
gemma3:4b on my laptop with aider can merge a diff in about twenty minutes of 4070 GPU time. incredible technology. truly groundbreaking.
call me in ten years if they figure out how to scale these things without just adding 10x compute for each 1x improvement.
I mean hell, the big improvements over the last year aren't even to do with learning. Agents are just systems code. RAG is better prompting. System prompts are just added context. call me when GPT 5 drops, and isn't an incremental improvement
Code =/= Product should be kept in mind. That said, I do not have a hard position on the topic, though I am certain about detrimental and generational skill atrophy.
A lot of the reluctance to bulk adoption is that it seems to drag quality down. CEOs don't usually see that until it's far too late though.
This is a buzzword; it isn't a thing. Just like the 10x developer was never a thing.
AI as a force multiplier is also a thing: with well-structured codebases, one high-level dev can guide 2-3 agents through implementing features simultaneously, and each of those agents is going to be outputting code faster than your average human. The human just needs to provide high-level guidance on how the features are implemented, and coach the agents on how to get unstuck when they inevitably run into things they're unable to handle.
The mental model you want for force multiplication in this context is a human lead "managing" a team of AI developers.
Exactly. And if you consider AI to be the inevitable source of unprecedented productivity gains then this filtering of employees by success with/enthusiasm for AI makes sense.
While I understand the fear, I don’t really share it. And if I were to go to the root of it, I think what I most take issue with is this:
> My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?
My experience is astonishingly positive. I would not have imagined how much of a help these tools have become. Deep research and similar tools alone have helped me navigate complex legal matters recently for my incorporation; they have uncovered really useful information that I just would not have found that quickly. First Cursor, now Claude Code, have really changed how I work. Especially since, for the last month or so, I feel myself more and more in the position where I can do things while the machine works. It’s truly liberating and it gives me a lot of joy. So it’s not “neutral-to-positive” to me, it’s exhilarating.
And that extends particularly to this part:
> Despite this plethora of negative experiences, executives are aggressively mandating the use of AI. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.
When I was at Sentry, the adoption of AI happened via ICs before the company even put money behind it. In fact, my memory is that only when an exceeding number of AI invoices showed up in IC expenses did we realize how widespread adoption had become. This was ground-up. For my non-techy friends it’s even trickier, because some of them work at companies that outright try to prevent the adoption of AI, but they are paying for it themselves to help with their work. Some of them even pay for the expensive ChatGPT package! None of this should be disregarded, but it stands in crass contrast to what this post says.
That said, I understand where Glyph comes from and I appreciate that point. There is probably a lot of truth in the tail risks of all of that, and I share these. It just does not take away from my enjoyment and optimism at all.
I find this workflow makes my life extremely tedious, because reviewing and fixing whatever the machine produces is a slog for me
I suppose it would be exhilarating if I just trusted the output but somehow I just can't bring myself to do that
How do you reconcile this with your own work? Maybe you just skim the output? Or do you run it and test that way?
Please don't tell me you just trust the AI to write automated tests or something...
Check out several copies of the repo and work on different branches concurrently.
> I find this workflow makes my life extremely tedious, because reviewing and fixing whatever the machine produces is a slog for me
This is a matter of personal preference, but reviewing code (from humans) was already a huge chunk of my job and one that I enjoy. Now I can have a similar workflow for side projects, especially for things that I find less enjoyment in coding myself.
> Please don't tell me you just trust the AI to write automated tests or something...
Like most tools, the more I use LLMs for coding, the more I get a feel for what it's good at, what it's okay at, and what it will just make a mess of. I find that for changes like extending existing test coverage before refactoring, I can often just skim the output. For writing tests from whole cloth it requires a lot more attention (depending on the area). For non-test code, it very much depends on how novel the work is and how much context/how many examples it has to go on.
I don't actually mind the code review process; it's what I'm used to with open source contributions. Not infrequently I would look at a PR someone left me and decide to rewrite it entirely anyway, because I did not quite appreciate the method they chose. That did not mean the PR was useless, though; it might have provided the necessary input that ultimately made me go down that path.
I also think that what can make code review enjoyable is not necessarily that the person on the other side learns something from it. I actually tend to think that this is something that happens not so much in PR review, but in 1:1 conversations in person or through more synchronous collaboration.
So really my answer is that I do not mind code review. That said there is an important part here:
> How do you reconcile this with your own work? Maybe you just skim the output? Or do you run it and test that way?
What makes agentic workflows work is good tooling and fast iteration cycles. That is also exactly what a human needs for a PR review to be actually thorough. I cannot accept a change when I do not know that it exhibits the intended behavior, and to validate that behavior I usually need a test case. Do I need to run the test case myself? Not necessarily, because the agent / CI already does that for me. However, I do sometimes run it with verbose output to also see the intermediate outputs that might otherwise be hidden from me, to give me more confidence.
It really is hard to describe, but when you put the necessary effort in to make an agentic workflow work, it's really enjoyable. But it does require some upfront investment for it to work.
> Please don't tell me you just trust the AI to write automated tests or something...
I do in fact trust the AI a lot to write tests. They have become incredibly good at taking certain requirements definitions and deriving something that can be mostly trusted. It does not always work, but you get a pretty good feeling for whether it will or not. At the end of the day, many of the tasks we do are basic CRUD or data transformation tasks.
I wish it wasn't hard to describe because every attempt I've made to reach that point with agents/tools-in-a-loop has ended with more frustration, more errors, and more broken code than when I started, even with common and small, tedious tasks. I'd very much like to understand what proponents are doing and seeing that works.
it's funny because what i'd also like to see is people who are skeptics make a video as well since sometimes i also have the opposite suspicion. i get a lot of the criticism, but i don't get the "it produces pure garbage" type ones.
This leads to mistakes that are difficult to catch, because while I know the kinds of mistakes another human might make, having been through the process of learning not to make them myself, LLMs produce whole classes of bizarre mistakes that I have no interest in learning to catch. There is no discernible flawed mental model behind these errors—which with a human could be discussed and corrected—just an opaque stochastic process which I can tediously try to set on a better course with ‘incantations’, attempting to dial in to a better part of the training data that avoids the relevant class of error.
It’s honestly amusing how much of ‘prompt engineering’ (if it can be dignified with that term) amounts to a modern-day kind of mysticism. What better can we really do, though, when these models’ structure and operation are utterly opaque: on the one hand through deliberate, commercially-oriented obfuscation, and on the other because we still just cannot explain how a multi-billion-parameter model works.
It’s rewarding to work with human juniors because people actually learn and improve. My learning how to coax these models into producing better than trash just is not rewarding, especially on anything either remotely novel or in a legacy codebase that requires genuine understanding of how an existing system functions at runtime.
Once those two ends of the spectrum are ruled out, I find there’s little left that an LLM can accomplish, without necessitating a loop of prompt refinement that leaves me feeling like a worse developer for not just having thought through the problem myself, and resentful of the time wasted.
Edit: this entire thread about coercing Gemini into behaving is exactly the kind of crap I have zero interest in: https://news.ycombinator.com/item?id=44194061
i usually don't let it go run off on its own unless it's a very defined task that i can review quickly later, i just review and approve every change and it takes a big cognitive load off for me. at some point maybe this doesn't feel like "programming", but then i'll just tweak something else or modify it and then go onto the next review. i find i can't have it produce the entire thing and then review it since i have no idea how it got to where it did, or it takes just as much time to understand. but doing it this way it's faster + i gain understanding.
the prompts aren't overly complex or take a lot of time, certainly way less than speccing something out for a junior. all i have is a base file for style and structure and then i describe the general problem and reference files ad-hoc. where i find it actually fails a lot is in novel code because it has nothing to ground it and starts exploring random stuff. i only use it for novel exploration to see what approaches it comes up with.
still trying to understand why there's this huge chasm between the two viewpoints. like a lot of the things you just said i can't resonate at all with. like maybe the 20% i feel like i'm "fighting" the LLM i just stop and go in myself. does that suck? sort of, but it's certainly way less tedious than directing some other person to do it or the time saved had i not used it at all.
edit: but to your point, yeah it really is just like magic with no way to actually direct it in the way you would a human, where the human would learn. maybe over a year ago i tried it and wrote anything AI off beyond basic co-pilot completions (same issues, "fighting" the AI, having to specify a ton of exceptions in some god awful file). the new agents changed everything for me, esp claude code. i think it will only get better, so it's best to pick it up.
my only fears are
1) no juniors being trained, thus no future seniors. part of the power is that you have experienced people using it to enhance their context or understanding. for those with no experience and no drive to "improve" (honestly, think of the avg dev at big co) or straight up "vibe coding" i shudder at the output.
my hypothesis is we are now going to enter a period where a LOT of shitty code is going to be created. it's already happening in education with people just cheating w/o learning. i already had issues trying to hire people who were using AI to get past initial exercises but failing on complex issues because they were probably just copying and pasting everything. best time to be a nimble startup.
2) top-down mandates to use this stuff. you should only use it when you want and if it helps you. i think there's this element of companies buying into the hype 110% and that puts a bad taste in everyone's mouth. "all devs replaced in 1 year!" type stuff.
In legacy that I don’t yet understand, maybe I am missing something in using a model with all that code in its context as an aid to build my understanding. I just cannot imagine handing off responsibility of modifying that code to the mystery machine. Tedious as it might be, I’m firmly of the view that I’d do myself and my team a disservice if I introduce changes I don’t fully comprehend, in fear that it’d only bite us later.
That fear comes from a couple cases of being bitten by changes made by other supposed seniors where I had my suspicions they let the LLM do the work for them, and went against my judgement in accepting their assurances that they’d tested everything worked.
Although, OK, for the legacy case, perhaps I should loosen my embargo in legacy frontend code where there’s a tight enough ‘blast radius’—meaning, the damage of a bad change is constrained to the bit of UI in question not working. Especially if that’s back-office frontend code where I have responsive users who I know will let me know if I broke something (because they surely know the warts of the system inside out, unlike me). In mission-critical backend code? Not a chance.
On the question of ‘fighting’ the LLM, maybe I do need to loosen up in how I want something done. In fairness I’m much more tolerant of that in a human, because a) there’s just often more than one way to do things; and b) with some devs I know there would or could be a fight that just isn’t worth it.
Which does come to an interesting point about ego and ownership, that if I regard the LLM code less as ‘my own’ perhaps I’d be more forgiving of its contributions. Would honestly make a difference if it’s not got my name on the commit.
Also comes back to one of the cases where I got burned. If the other dev hadn’t put their name on code which I’m sure was not theirs, then we could have had an honest discussion about it, and I could have better helped figure out whether what seemed to work was really trustworthy. Instead I had to weigh challenging them on it, and the awkward implication that I didn’t think they could have come up with the code themself.
To your two fears, totally agree, and undoubtedly some variant on these two cases has put a very bad taste in my mouth. Witnessing a junior essentially cheat their way through their final project for school, making a mockery of the piece of paper they got at the end. Being a victim not quite of a top-down mandate—somehow worse, with an exec head-over-heels bought into the hype, thinking they could lose a chunk of expensive headcount to no ill effect; not firing people, just making the situation miserable enough that people quit.
Whether or not AI performs well is influenced both by the work you're doing and by how experienced you are at it. AI performs better when the work is closer to mainstream work than to novel work. It also performs better with lower-level instructions, e.g. being more specific. As for experience, there are two things: people with less experience get a bigger lift than those with lots of experience, and people with lots of experience get a lift by having AI do the work they don't want to do, which is often unit tests and comments, or writing the bobdyllionth API endpoint.
I read your post the other day and appreciated the optimism, but I wasn't able to work out what kind of work you've been doing since leaving Sentry.
Those are typical dysfunctions in larger companies with weak leadership. They are magnified by a few factors: AI is indistinguishable from magic for non-tech leadership, demos can be slapped together quickly but don't actually work without real investment (which is what leadership wanted to avoid in the first place), and of course there's the promise of reduced costs.
This happens in parallel with people using it to do their own work in a more bottom-up manner. My anecdotal observation is that it is overused for low-value/high-visibility work. E.g. it replaces "writing as proof of work" by reducing the cost to write bland, low-information documents used by middle management, which increases bureaucratic load.
On the flipside, it will make it even easier for corporations to use the legal system in their favour and corporations will more easily and effectively use GenAI against individuals even if both have it due to the nature of corporations and their ability to invest in better tools than the individual.
So it's just an arms race/prisoner's dilemma and while it provides incremental advantages, it makes everything worse because the system becomes more complex and more oppressive.
I don't understand these people who are saying they are getting huge value from LLMs. I haven't put a ton of effort into figuring out how to "hold it right" because stuff is still changing so fast. I've had bad experiences with rabbit holes like this, where the true believers will tell me the pot of gold is always right around the next corner. If and when I can get some positive value out of it using whatever's easily available by default, then I might investigate deeper.
My experience mostly consists of GPT and Copilot, as provided by my employer. I'm part of a pilot program to evaluate Copilot. My feedback is that it's slightly positive in aggregate, but individual results are very mixed. It's not worth much to me.
It can't be both. It can't be 2-4x multiplier _and_ be wasting your time to the extent that you have to "get a feel for when it has been wasting your time".
> There is probably a lot of truth in the tail risks of all of that, and I share these. It just does not take away from my enjoyment and optimism at all.
On the one hand, I get not expanding on things you don't feel you can contribute to meaningfully and respect that.
On the other, isn't this a problem?
While I agree with you and cannot fully understand the author's conclusion that it's all bad (I've also found value), I wrestle with the downsides on an ongoing basis and have found myself shying away from using these tools as a result, even though I know they are quite capable of some things. I think that if I only responded "Well, I personally find the tools valuable", that this is only a part of the picture and is a net negative to the overall conversation because the "real" conversation that needs to be had IMO is about those harms, not haggling over whether or not the thing can actually do stuff.
There's quite a lot of pro-AI content and opinion floating around right now that amounts to: "Yeah, this thing is dangerous, threatens climate goals, is accelerating an education crisis, only exists because of mass theft, creates a massive category of new problems we don't yet know how to solve...but I'm getting value out of it". And the thing that keeps coming to mind is "Don't Look Up".
...is just coded "white rich guy" speak to say "I won't suffer the consequences, so I might as well not bother"
shameful
The fact that you're so sheltered as a western white rich programmer that you can AFFORD to be misinformed about all the ways AI is bad for the rest of us (spoiler: you won't have to suffer it because you're sheltered)
That's exactly the issue: rich white guy pushing for "more, more, more" of their toys, without looking to the bad consequences of them because "oh no, I'm not interested and I'm not an expert"
Then you might as well keep taking the airplane every weekend, etc, and just "not get interested" or "not get enough data" about why it's bad...
I am looking forward to my future career as a gardener (although with a smidge of sadness) when AI has sucked all creativity, ingenuity and enjoyment out of my field of work.
Thoughtful post.
This won't be true until the financials are squared up. Right now there is a lot of funky math propping up these companies. If the bet doesn't pay off with real productivity gains, the whole AI industry will disappear the way the first generation of AI assistants did ten years ago.
Remember Echo?
That said, I don’t think it quite echoes Echo (that pun might be intentional) or maybe it does, financially speaking.
For me, this wave feels different: deeper integration, broader scale, and real-time utility. I agree, if the returns don’t hold up, a major correction isn’t off the table.
It reminds me a bit of the smartphone era: rapid adoption, strong dependence. I guess the difference is that phones had clear monetization paths. With AI, we’re still betting.
The bigger question for me is, if this all collapses, what happens to the workflows and investments already built around it?
In addition there's an element of personality: One person might think it's more meaningful to compose a party invitation to friends personally, another asks ChatGPT to write it to make sure they don't miss anything.
As someone who is on the skeptic side, I of course don't appreciate the "holding it wrong" discourse, which I think is rude and dismissive of people and their right to choose how they do things. But at the end of the day it's just a counterweight to the corresponding discourse on the skeptic side which is essentially "gross, you used AI". What's worse though is that institutions are forcing us to use this type of tool without consent, either through mandates or through integrations without opt-outs. To me it doesn't meet the bar for a revolutionary technology if it has to be propagated forcibly or with social scolding.
This has been my experience as well, especially since the only space I have to work with agents is on C++ projects, where they flat-out spiral into an increasingly dire loop of creating memory leaks, identifying the memory leaks they created, and then creating more memory leaks to fix them.
There are probably some fields or languages where these agents are more effective - I've had more luck with small tasks in JS and Python for sure. But I've burned a full work week trying and failing to get Claude to add some missing destructors to a working but leaky C++ project.
At one point I let it run fully unattended on the repo in a VM for four hours with the goal of adding a destructor to a class. It added nearly 2k broken LOC that couldn't compile, because its first fix added an undeclared destructor, and its response to that failure was to do the same thing to _every class in the project_ - every time saying "I now know what the problem is" as it created a new problem.
If an LLM could just lack confidence in itself for even a moment and state that it doesn't know what the problem is but is willing to throw spaghetti at the wall to find out, I could respect that more than it marching through with absolute confidence that its garbage code did anything but break the project.
It's not like these are esoteric languages, so I'm not sure why they (in my experience) have such a big problem with them.
My current guess is that C and C++ aren't so mono-domain. The languages it's good at seem to be languages used predominantly in the web world, and most of the open code out there is for the web.
I'm curious if anyone has tried to use it for C# in Unity or a bespoke game in Rust and how well it does there.
The part of working with it that I can't get past is its full-throated confidence in its actions. It'll break down, step by step, why its actions would solve the problem, then attempts to build it and can't recognize how or why its changes made the problem worse.
There's a domain that I do find some application for it in, which is "configure ffmpeg". That and tasks within game engines that are similar in nature, where the problem to be solved involves API boilerplate, do benefit.
What also works, to some degree, is to ask it to generate code that follows a certain routine spec, like a state machine. Again, very boilerplate-driven stuff, and I expect an 80% solution and am pleasantly surprised if it gets everything.
And it is very willing to let me "define a DSL" on the fly and then act as a stochastic compiler, generating output patterns that are largely in expectation with the input. That is something I know I could explore further.
And I'm OK with this degree of utility. I think I can refine my use of it, and the tools themselves can also be refined. But there's a ceiling on it as a directly impactful thing, where the full story will play out over time as a communication tech, not a "generate code" tech, with the potential to better address Conway's law development issues.
So for those users, you might still be finding LLMs lacking. It's also understandable that you might get more errors when generating C++ than say C#, since it's harder to use.
It reminds me that I recently changed companies in part because at my previous job, I didn't have as many opportunities to work with LLMs (doing the things they are good at) as my current one.
Not a comment on your primary points, but most LLMs out there also suck at enterprise Java projects (Spring boot, etc.) unless it's the most mundane things such as writing simple controllers.
That's surprising, given that C++ provides default destructors. They aren't "missing" unless you or the AI omitted them deliberately. So I'll guess you meant custom destructors that clean up manual resource management.
Which likely means you ran into the classic failure mode of AI, you underspecified constraints. It's C++. Humans or AI, as soon as there's more than one, be prepared to discuss ownership semantics in absolutely nauseating detail.
You don't fix broken ownership semantics by telling a junior "try to get it to compile" and walking away for a couple of hours, either.
And LLMs are interns, juniors at best. That means for a solo C++ project, yeah, you're probably better off not using them. The core issue is that they're incapable of actually learning, not a lack of skills per se - if you're willing to spec the hell out of everything every single time, they still do OK. It's soul-deadening, but it works.
> If an LLM could just lack confidence in itself for even a moment and state that it doesn't know what the problem is but is willing to throw spaghetti at the wall to find out, I could respect that more than it marching through with absolute confidence that its garbage code did anything but break the project.
As I said, juniors. This is a pretty common failure mode for fresh grads, too.
Except, they make bizarre mistakes that human juniors would not, and which are hard for even experienced developers to spot, because while it becomes possible with experience to preempt the kinds of mistakes that stem from the flawed or incomplete mental models of human juniors, these models do not themselves have a model of the computation underlying the code they produce—which in my experience makes the whole process of working with them endlessly frustrating.
The commit history[1] looks like a totally normal commit history on page 1, but I clicked farther down and found commits like
> Ask Claude to fix bug with token exchange callback when reusing refresh token.
> As explained in the readme, when we rotate the refresh token, we still allow the previous refresh token to be used again, in case the new token is lost due to an error. However, in the previous commit, we hadn't properly updated the wrapped key to handle this case.
> Claude also found that the behavior when `grantProps` was returend without `tokenProps` was confusing -- you'd expect the access token to have the new props, not the old. I agreed and had Claude update it.
> Claude Code transcript: https://claude-workerd-transcript.pages.dev/oauth-provider-t...
It seems no different than "run prettier on the code" or "re-run code gen". I don't think it's fair to pick on this specific repo if your objection is just "I don't like reviewing code written by a tool that can't improve, regardless of what the commits look like". I'd call that "misanthropomorphizing".
1. https://github.com/cloudflare/workers-oauth-provider/commits...
- Plan my garden
- Identify plant issues
- Help with planting tips
- General research/better google
- Systems Engineering
- etc
Maybe the code generation isn't great, but it is really good in a lot of areas.
The other question is, would you have fared any worse without gen AI? Personally I find old-school sources more comprehensive. I once picked up a book for free at my garden center; it contained all the information about tomatoes, including color photos of nearly every possible deficiency, pest, and disease. If something goes wrong with my tomatoes, a 30-second trip to my bookshelf gives me access to high-quality information. Meanwhile, what is the genAI trained on? Is my book in the training set? Maybe it is. But maybe it's also polluted with the blogspam slop of the last 15 years of the internet, and that isn't easy to parse out without having a book like mine on hand - or better yet, the horticultural training the authors behind that book have had. Trusting my book is therefore trusting the highest-quality output from the corpus of all humanity.
And it is like that with most other things. For most things you might encounter in life, there is a high-quality source of information or a book available for low cost or free. Why reach into the dark and pull words out of the sack when you can get the real deal so easily?
A recent example I had was fixing a toilet that broke. I don't know the name of the components within the tank, but I can understand what they do. I took a picture of the broken component and asked what it was - chatGPT was able to tell me, I was able to get a replacement, and that was that.
I have to remember I'm relatively spoiled in the sense that I know how to get that information out of Google. But for a less technically inclined person, I don't know how they'd even do this themselves. The best I can think of is to take a picture and ask a plumber (if you know one) or someone at a hardware store (if they are even knowledgeable about this).
https://kagi.com/search?q=toilet+tank+diagram&r=us&sh=wnbKX0...
2. People mostly hate new things (human nature)
3. It's fun watching nerds complain about AI after they spent the past 50 years automating other people's careers... turnabout is fair play? All the nerd-sniping around GenAI and tech taking away from programming is funny.
4. Remember it's a tool, not a replacement (yet); get used to it... we're not supposed to be Luddites. Use it and optimize - it's our job to convince managers and executives that we are masters over the tool and not vice versa.
I'm certainly not going to tell anyone that they're wrong if they try AI and don't like it! But this guy... did not try it? He looked at a commit log, tried to imagine what my experience was like, and then decided he didn't like that? And then he wrote about it?
Folks, it's really not that hard to actually try it. There is no learning curve. You just run the terminal app in your repo and you ask it to do things. Please, I beg you, before you go write walls of text about how much you hate the thing, actually try it, so that you actually have some idea what you're talking about.
Six months ago, I myself imagined that I would hate AI-assisted coding! Then I tried it. I found out a lot of things that surprised me, and it turns out I don't hate it as much as I thought.
[0] https://github.com/cloudflare/workers-oauth-provider/commits... (link to oldest commits so you can browse in order; newer commits are not as interesting)
Edit to add: Full disclosure: I've been a fan of his writing and work for a long time anyway, and am a supporter on Patreon.
(Though I would say, before I really tried it, I had believed that AI does a lot more "plagiarism" -- lifting of entire passages or snippets. Actually using it has made clearer to me that it really is more like someone who has learned by reading, and then applies those learnings, at least most of the time.)
I just don't like it when people argue that it doesn't work or doesn't do what they want without having actually tried it... There's too much echo chamber of people who don't want AI to work, and so convince each other that it doesn't, and repeat each other's arguments, all without ever taking a step outside their comfort zones. (Again... this was me, 6 months ago.)
I was deliberately vague about my direct experiences because I didn't want anyone to do… well, basically this exact reply, "but you didn't try my preferred XYZ workflow, if you did, you'd like it".
What I saw reflected in your repo history was the same unpleasantness that I'd experienced previously, scaled up into a production workflow to be even more unpleasant than I would have predicted. I'd assumed that the "agentic" stuff I keep hearing about would have reduced this sort of "no you screwed up" back-and-forth. Made particularly jarring was that it was from someone for whom I have a lot of respect (I was a BIG fan of Sandstorm, and really appreciated the design aesthetic of Cap'n Proto, although I've never used it).
As a brutally ironic coda about the capacity of these tools for automated self-delusion at scale, I believed the line "Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.", and in the post, I accepted the premise that it worked. You're not a novice here, you're on a short list of folks with world-class appsec chops that I would select for a dream team in that area. And yet, as others pointed out to me post-publication, CVE-2025-4143 and CVE-2025-4144 call into question the efficacy of "thorough review" as a mechanism to spot the sort of common errors likely to be generated by this sort of workflow, that 0xabad1dea called out 4 years ago now: https://gist.github.com/0xabad1dea/be18e11beb2e12433d93475d7...
Having hand-crafted a few embarrassing CVEs myself with no help from an LLM, I want to be sure to contextualize the degree to which this is a "gotcha" that proves anything. The main thrust of the post is that it is grindingly tedious to conclusively prove anything at all in this field right now. And even experts make dumb mistakes, this is why the CVE system exists. But it does very little to disprove my general model of the likely consequences of scaled-up LLM use for coding, either.
The CVE is indeed embarrassing, particularly because the specific bug was on my list of things to check for... and somehow I didn't. I don't know what happened. And now it's undermining the whole story. Sigh.
Again, it's tough to talk about this while constantly emphasizing that the CVE is at best a tiny little data point, not anywhere close to a confirmation bullseye, but my model of this process would account for it. And the way it accounts for it is in what I guess I need to coin a term for, "vigilance decay". Sort of like alert fatigue, except there are no alerts, or hedonic adaptation, for when you're not actually happy. You need to keep doing the same kinds of checks, over and over, at the same level of intensity forever to use one of these tools, and humans are super bad at that; so, at some point in your list, you developed the learned behavior "hey, this thing is actually getting most of this stuff right, I am going to be a little less careful". Resisting this is nigh impossible. The reason it's less of a problem with human code review is that as the human seems to be getting better at not making the mistakes you've spotted before, they actually are getting better at not making those mistakes, so your relaxed vigilance is warranted.
Well this is the issue. The author assumed that these tools require no skill to use properly.
Even though I've written a bunch of code, I don't really enjoy writing code. I enjoy building systems and solving problems. If I never have to write another line of code to do that, I couldn't be happier.
Other people have invested a lot of their time and their self image in becoming a good "computer programmer", and they are going to have a really hard time interacting with AI programmers or really any other kind of automation that removes the part of the job that they actually enjoy doing.
Really, it's not much different from musicians who got mad at DJs for just playing other people's music - DJs who then got mad at DJs that use Ableton Live instead of learning to beatmatch. Do you think the _process_ of making music is the important part, or the final sounds that are produced?
Just like DJs and collage artists can be inventive and express ideas using work created by other people, people who use AI to code can still express creative ideas through how they use it, combine it, and manipulate it.
Every single day, friends of mine (many of whom aren’t programmers, just wannabe CEOs of the next trend) tell me about these “projects” of theirs. At the beginning it amazed them. They got a GUI up in Python displaying some numbers with just a few queries! So cool! To them this probably looks like black magic. But as the project grows to anything beyond the trivial, they cannot push it further along. They can’t look at the code themselves to fix the minor issue any human would immediately see, because they have no clue / haven’t read the bulk of the generated code (often trash quality, riddled with vulnerabilities)… They often resort to trying a million different tools and platforms, spinning in circles, praying another model can fix their abomination… You get the idea. Sorry for the poor grammar, typing on a small phone on the bus.
This has me questioning the level of complexity most people on HN are working at day to day. For so many to be amazed at the ability to spontaneously generate CRUD apps that already exist in a thousand different iterations I’m guessing it’s pretty low.
I know no one IRL in the former group, but I am in the latter myself and know plenty of engineers who are as well. It very well could be that I'm the one with the biased sample, but my experience is that I only hear about vibe coding from "influencers" and from articles/comments decrying it.
I'm at a YC company. We have a dev team < 15 engineers. We all expressed concerns about going full vibe coding, so the technical co-founder hired a bunch of vibe coders and has them on a private "SEAL team" building vibe coded monstrosities without any regard to safety, integrating with our main product, hosting consistency, or anything else.
I don't know if this is anyone else's experience. But I'm WAY more concerned about vibe coding than others seem to be just due to my proximity to the nonsense.
Can you share more about what the workflow of this new team looks like? I consider myself a strong proponent and heavy user of AI tools for development but it's still hard for me to fathom how true "vibe coding" at this scale is possible.
The PM just decided they didn't want to queue up tickets or designs anymore, so they're on this vibe-coding team. We aren't even included in interviews for devs going onto that team, though you can hardly call them devs (maybe that's elitist of me, but that's how I'm feeling).
And sorry this is turning into a therapy session, but all of us devs actually like AI tooling. We're getting decent to good value out of claude code and the like. We've implemented a fairly neat little MCP for our CS team to use as a walled-garden admin tool. And yet, since we criticize things, we're branded as "non-believers" and the implication is that we are impeding the company's growth.
Suffice it to say, all us career devs are already interviewing and on our way out here.
Yes, but most people here do CRUD websites, so they're ecstatic.
Pharmaceuticals are regulated not just by testing the end product but by regulating the entire production process[1]. How this came to be has an interesting story but I believe that it's essentially the correct way to do things.
Saying that process is irrelevant, that only the end product matters is a very persuasive idea but the problem comes when it's time to validate the end product. Any test you can perform on a black box can only show the presence or absence of a specific problem, never the absence of problems.
“Program testing can be used to show the presence of bugs, but never to show their absence!” - Dijkstra
[1] If you're interested in how this regulatory structure has failed to regulate international drug manufacturers, check out "Bottle of Lies" by Katherine Eban.
It is both, for different reasons. Not caring for the process is the path of the mid-wit.
“All specialists are midwits” is obviously false.
Alternatively, even “most expert musicians are their own luthiers”… maybe more obviously false?
Overall, I think I'm arguing that you cannot become an expert without continually improving the process; and to do that you need to continually understand and study the process at progressively deeper levels. If a person cares nothing for the process, just the outcome - they therefore cannot achieve mastery (I'm like 85% convinced this is true).
If we assume a specialist is also an expert, then by definition specialists are not midwits - I agree.
An expert musician I would suspect should know how the different qualities of the instrument they play affect the outcome. Whether they can actually construct an instrument is a different question, but they ought to know the differences. Can you also be an expert musician and not know, nor care about the differences in instrument construction quality and material? Can you be an expert on a piano and not know what makes for a good piano?
I think the most exact analogy would be a musician who does not care about how an instrument is played - they don't study it, they don't care, just do something that makes for a nice sound, and that is all they desire. I don't think a person can achieve mastery in that manner, to achieve mastery - the 'how' does matter; the master should know everything about the different 'hows' and the trade-offs; and that is what makes them a master.
This is why a Stack Overflow copy-paster and a gen AI crutch-wielder are both the same kind of bad. Neither understands the context of the code, hence the grasping for "please solve this for me, someone/something" tooling instead of the underlying documentation that explains everything to someone with the knowledge to apply it. Both require the same care: either don't hire them, or disempower them so that one stupid bug pushed on Friday isn't going to bring down the stack over the weekend, and so that their work can be reviewed by people with an understanding of what is happening.
> I have woefully little experience with these tools.
> I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I’d like to use a local model. But there is obviously not a nicely composed way to use local models like this.
> The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn’t really be addressing my opponents’ views.
Then without having any real practical experience with the cutting edge tooling they predict:
> As I have written about before, I believe the mania will end. There will then be a crash, and a “winter”. But, as I may not have stressed sufficiently, this crash will be the biggest of its kind — so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero.
I think a more accurate take is that this will be like self-driving: huge investments, many more losers than winners, and it will take longer than all the boosters think. But in the end we did get actual self-driving cars - and this time, with LLMs, it is something anyone can use by clicking a link, versus waiting for lots of cars to be built and deployed.
That's one of the things we sometimes focus on.
> But in the end we did get actual self-driving cars
Kind of. Barely. You still can't buy a car without a steering wheel. If they were self-driving, that would be a waste of space and resources.
> Energy Usage
This has always seemed like a fake problem to me, and mostly just a way to rally environmentalists around protesting genAI (and bitcoin and…). The costs (time & capital) associated with nuclear power are mostly regulatory; in the past decade China has routinely built numerous modern reactors while the West has been actively scaling nuclear power down. I have yet to see a good case for not simply scaling up power generation to meet demand. And this is assuming that AI power demands remain constant and even require gigawatt-scale power sources to train & operate. For all we know, one paper from one researcher anywhere in the world could show us that current model architectures are orders of magnitude inefficient and make current-standard frontier model training doable in a basement.
> The Educational Impact
A few years ago I came across some cool computer vision videos on YouTube and became inspired to try reading the papers and understanding more. The papers were surprisingly short (only a few dozen pages usually), but contained jargon and special symbols that I found impossible to google. I eventually read more papers and even some books, and discovered that if I had simply read e.g. Hartley & Zisserman’s Multiple View Geometry (commonly just referenced as “MVG”, another mystery I had to figure out) I would have had the hidden knowledge necessary to understand much of the math. If I had simply read earlier papers, I’d have been familiar with the fact that much of the content is just a slight addition on previous work rather than a fully greenfield idea. I’m not rich enough or successful enough to access the formal education that would probably have provided me the resources to learn this. YouTube videos from practitioners in the field felt more like insecure nerds gatekeeping their field with obtuse lessons. It took me about a year to produce anything useful without just using opencv. When I asked ChatGPT the same questions I had as a beginner, it was able to answer everything I was confused about, answer my dumb follow-up questions, and even produce demo code so I could explore the concepts. That experience has made it an irreplaceable educational tool for me. I’m sure more people will cheat, but college is already an overvalued competence signal anyway.
> The Invasion of Privacy
Temporary problem: someone will produce a usable hardware equivalent that makes it similarly easy to use genAI without the privacy implications. This is a segment begging for someone to provide a solution; I think most people who currently have the resources to solve it just want to be the AI gatekeeper, and thus prefer the service model for this tech.
> The Stealing
We won’t find common ground, I dont believe in ownership of information and think copyright is largely immoral.
> The Fatigue
This section seems to largely revolve around there being no clear, obvious tooling route to simply using the tech (complete empathy there, I’m currently building my own tooling around it), along with not having a clear mental model of how these models work. I had the same anxiety when ChatGPT first blew my mind, and I just decided to learn how they work. They’re dirt fucking simple; shitty non-learned computer vision systems are orders of magnitude more complex. You seem like a cool person, hmu my profile name @ gmail if you want to talk about it! I like 3b1b’s videos if you want a shorter version; karpathy is overhyped imo but also cool if you want to build a toy model line by line.
There is no such thing as ethical AI, because "voluntary" usually means voluntary without the participants really understanding what they are making - which is just another tool in the arms race of increasingly sophisticated AI models, largely needed just to "one-up" the other guy.
"Ethical" AI is like forced pit-fighting where we ask if we can find willing volunteers to fight for the death for a chance for their freedom. It's sickening.
Anti-inductive processes are those, like fraud, which change because you have measured them. Once a certain sort of fraud is made illegal and difficult, most fraudsters move on to a different kind.
AI models are, at base, memorisers, with a little structured generalisation around the region of memory. They are not structure learners who happen to memorise. This makes creating a theory of an AI model kind of impossible, because the "representations" they learn are so-called "entangled", another way of saying: garbage. The AI model's representation of language does not match language structure; rather, each representation is a mixture of what we can evidently see as several competing structural features. But the model has no access to this structure, because it's literally not a property of the data but of the data-generating process.
Now this seems like "a good theory" of AI in the monkish sense, but a problem arises: for each fragile boundary that benchmarking and testing shows, the AI companies collect data on these failures and retrain the model. This is, in a sense, anti-inductive fraud: companies seek to find out how they are found out, and extend the model to cover those regions.
But models never actually gain the structural capabilities their behaviour is taken to imply. So, of course, this would drive any theorist insane. By the time you've figured out GPT-3.5's fakery, all the chatbot data OpenAI has collected has made GPT-4 - and all the edges have been papered over. Until your fingers are cut next time on the now-dark fragile boundary of OpenAI's deception.
I can give very good reasons, independent of any particular AI model, for why the manifold hypothesis is wrong, why AI models are memorisers plus structure, why they don't find latent structure but combine statistical heuristics, why their representations are not "entangled" in a way that can be "disentangled" but are necessarily model-unknowable blends of expert-known scientific joints -- and so on.
But none of this knowledge can become useful in the face of a billion-dollar training company competing against me, anti-inductively, whenever I try to apply any of it to any given model.
Perhaps then, I suppose, we can look at the behaviour these systems induce in their users and its downstream effects. This is what the OP does here: throws up their hands at a theory and says only: however this is working, it cannot be good.
This is, of course, what we do with fraud, cults, and many other systems of deception. We say: we're not going to argue with the conspiracy theorist; they are specialists at coming up with ever more elaborate adaptations of the self-deception. Instead, we observe by proxy that they behave pathologically -- that they are broken in other ways.
I think that's a fair approach, and at least one that allows the author to proceed without a theory.
(Sorry for the low-effort response. But it's kinda true)