AI is a floor raiser, not a ceiling raiser

112 points by jjfoooo4 | 70 comments | 7/31/2025, 5:01:37 PM | elroy.bot

Comments (70)

stillpointlab · 48m ago
This mirrors insights from Andrew Ng's recent AI startup talk [1].

I recall he mentions in this video that the new advice they are giving to founders is to throw away prototypes when they pivot instead of building onto a core foundation. This is because of the effects described in the article.

He also gives some provisional numbers (see the "Rapid Prototyping and Engineering" section, around the 10:30 mark) where he suggests prototype development sees a 10x boost, compared to a 30-50% improvement for existing production codebases.

This feels vaguely analogous to the switch from "pets" to "livestock" when the industry switched from VMs to containers. Except, the new view is that your codebase is more like livestock and less like a pet. If true (and no doubt this will be a contentious topic to programmers who are excellent "pet" owners) then there may be some advantage in this new coding agent world to getting in on the ground floor and adopting practices that make LLMs productive.

1. https://www.youtube.com/watch?v=RNJCfif1dPY

falcor84 · 33m ago
Great point, but just mentioning (nitpicking?) that I've never heard machines/containers referred to as "livestock"; in my milieu it's always "pets" vs. "cattle". I now wonder if it's a geographical thing.
bayindirh · 22m ago
Yeah, the CERN talk* [0] coined the Pets vs. Cattle analogy, and it was way before VMs were cheap on bare metal. I think the word just evolved as the idea took root in the community.

We've used the same analogy for the last 20 years or so. Provisioning 150 cattle servers takes 15 minutes or so, while provisioning a pet takes a couple of hours, at most.

[0]: https://www.engineyard.com/blog/pets-vs-cattle/

*: The Engine Yard post notes that Microsoft's Bill Baker used the term earlier, though CERN's date (2012) checks out with our own timeline and how we got started.

HPsquared · 30m ago
Boxen? (Oxen)
bayindirh · 21m ago
AFAIK, Boxen is a permutation of Boxes, not Oxen.
mananaysiempre · 11m ago
There seems to be a pattern of humorous plurals in English where by analogy with ox ~ oxen you get -x ~ -xen: boxen, Unixen, VAXen.

Before you call this pattern silly, consider that the fairly normal plural “Unices” is by analogy with Latin plurals in -x = -c|s ~ -c|ēs, where I’ve expanded -x into -cs to make it clear that the Latin singular comprises a noun stem ending in -c- and a (nominative) singular ending -s, which is otherwise completely nonexistent in English. (This is extra funny for Unix < Unics < Multics.)

bayindirh · 9m ago
Yeah. After reading your comment, I thought "maybe the Xen hypervisor is named after this phenomenon". "xen" just means "many" in that context.

Also, probably because I'm approaching graybeard territory, thinking about boxen of VAXen running UNIXen makes me feel warm and fuzzy. :D

lubujackson · 6m ago
Ooh, the "pets vs. livestock" analogy really works better than the "craftsman vs. slop-slinger" arguments.

Because using an LLM doesn't mean you devalue well-crafted or understandable results. But it does indicate a significant shift in how you view the code itself. It is more about the emotional attachment to code vs. code as a means to an end.

verelo · 3m ago
Oh man, I love this take. It's how I've been selling what I do when I speak with a specific segment of my audience: "My goal isn't to make the best realtors better, it's to make the worst realtors acceptable".

And my client is often the brokerage; they just want their agents to produce commissions so they can take a cut. They know their top producers probably won't get much from what I offer, but we all see that their worst performers could easily double their business.

gruez · 21m ago
The blog post has a bunch of charts, which gives it a veneer of objectivity and rigor, but in reality it's all vibes and conjecture. Meanwhile, recent empirical studies actually point in the opposite direction, showing that AI use increases inequality rather than decreasing it.

https://www.economist.com/content-assets/images/20250215_FNC...

https://www.economist.com/finance-and-economics/2025/02/13/h...

Calavar · 18m ago
The graphic has four studies that show increased inequality and six that show reduced inequality.
gruez · 6m ago
Read my comment again. The keyword here is "recent". The second link also expands on why that matters. It's best to read the whole article, but here's a paragraph that captures the argument:

>The shift in recent economic research supports his observation. Although early studies suggested that lower performers could benefit simply by copying AI outputs, newer studies look at more complex tasks, such as scientific research, running a business and investing money. In these contexts, high performers benefit far more than their lower-performing peers. In some cases, less productive workers see no improvement, or even lose ground.

bgwalter · 16m ago
Thanks for the links. That should be obvious to anyone who believes that $70 billion datacenters (Meta) are needed and the investment will be amortized by subscriptions (in the case of Meta also by enhanced user surveillance).

The means of production are held by a small oligopoly; the rest will be redundant or exploitable sharecroppers.

(All this under the assumption that "AI" works, which its proponents affirm in public at least.)

LeftHandPath · 54m ago
There are some things that you still can't do with LLMs. For example, if you tried to learn chess by having the LLM play against you, you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18) before it starts making illegal choices. It also generally accepts invalid moves from your side, so you'll never be corrected if you're wrong about how to use a certain piece.

Because it can't actually model these complex problems, it really requires awareness from the user regarding what questions should and shouldn't be asked. An LLM can probably tell you how a knight moves, or how to respond to the London System. It probably can't play a full game of chess with you, and will virtually never be able to advise you on the best move given the state of the board. It probably can give you information about big companies that are well-covered in its training data. It probably can't give you good information about most sub-$1b public companies. But, if you ask, it will give a confident answer.
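
As an illustration of the gap, here's a minimal sketch of the kind of external move validation the model itself doesn't do, using the python-chess library (the "Nf6" below just stands in for a move an LLM might propose):

    import chess

    def apply_llm_move(board: chess.Board, move_san: str) -> bool:
        """Try to apply a move proposed in SAN notation; reject anything illegal."""
        try:
            board.push_san(move_san)   # raises ValueError if the move is not legal here
            return True
        except ValueError:
            return False

    board = chess.Board()
    # "Nf6" looks like a plausible reply for Black, but is illegal as White's first move.
    if not apply_llm_move(board, "Nf6"):
        print("Illegal move proposed: Nf6")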

They're a minefield for most people and use cases, because people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice. It's like walking on a glacier and hoping your next step doesn't plunge through the snow and into a deep, hidden crevasse.

smiley1437 · 40m ago
> people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice.

I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI/LLMs make no mistakes.

They were shocked that it's possible for hallucinations to occur. I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?

viccis · 12m ago
> I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?

I think it's just that LLMs model generative probability distributions over sequences of tokens so well that what they are nearly infallible at is producing convincing results. Often the correct result is also the most convincing, but other times what seems most convincing to an LLM just happens to be most convincing to a human, regardless of correctness.
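
A toy sketch of that framing (the candidate answers and scores here are invented for illustration): sampling picks whatever the distribution scores highest, with no notion of whether the high-probability continuation is true.

    import math, random

    def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
        """Sample the next token from a softmax over scores; 'convincing' here just
        means high probability, not correct."""
        scaled = [(t, s / temperature) for t, s in logits.items()]
        z = sum(math.exp(s) for _, s in scaled)
        r, acc = random.random(), 0.0
        for token, s in scaled:
            acc += math.exp(s) / z
            if r <= acc:
                return token
        return scaled[-1][0]

    # A fluent-but-wrong answer can simply out-score the correct one.
    print(sample_next_token({"1912": 1.2, "1915": 2.3, "I don't know": -1.0}))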

throwawayoldie · 38s ago
https://en.wikipedia.org/wiki/ELIZA_effect

> In computer science, the ELIZA effect is a tendency to project human traits — such as experience, semantic comprehension or empathy — onto rudimentary computer programs having a textual interface. ELIZA was a symbolic AI chatbot developed in 1966 by Joseph Weizenbaum and imitating a psychotherapist. Many early users were convinced of ELIZA's intelligence and understanding, despite its basic text-processing approach and the explanations of its limitations.

bayindirh · 18m ago
Computers are always touted as deterministic machines. You can't argue with a compiler, or Excel's formula editor.

AI, in all its glory, is seen as an extension of that. A deterministic thing which is meticulously crafted to provide an undisputed truth, and it can't make mistakes because computers are deterministic machines.

The idea of LLMs as networks of weights plus some randomness is both too vague and too complicated an abstraction for most people. Also, companies tend to say this part very quietly, so when people read the fine print, they're shocked.

throwawayoldie · 8m ago
My experience, speaking over a scale of decades, is that most people, even very smart and well-educated ones, don't know a damn thing about how computers work and aren't interested in learning. What we're seeing now is just one unfortunate consequence of that.

(To be fair, in many cases, I'm not terribly interested in learning the details of their field.)

jasonjayr · 11m ago
I worry that the way the models "speak" to users will cause users to drop their 'filters' about what to trust and not trust.

We've barely established modern media literacy, and now we have machines that talk like 'trusted' face-to-face humans and can be "tuned" to suggest specific products or take on whatever tone the owner/operator of the system wants.

rplnt · 17m ago
Have they never used it? The majority of the responses that I can verify are wrong: sometimes outright nonsense, sometimes believable. Be it general knowledge or something where deeper expertise is required.
physicsguy · 25m ago
It's super obvious if you use something like agent mode for coding: it starts off well but drifts more and more. I've even had it try to do totally irrelevant things, like re-indenting code, with various Claude models.
falcor84 · 36m ago
I agree with most of TFA but not this:

> This means cheaters will plateau at whatever level the AI can provide

From my experience, the skill of using AI effectively is treating the AI with a "growth mindset" rather than a "fixed" one. I roleplay as the AI's manager: I give it a task, and as long as I know enough to tell whether its output is "good enough", I can lend it some of my metacognition via prompting to get it to keep working through obstacles until I'm happy with the result.

There are diminishing returns, of course, but I've found that I can get significantly better output than what it gave me initially without having to learn the "how" of the skill myself (i.e. I'm still "cheating"), focusing my learning only on the boundary of what is hard about the task. By doing this, I feel that over time I become a better manager in that domain, without having to spend the effort to become a practitioner myself.
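
A rough sketch of that manager loop, with a hypothetical llm() callable standing in for whatever model is in use, and the "good enough" judgment still supplied by the human:

    def manage_task(task: str, llm, acceptable, max_rounds: int = 5) -> str:
        """Iterate on an LLM's output like a manager: review, give feedback, repeat."""
        draft = llm(f"Task: {task}\nProduce a first attempt.")
        for _ in range(max_rounds):
            if acceptable(draft):   # the human's call: is this good enough yet?
                return draft
            feedback = input(f"Draft:\n{draft}\nWhat should change?\n> ")
            draft = llm(f"Task: {task}\nPrevious attempt:\n{draft}\nRevise it: {feedback}")
        return draft  # best effort after max_rounds of review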

amelius · 1h ago
AI is an interpolator, not an extrapolator.
canadaduane · 21m ago
Very concise, thank you for sharing this insight.
throe23486 · 1h ago
I read this as interloper. What's an extraloper?
shagie · 15m ago
An interloper being someone who intrudes or meddles in a situation (inter, "between or amid", + loper, "to leap or run" - https://en.wiktionary.org/wiki/loper), an extraloper would be someone who dances or leaps around the outside of a subject or meeting, with similar annoyances.
exasperaited · 22m ago
Opposite of "inter-" is "intra-".

Intraloper, weirdly enough, is a word in use.

andrenotgiant · 1h ago
This tracks for other areas of AI I am more familiar with.

Below-average people can use AI to get average results.

pcrh · 27m ago
This is in line with another quip about AI: You need to know more than the LLM in order to gain any benefit from it.
itsoktocry · 1h ago
That explains why people here are against it, because everyone is above average I guess.
falcor84 · 32m ago
I'm not against it. I wonder where in the distribution it puts me.
manmal · 1h ago
Since agents are good only at greenfield projects, the logical conclusion is that existing codebases have to be prepared such that new features are (opinionated) greenfield projects - let all the wiring dangle out of the wall so the intern just has to plug in the appliance. All the rest has to be done by humans, or the intern will rip open the wall to hang a picture.
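
One way to read "let the wiring dangle out of the wall", sketched with invented names: the humans own a small, stable extension point, and each new feature is a fresh module written against it, never against the core.

    from abc import ABC, abstractmethod

    class Feature(ABC):
        """The 'socket' humans maintain: a narrow, documented extension point."""

        name: str

        @abstractmethod
        def handle(self, request: dict) -> dict:
            """A new feature implements only this; it never touches the core wiring."""

    class ExportToCsv(Feature):
        """An agent-written feature: effectively a greenfield module end to end."""
        name = "export_csv"

        def handle(self, request: dict) -> dict:
            rows = request.get("rows", [])
            header = ",".join(rows[0]) if rows else ""
            body = "\n".join(",".join(str(v) for v in row.values()) for row in rows)
            return {"content_type": "text/csv", "body": f"{header}\n{body}"}

    # The core only knows how to plug features in, not what they do inside.
    REGISTRY = {f.name: f for f in (ExportToCsv(),)}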
PaulHoule · 59m ago
Hogwash. If you can't figure out how to do something with project Y from npm, try checking it out from GitHub with WebStorm and asking Junie how to do it -- often you get a good answer right away. If not, you can ask questions that help you understand the codebase. Don't understand some data structure that's a maze of Map<String, Object>s? It will scan how it is used and give you draft documentation.

Sure you can't point it to a Jira ticket and get a PR, but you certainly can use it as a pair programmer. I wouldn't say it's much faster than working alone, but I end up writing more tests, and arguing with it over error handling means I do a better job in the end.

falcor84 · 26m ago
> Sure you can't point it to a Jira ticket and get a PR

You absolutely can. This is exactly what SWE-Bench [0] measures, and I've been amazed at how quickly AIs have been climbing those ladders. I've personally been using Warp [1] a lot recently, and in quite a lot of low-to-medium-difficulty cases it can one-shot a decent PR. For most of my work I still find that I need to pair with it to get sufficiently good results (which is why I still prefer it to something cloud-based like Codex [2], though Codex is quite good too), and I expect the situation to flip over the coming couple of years.

[0] https://www.swebench.com/

[1] https://www.warp.dev/

[2] https://openai.com/index/introducing-codex/

esafak · 16m ago
How does Warp compare to others you have tried?
falcor84 · 1m ago
I've not used it for long enough yet for this to be a strong opinion, but so far I'd say that it is indeed a bit better than Claude Code, as per the results on Terminal Bench[0]. And on a side note, I quite like the fact that I can type shell commands and chat commands interchangeably into the same input and it just knows whether to run it or respond to it (accidentally forgetting the leading exclamation mark has been a recurring mistake for me in Claude Code).

[0] https://www.tbench.ai/

spion · 30m ago
I think agents have a curve: they're kinda bad at bootstrapping a project, very good in a small-to-medium-sized existing project, and then it goes slowly downhill from there as size increases.

Something about a brand-new project often makes LLMs drop to "example grade" code, the kind you'd never put in production. (An example: Claude implemented per-task file logging in my prototype project by pushing to an array of log lines, serializing the entire thing to JSON, and rewriting the entire file, for every logged event.)
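
Roughly the difference, as a sketch with invented file names: the rewrite-the-whole-file pattern described above versus a plain append-only log.

    import json

    log_lines: list[str] = []   # the "example grade" version keeps everything in memory

    def log_event_naive(message: str) -> None:
        """Re-serialize and rewrite the entire log file on every single event."""
        log_lines.append(message)
        with open("task_log.json", "w") as f:
            json.dump(log_lines, f)

    def log_event_append(message: str) -> None:
        """The boring production version: append one JSON line per event."""
        with open("task_log.jsonl", "a") as f:
            f.write(json.dumps({"message": message}) + "\n")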

yoz-y · 58m ago
They’re not. They’re good at many things and bad at many things. The more I use them the more I’m confused about which is which.
bfigares · 1h ago
AI raises everything - the ceiling is just being more productive. Productivity comes from the adequacy and potency of tools. We have a hell of a strong tool in our hands; therefore, the more adequate the usage, the higher the leverage.
infecto · 57m ago
Surprised to see this downvoted. It feels true to me. Sure, there are definitely novel areas where folks might not benefit, but I can see a future where this tool becomes helpful for the vast majority of roles.
fellowniusmonk · 32m ago
The greatest use of LLMs is the ability to get accurate answers to queries in a normalized format without having to wade through UI distractions like ads and social media.

It's the opposite of finding an answer on reddit, insta, tvtropes.

I can't wait for the first distraction-free OS that is a thinking and imagination helper and not a consumption device where I have to block URLs on my router so my kids don't get sucked into a Skinner box.

I love being able to get answers from documentation and to work questions without having to wade through some arbitrary UI BS a designer has implemented in ad-hoc fashion.

leptons · 23m ago
I don't find the "AI" answers all that accurate, and in some cases they're bordering on a liability, even if way down below all the "AI" slop it says "AI responses may include mistakes".

>It's the opposite of finding an answer on reddit, insta, tvtropes.

Yeah, it really is, because on reddit or other forums I can tell when someone doesn't know the topic well, but usually someone does and the answer is there. Unfortunately the "AI" was trained on all of this, and the "AI" is just as likely to spit out the wrong answer as the correct one. That is not an improvement on anything.

> wade through UI distraction like ads and social media

Oh, so you think "AI" is going to be free and clear forever? Enjoy it while it lasts, because these "AI" companies are in way over their heads, they are bleeding money like their aorta is a fire hose, and there will be plenty of ads and social whatever coming to brighten your day soon enough. The free ride won't go on forever - think of it as a "loss leader" to get you hooked.

margalabargala · 5m ago
I agree with the whole first half, but I disagree that LLM usage is doomed to ad-filled shittiness. AI companies may be hemorrhaging money, but that's because their product costs so much to run; it's not like they don't have revenue. The thing that will bring profitability isn't ads, it will be innovations that let current-gen-quality LLMs run at a fraction of the current power cost.

Will some LLMs have ads? Sure, especially at a free tier. But I bet the option to pay $20/month for ad-free LLM usage will always be there.

TimPC · 1h ago
People should be worried, because right now AI is on an exponential growth trajectory and no one knows when it will level off into an S-curve. AI is starting to get close to good enough. If it becomes twice as good in seven months, then what?
roadside_picnic · 9m ago
People don't consider that there are real physical/thermodynamic constraints on intelligence. It's easy to imagine some Skynet scenario, but all evidence suggests that it takes significant increases in energy consumption to increase intelligence.

Even in nature this is clear. Humans are a great example: cooked food predates Homo sapiens, and it is largely considered a prerequisite for human-level intelligence because of the enormous energy demands of our brains. And nature has given us wildly more efficient brains in almost every possible way: the human brain runs on about 20 watts of power, while my RTX draws 450 watts at full capacity.

The idea of "runaway" superintelligence has baked in some very extreme assumptions about the nature of thermodynamics and intelligence that are largely just hand-waved away.

On top of that, AI hasn't changed in a notable way for me personally in a year. The difference between 2022 and 2023 was wild, 2023 to 2024 changed some of my workflows, and 2024 to today is largely just more options for which tooling I use and how the tools can be combined, but nothing feels fundamentally improved for me.

mattnewport · 57m ago
What's the basis for your claim that it is on an exponential growth trajectory? That's not the way it feels to me as a fairly heavy user, it feels more like an asymptotic approach to expert human level performance where each new model gets a bit closer but is not yet reaching it, at least in areas where I am expert enough to judge. Improvements since the original ChatGPT don't feel exponential to me.
LeftHandPath · 35m ago
I was worried about that a couple of years ago, when there was a lot of hope that deeper reasoning skills and hallucination avoidance would simply arrive as emergent properties of a large enough model.

More recently, it seems like that's not the case. Larger models sometimes even hallucinate more [0]. I think the entire sector is suffering from a Dunning-Kruger effect -- making an LLM is difficult, and they managed to get something incredible working in a much shorter timeframe than anyone really expected back in the early 2010s. But that led to overconfidence and hype, and I think there will be a much longer tail of future improvements than the industry would like to admit.

Even the more advanced reasoning models will struggle to play a valid game of chess, much less win one, despite having plenty of chess games in their training data [1]. I think that, combined with the trouble of hallucinations, hints at where the limitations of the technology really are.

Hopefully LLMs will scare society into planning how to handle mass automation of thinking and logic, before a more powerful technology that can really do it arrives.

[0]: https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-m...

[1]: https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...

esafak · 11m ago
Really? I find newer models hallucinate less, and I think they have room for improvement with better training.
nwienert · 55m ago
Let's look:

GPT-1 June 2018

GPT-2 February 2019

GPT-3 June 2020

GPT-4 March 2023

Claude tells me this is the rough improvement of each:

GPT-1 to 2: 5-10x

GPT-2 to 3: 10-20x

GPT 3 to 4: 2-4x

Now it's been 2.5 years since 4.

Are you expecting 5 to be 2-4x better, or 10-20x better?

esafak · 9m ago
How are you measuring this improvement factor? We have numerous benchmarks for LLMs, and they are all saturating. By that measure we are rapidly approaching AGI, and headed towards ASI. They still won't be "human", but they will be able to do everything humans can, and more.
bluefirebrand · 38m ago
AI is not a floor raiser

It is a false confidence generator

recroad · 1h ago
AI isn't a pit. AI is a ladder.
layer8 · 11m ago
A ladder that doesn't reach the ceiling and sometimes ends up in imaginary universes.
anthk · 35m ago
Yeah, like in Nethack, while being blind and stepping on a cockatrice.
bitwize · 1h ago
AI is a shovel capable of breaking through the bottom of the barrel.
erlend_sh · 1h ago
Only for the people already affluent enough to afford the ever-more expensive subscriptions. Those most in need of a floor-raising don’t have the disposable income to take a bet on AI.
pdntspa · 7m ago
It's very easy to sign up for an API account and pay per call, or even pay nothing. The free offerings out there are great (Gemini, OpenRouter...), and a few are even suitable for agentic development.
intended · 1h ago
Either you are the item being sold or you are paying for the service.

Nothing is free, and I for one prefer a subscription model, if only as a change from the ad model.

I am sure we will see the worst of all worlds, but for now, for this moment in history, subscription is better than ads.

Let’s also never have ads in GenAI tools. The kind of invasive, intent-level influence these things can achieve will make our current situation look like paradise.

billyp-rva · 1h ago
Mixing this with a metaphor from earlier: giving a child a credit card is also a floor raiser.
tayo42 · 53m ago
I was thinking about this sentiment on my long car drive today.

It feels like when you need to paint the walls in your house. If you've never done it before, you'll probably reach for tape to make sure you don't ruin the ceiling and floors; the tape is a tool that lets amateur wall painters get decent results somewhat efficiently, compared to if they didn't use it. If you're an actually good wall painter, tape only slows you down. You'll go faster without the "help".

42lux · 1h ago
AI is chairs.
msgodel · 1h ago
You'll find many people lack the willpower and confidence to even get on the floor though. If it weren't for that they'd already know a programming language and be selling something.
lupire · 1h ago
OP doesn't understand that almost everything is neither at the floor nor the ceiling.
manyaoman · 1h ago
AI is a wall raiser.
givemeethekeys · 1h ago
AI is a floor destroyer not a ceiling destroyer. Hang on for dear life!! :P