Andrej Karpathy: Software in the era of AI [video]

497 sandslash 158 6/19/2025, 12:33:21 AM youtube.com ↗

Comments (158)

mentalgear · 42m ago
Meanwhile, this morning I asked Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for re-orienting if the EXIF orientation is fine in the first place.
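
For the record, the check it kept missing is tiny; a rough sketch with Pillow (untested, from memory):

  from PIL import Image, ImageOps

  ORIENTATION = 0x0112  # standard EXIF orientation tag

  def normalize_orientation(path: str, out_path: str) -> None:
      with Image.open(path) as img:  # lazy open: pixel data not decoded yet
          if img.getexif().get(ORIENTATION, 1) == 1:
              return  # already upright, nothing to re-orient
          ImageOps.exif_transpose(img).save(out_path)  # decode and rotate only when needed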

Vibe vs reality, and anyone actually working in the space daily can attest to how brittle these systems are.

Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far too complex to simulate in its vastness.

ramon156 · 2m ago
The real question is how long it'll take until they're not brittle
diggan · 9m ago
> Meanwhile

What do you mean "meanwhile"? That's exactly the kind of stuff (among other things) he's talking about: the various frictions and how you need to approach them.

fergie · 1h ago
There were some cool ideas - I particularly liked "psychology of AI".

Overall though I really feel like he is selling the idea that we are going to have to pay large corporations to be able to write code. Which is... terrifying.

Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and it's not clear that it will make my life easier any time soon.

teekert · 40m ago
He says that we are now in the mainframe phase, and hopefully we will hit the personal computing phase soon. He says Llama (and DeepSeek?) are like Linux in a way, while OpenAI and Claude are like Windows and macOS.

So, no, he’s actually saying it may be everywhere for cheap soon.

I find the talk to be refreshingly intellectually honest and unbiased. Like the opposite of a cringey LinkedIn post on AI.

guappa · 1h ago
I think it used to be like that before the GNU people made gcc, completely destroying the market of compilers.

> Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and it's not clear that it will make my life easier any time soon.

Every time I have to write a simple, self-contained couple of functions I try… and it gets it completely wrong.

It's easier to just write it myself rather than to iterate 50 times and hope it will work, considering iterations are also very slow.

ykonstant · 11m ago
At least proprietary compilers were software you owned and could be airgapped from any network. You didn't create software by tediously negotiating with compilers running on remote machines controlled by a tech corp that can undercut you on whatever you are trying to build (but of course they will not, it says so in the Agreement, and other tales of the fantastic).
abdullin · 3h ago
Tight feedback loops are the key to working productively with software. I see that in codebases up to 700k lines of code (legacy 30yo 4GL ERP systems).

The best part is that AI-driven systems are fine with running even tighter loops than a sane human would tolerate.

E.g. running the full linting, testing and E2E/simulation suite after any minor change. Or generating 4 versions of a PR for the same task so that the human can just pick the best one.
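
A minimal sketch of such a gate (the tool names are just whatever the project happens to use):

  import subprocess, sys

  CHECKS = [
      ["ruff", "check", "."],          # lint
      ["pytest", "-q"],                # unit tests
      ["pytest", "-q", "tests/e2e"],   # E2E / simulation suite
  ]

  def gate() -> int:
      """Run every check after each agent edit; any failure goes back into the loop."""
      for cmd in CHECKS:
          if subprocess.run(cmd).returncode != 0:
              return 1
      return 0

  if __name__ == "__main__":
      sys.exit(gate())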

OvbiousError · 40m ago
I don't think the human is the problem here, but the time it takes to run the full testing suite.
diggan · 5m ago
It is kind of a human problem too. The full testing suite taking X hours to run is also not fun, but it makes the human problem larger.

Say you're Human A, working on a feature. Running the full testing suite takes 2 hours from start to finish. Every change you make to existing code needs to be confirmed by the full testing suite not to break existing stuff, so for some changes it takes 2 hours before you know for certain that nothing else breaks. How quickly do you lose interest, and at what point do you give up and either improve the testing suite, or just skip that feature/implement it some other way?

Now say you're Robot A working on the same task. The robot doesn't care if each change takes 2 hours to appear on their screen, the context is exactly the same, and they're still "a helpful assistant" 48 hours later when they still try to get the feature put together without breaking anything.

If you're feeling brave, you start Robot B and C at the same time.

Byamarro · 23m ago
I work in web dev, so people sometimes hook code formatting into a git commit hook, or sometimes even run it on file save. The tests are problematic though. If you work on a huge project it's a no-go idea at all. If you work on a medium one, the tests are long enough to block you, but short enough that you can't focus on anything else in the meantime.
gchamonlive · 8h ago
I think it's interesting to juxtapose traditional coding, neural network weights and prompts because in many areas -- like the example of the self driving module having code being replaced by neural networks tuned to the target dataset representing the domain -- this will be quite useful.

However I think it's important to make it clear that given the hardware constraints of many environments the applicability of what's being called software 2.0 and 3.0 will be severely limited.

So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, being used when convenient, but none a panacea.

karpathy · 7h ago
I kind of say it in words (agreeing with you), but I agree the versioning is a bit of a confusing analogy because it usually additionally implies some kind of improvement, when I’m just trying to distinguish them as very different software categories.
miki123211 · 6h ago
What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some products give you calories per serving and a serving size, calories per pound, etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.
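
A rough sketch of that pattern with Pydantic (field names invented; the actual structured-output call is whatever your provider exposes):

  from enum import Enum
  from pydantic import BaseModel

  class Unit(str, Enum):
      per_100g = "per_100g"
      per_serving = "per_serving"
      per_pound = "per_pound"

  class Calories(BaseModel):
      unit: Unit
      calories: float
      serving_size_g: float | None = None  # only needed for per_serving

  def to_per_100g(c: Calories) -> float:
      # do the arithmetic in Python instead of asking the LLM to do it
      if c.unit is Unit.per_100g:
          return c.calories
      if c.unit is Unit.per_serving:
          return c.calories * 100 / c.serving_size_g
      return c.calories * 100 / 453.6  # grams per pound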

abdullin · 3h ago
Even more than that. With structured outputs we essentially control the layout of the response, so we can force the LLM to go through different parts of the completion in a predefined order.

One way teams exploit that is to force the LLM to go through a predefined task-specific checklist before answering. This custom hard-coded chain of thought boosts accuracy and makes the reasoning more auditable.
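
A rough sketch of such a response schema (field names invented), where field order forces the checklist to be generated before the final answer:

  from pydantic import BaseModel

  class ReviewAnswer(BaseModel):
      # filled in first: the hard-coded chain of thought
      summary_of_change: str
      tests_updated: bool
      breaking_api_change: bool
      security_concerns: str
      # filled in last, after the checklist above
      verdict: str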

radicalbyte · 2h ago
Weights are code being replaced by data; something I've been making heavy use of since the early 00s. After coding for 10 years you start to see the benefits of it and understand where you should use it.

LLMs give us another tool only this time it's far more accessible and powerful.

practal · 5h ago
Great talk, thanks for putting it online so quickly. I liked the idea of making the generation / verification loop go brrr, and one way to do this is to make verification not just a human task, but a machine task, where possible.

Yes, I am talking about formal verification, of course!

That also goes nicely together with "keeping the AI on a tight leash". It seems to clash though with "English is the new programming language". So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

Yes, I am talking about abstraction logic [1], of course :-)

So the goal would be to have English (German, ...) as the ONLY programming language, invisibly backed underneath by abstraction logic.

[1] http://abstractionlogic.com

AdieuToLogic · 4h ago
> So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English?

The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Much like how a property in an API initially defined as being optional cannot be made mandatory without potentially breaking clients, whereas making a mandatory property optional can be backward compatible. IOW, the cardinality of "0 .. 1" is a strict superset of "1".
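
A toy illustration, with Pydantic models standing in for two versions of the contract:

  from pydantic import BaseModel

  class EventV1(BaseModel):
      name: str
      location: str | None = None   # optional: clients may omit it

  class EventV2(BaseModel):
      name: str
      location: str                 # now mandatory

  EventV1.model_validate({"name": "launch"})  # fine
  EventV2.model_validate({"name": "launch"})  # ValidationError: old payloads break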

practal · 2h ago
> The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Both directions are difficult and important. How do you determine when going from formal to informal that you got the right informal statement? If you can judge that, then you can also judge if a formal statement properly represents an informal one, or if there is a problem somewhere. If you detect a discrepancy, tell the user that their English is ambiguous and that they should be more specific.

singularity2001 · 3h ago
lean 4/5 will be a rising star!
practal · 2h ago
You would definitely think so, Lean is in a great position here!

I am betting though that type theory is not the right logic for this, and that Lean can be leapfrogged.

gylterud · 2h ago
I think type theory is exactly right for this! Being so similar to programming languages, it can piggyback on the huge amount of training the LLMs have on source code.

I am not sure Lean in particular is the right language; there might be challengers rising (or old incumbents like Agda or Rocq could find a boost). But type theory definitely has the most robust formal systems at the moment.

practal · 1h ago
> Being so similar to programming languages

I think it is more important to be close to English than to programming languages, because that is the critical part:

"As close to a programming language as necessary, as close to English as possible"

is the goal, in my opinion, without sacrificing constraints such as simplicity.

hgl · 5h ago
It’s fascinating to think about what true GUI for LLM could be like.

It immediately makes me think of an LLM that can generate a customized GUI for the topic at hand, which you can interact with in a non-linear way.

karpathy · 5h ago
Fun demo of an early idea was posted by Oriol just yesterday :)

https://x.com/OriolVinyalsML/status/1935005985070084197

superfrank · 2h ago
On one hand, I'm incredibly impressed by the technology behind that demo. On the other hand, I can't think of many things that would piss me off more than a non-deterministic operating system.

I like my tools to be predictable. Google search trying to predict that I want the image or shopping tag based on my query already drives me crazy. If my entire operating system did that, I'm pretty sure I'd throw my computer out a window.

hackernewds · 3h ago
it's impressive but it seems like a crappier UX? that none of the patterns can really be memorized
sensanaty · 2h ago
Ah yes, my operating system, most definitely a place I want to stick the Hallucinotron-3000 so that every click I make yields a completely different UI that has absolutely 0 bearing to reality. We're truly entering the "Software 3.0" days (can't wait for the imbeciles shoving AI everywhere to start overusing that dogshit, made-up marketing term incessantly)
danielbln · 2h ago
Maybe we can collect all of this salt and operate a Thorium reactor with it, this in turn can then power AI.
sensanaty · 2h ago
We'll need to boil a few more lakes before we get to that stage I'm afraid, who needs water when you can have your AI hallucinate some for you after all?
suddenlybananas · 3h ago
Having different documents come up every time you go into the documents directory seems hellishly terrible.
falcor84 · 1h ago
It's a brand of terribleness I've somewhat gotten used to, opening Google Drive every time, when it takes me to the "Suggested" tab. I can't recall a single time when it had the document I care about anywhere close to the top.

There's still nothing that beats the UX of Norton Commander.

aprilthird2021 · 3h ago
This is crazy cool, even if not necessarily the best use case for this idea
jonny_eh · 3h ago
An ever-shifting UI sounds unlearnable, and therefore unusable.
OtherShrezzing · 2h ago
A mixed ever-shifting UI can be excellent though. So you've got some tools which consistently interact with UI components, but the UI itself is altered frequently.

Take for example world-building video games like Cities Skylines / Sim City or procedural sandboxes like Minecraft. There are 20-30 consistent buttons (tools) in the game's UX, while the rest of the game is an unbounded ever-shifting UI.

dang · 3h ago
It wouldn't be unlearnable if it fits the way the user is already thinking.
guappa · 55m ago
AI is not mind reading.
stoisesky · 51m ago
This talk https://www.youtube.com/watch?v=MbWgRuM-7X8 explores the idea of generative / malleable personal user interfaces where LLMs can serve as the gateway to program how we want our UI to be rendered.
cjcenizal · 5h ago
My friend Eric Pelz started a company called Malleable to do this very thing: https://www.linkedin.com/posts/epelz_every-piece-of-software...
nbbaier · 5h ago
I love this concept and would love to know where to look for people working on this type of thing!
semi-extrinsic · 1h ago
Humans are shit at interacting with systems in a non-linear way. Just look at Jupyter notebooks and the absolute mess that arises when you execute code blocks in arbitrary order.
dpkirchner · 5h ago
Like a HyperCard application?
necrodome · 4h ago
We (https://vibes.diy/) are betting on this
wjohn · 7h ago
The comparison of our current methods of interacting with LLMs (back-and-forth text) to old-school terminals is pretty interesting. I think there's still a lot of work to be done to optimize how we interact with these models, especially for non-dev consumers.
informal007 · 45s ago
Audio may be the better option.
blobbers · 2h ago
Software 3.0 is the code generated by the machine, not the prompts that generated it. The prompts don't even yield the same output; there is randomness.

The new software world is the massive amount of code that will be burped out by these agents, and it should quickly dwarf the human output.

pelagicAustral · 2h ago
I think that if you give the same task to three different developers you'll get three different implementations. It's not a random result if you do get the functionality that was expected, and at that, I do think the prompt plays an important role in offering a view of how the result was achieved.
amai · 2h ago
The quite good blog post mentioned by Karpathy for working with LLMs when building software:

- https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

See also:

- https://news.ycombinator.com/item?id=44242051

No comments yet

nilirl · 5h ago
Where do these analogies break down?

1. Similar cost structure to electricity, but non-essential utility (currently)?

2. Like an operating system, but with non-determinism?

3. Like programming, but ...?

Where does the programming analogy break down?

rudedogg · 5h ago
> programming

The programming analogy is convenient but off. The joke has always been “the computer only does exactly what you tell it to do!” regarding logic bugs. Prompts and LLMs most certainly do not work like that.

I loved the parallels with modern LLMs and time sharing he presented though.

politelemon · 2h ago
only in English, and also non-deterministic.
malux85 · 2h ago
Yeah, wherever possible I try to have the LLM answer me in Python rather than English (especially when explaining new concepts).

English is soooooo ambiguous

falcor84 · 1h ago
For what it's worth, I've been using it to help me learn math, and I added to my rules an instruction that it should always give me an example in Python (preferably sympy) whenever possible.
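
For example, a one-liner like this (sympy) pins down something that an English sentence about the product rule would leave ambiguous:

  import sympy as sp

  x = sp.symbols("x")
  sp.diff(sp.sin(x) * sp.exp(x), x)  # exp(x)*sin(x) + exp(x)*cos(x)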
anythingworks · 8h ago
loved the analogies! Karpathy is consistently one of the clearest thinkers out there.

interesting that Waymo could do uninterrupted trips back in 2013, wonder what took them so long to expand? regulation? tail end of driving optimization issues?

noticed one of the slides had a cross over 'AGI 2027'... ai-2027.com :)

AlotOfReading · 8h ago
You don't "solve" autonomous driving as such. There's a long, slow grind of gradually improving things until failures become rare enough.
petesergeant · 7h ago
I wonder at what point all the self-driving code becomes replaceable with a multimodal generalist model with the prompt “drive safely”
anon7000 · 6h ago
Very advanced machine learning models are used in current self driving cars. It all depends what the model is trying to accomplish. I have a hard time seeing a generalist prompt-based generative model ever beating a model specifically designed to drive cars. The models are just designed for different, specific purposes
tshaddox · 6h ago
I could see it being the case that driving is a fairly general problem, and thus models intentionally designed to be general end up doing better than models designed with the misconception that you need a very particular set of driving-specific capabilities.
shakna · 3h ago
Driving is not a general problem, though. It's a contextual landscape of fast reactions and predictions. Both are required, and done regularly by the human element. The exact nature of every reaction, and every prediction, changes vastly within the context window.

You need image processing just as much as you need scenario management, and they're orthogonal to each other, as one example.

If you want a general transport system... We do have that. It's called rail. (And can and has been automated.)

melvinmelih · 3h ago
> Driving is not a general problem, though.

But what's driving a car? A generalist human brain that has been trained for ~30 hours to drive a car.

shakna · 1h ago
Human brains aren't generalists!

We have multiple parts of the brain that interact in vastly different ways! Your cerebellum won't be running the role of the pons.

Most parts of the brain cannot take over for others. Self-healing is the exception, not the rule. Yes, we have a degree of neuroplasticity, but there are many limits.

(Sidenote: Driver's license here is 240 hours.)

anythingworks · 5h ago
exactly! I think that was Tesla's vision with self-driving to begin with... they tried to frame it as a problem general enough that trying to solve it would also solve questions of more general intelligence ('AGI'), i.e. cars should use vision just like humans would

but in hindsight it looks like this slowed them down quite a bit despite being early to the space...

yokto · 2h ago
This is (in part) what "world models" are about. While some companies like Tesla bring together a fleet of small specialised models, others like CommaAI and Wayve train generalist models.
AlotOfReading · 7h ago
One of the issues with deploying models like that is the lack of clear, widely accepted ways to validate comprehensive safety and absence of unreasonable risk. If that can be solved, or regulators start accepting answers like "our software doesn't speed in over 95% of situations", then they'll become more common.
ActorNightly · 5h ago
> Karpathy is consistently one of the clearest thinkers out there.

Eh, he ran Tesla's self-driving division and put them in a direction that is never going to fully work.

What they should have done is a) trained a neural net to map a sequence of frames into a physical environment, and b) leveraged MuZero, so that the self-driving system basically builds out parallel simulations into the future and searches for the best course of action to take.

Because that's pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that an object on the road we are driving towards is going to result in a negative outcome when we collide with it.

AlotOfReading · 4h ago
Aren't continuous, stochastic, partial knowledge environments where you need long horizon planning with strict deadlines and limited compute exactly the sort of environments muzero variants struggle with? Because that's driving.

It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.

visarga · 5h ago
> We don't need to know what a cone is

The counter argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural net process. They needed to ensure safety, so testing was crucial. It is harder to test an end-to-end system than its individual parts.

suddenlybananas · 3h ago
That's absolutely not what makes humans great drivers?
tayo42 · 4h ago
Is that the approach that waymo uses?
mikewarot · 5h ago
A few days ago, I was introduced to the idea that when you're vibe coding, you're consulting a "genie": much like in the fables, you almost never get what you asked for, but if your wishes are small, you might just get what you want.

The Primeagen reviewed this article[1] a few days ago, and (I think) that's where I heard about it. (Can't re-watch it now, it's members only) 8(

[1] https://medium.com/@drewwww/the-gambler-and-the-genie-08491d...

fudged71 · 5h ago
“You are an expert 10x software developer. Make me a billion dollar app.” Yeah this checks out
anythingworks · 5h ago
that's a really good analogy! It feels like a wicked joke that LLMs behave in such a way that they're both intelligent and stupid at the same time
imiric · 31m ago
The slide at 13m claims that LLMs flip the script on technology diffusion and give power to the people. Nothing could be further from the truth.

Large corporations, which have become governments in all but name, are the only ones with the capability to create ML models of any real value. They're the only ones with access to vast amounts of information and resources to train the models. They introduce biases into the models, whether deliberately or not, that reinforce their own agenda. This means that the models will either avoid or promote certain topics. It doesn't take a genius to imagine what will happen when the advertising industry inevitably extends its reach into AI companies, if it hasn't already.

Even open weights models which technically users can self-host are opaque blobs of data that only large companies can create, and have the same biases. Even most truly open source models are useless since no individual has access to the same large datasets that corporations use for training.

So, no, LLMs are the same as any other technology, and actually make governments and corporations even more powerful than anything that came before. The users benefit tangentially, if at all, but will mostly be exploited as usual. Though it's unsurprising that someone deeply embedded in the AI industry would claim otherwise.

moffkalast · 21m ago
Well there are cases like OLMo where the process, dataset, and model are all open source. As expected though, it doesn't really compare well to the worst closed model since the dataset can't contain vast amounts of stolen copyrighted data that noticeably improves the model. Llama is not good because Meta knows what they're doing, it's good because it was pretrained on the entirety of Anna's Archive and every pirated ebook they could get their hands on. Same goes for Elevenlabs and pirated audiobooks.

Lack of compute on Ai2's side also means the context OLMo is trained for is minuscule, the other thing that you need to throw billions of dollars at to make a model that's maybe useful in the end if you're very lucky. Training needs high GPU interconnect bandwidth; it can't be done in a distributed horde in any meaningful way even if people wanted to.

The only ones who have the power now are the Chinese, since they can easily ignore copyright for datasets, patents for compute, and have infinite state funding.

dang · 7h ago
This was my favorite talk at AISUS because it was so full of concrete insights I hadn't heard before and (even better) practical points about what to build now, in the immediate future. (To mention just one example: the "autonomy slider".)

If it were up to me, which it very much is not, I would try to optimize the next AISUS for more of this. I felt like I was getting smarter as the talk went on.

sothatsit · 5h ago
I find Karpathy's focus on tightening the feedback loop between LLMs and humans interesting, because I've found I am the happiest when I extend the loop instead.

When I have tried to "pair program" with an LLM, I have found it incredibly tedious, and not that useful. The insights it gives me are not that great if I'm optimising for response speed, and it just frustrates me rather than letting me go faster. Worse, often my brain just turns off while waiting for the LLM to respond.

OTOH, when I work in a more async fashion, it feels freeing to just pass a problem to the AI. Then, I can stop thinking about it and work on something else. Later, I can come back to find the AI results, and I can proceed to adjust the prompt and re-generate, to slightly modify what the LLM produced, or sometimes to just accept its changes verbatim. I really like this process.

geeunits · 4h ago
I would venture that 'tightening the feedback loop' isn't necessarily 'increasing the number of back-and-forth prompts' - and what you're saying you want is ultimately his argument, i.e. if it's integral enough it can almost guess what you're going to say next...
sothatsit · 4h ago
I specifically do not want AI as an auto-correct, doing auto-predictions while I am typing. I find this interrupts my thinking process, and I've never been bottlenecked by typing speed anyway.

I want AI as a "co-worker" providing an alternative perspective or implementing my specific instructions, and potentially filling in gaps I didn't think about in my prompt.

jwblackwell · 4h ago
Yeah I am currently enjoying giving the LLM relatively small chunks of code to write and then asking it to write accompanying tests. While I focus on testing the product myself. I then don't even bother to read the code it's written most of the time
nottorp · 3h ago
In the era of AI and illiteracy...
nico · 8h ago
Thank you YC for posting this before the talk became deprecated[1]

1: https://x.com/karpathy/status/1935077692258558443

sandslash · 8h ago
We couldn't let that happen!
romain_batlle · 1h ago
Can't believe they wanted to postpone this video by a few weeks
pera · 1h ago
Is it possible to vibe code NFT smart contracts with Software 3.0?
belter · 3h ago
Painful to watch. The new tech generation deserves better than hyped presentations from tech evangelists.

This reminds me of the Three Amigos and Grady Booch evangelizing the future of software while ignoring the terrible output from Rational Software and the Unified Process.

At least we got acknowledgment that self-driving remains unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622

And Waymo still requires extensive human intervention. Given Tesla's robotaxi timeline, this should crash their stock valuation...but likely won't.

You can't discuss "vibe coding" without addressing security implications of the produced artifacts, or the fact that you're building on potentially stolen code, books, and copyrighted training data.

And what exactly is Software 3.0? It was mentioned early then lost in discussions about making content "easier for agents."

benob · 2h ago
You can generate 1.0 programs with 3.0 programs. But can you generate 2.0 programs the same way?
olmo23 · 2h ago
2.0 programs (model weights) are created by running 1.0 programs (training runs).

I don't think it's currently possible to ask a model to generate the weights for a model.

movedx01 · 44m ago
But you can generate synthetic data using a 3.0 program to train a smaller, faster, cheaper-to-run 2.0 program.
iLoveOncall · 49m ago
He sounds like Terrence Howard with his nonsense.
politelemon · 2h ago
The beginning was painful to watch as is the cheering in this comment section.

The 1.0, 2.0, and 3.0 simply aren't making sense. They imply a kind of succession and replacement, and demonstrate a lack of understanding of how programming works. It sounds as marketing-oriented as "Web 3.0", born inside an echo chamber. And yet halfway through, the need for determinism/validation is now being reinvented.

The analogies make use of cherry picked properties, which could apply to anything.

mentalgear · 44m ago
The whole AI scene is starting to feel a lot like the cryptocurrency bubble before it burst. Don’t get me wrong, there’s real value in the field, but the hype, the influencers, and the flashy “salon tricks” are starting to drown out meaningful ML research (like Apple's critical research that actually improves AI robustness). It’s frustrating to see solid work being sidelined or even mocked in favor of vibe-coding.

Meanwhile, this morning I asked Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for re-orienting if the EXIF orientation is fine in the first place.

Vibe vs reality, and anyone actually working in the space daily can attest to how brittle these systems are.

monsieurbanana · 1h ago
> "Because they all have slight pros and cons, and you may want to program some functionality in 1.0 or 2.0, or 3.0, or you're going to train in LLM, or you're going to just run from LLM"

He doesn't say they will fully replace each other (or have fully replaced each other, since his definition of 2.0 is quite old by now).

whiplash451 · 1h ago
I think Andrej is trying to elevate the conversation in an interesting way.

That in and of itself makes it worth it.

No one has a crystal clear view of what is happening, but at least he is bringing a novel and interesting perspective to the field.

amelius · 1h ago
The version numbers mean abrupt changes.

Analogy: how we "moved" from using Google to ChatGPT is an abrupt change, and we still use Google.

nodesocket · 7h ago
llms.txt makes a lot of sense, especially for LLMs to interact with HTTP APIs autonomously.

Seems like you could set an LLM loose and, like the Googlebot, have it start converting all HTML pages into llms.txt. Man, the future is crazy.
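
For the curious, per the proposal an llms.txt is basically a small markdown index at the site root, roughly like this (URLs invented):

  # ExampleProduct

  > One-paragraph summary of what the product does and who it is for.

  ## Docs

  - [Quickstart](https://example.com/docs/quickstart.md): install and make a first request
  - [API reference](https://example.com/docs/api.md): endpoints and authentication

  ## Optional

  - [Changelog](https://example.com/changelog.md)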

nothrabannosir · 6h ago
Couldn’t believe my eyes. The www is truly bankrupt. If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Website too confusing for humans? Add more design, modals, newsletter pop ups, cookie banners, ads, …

Website too confusing for LLMs? Add an accessible, clean, ad-free, concise, high entropy, plain text summary of your website. Make sure to hide it from the humans!

PS: it should be /.well-known/llms.txt but that feels futile at this point..

PPS: I enjoyed the talk, thanks.

andrethegiant · 6h ago
> If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Not a browser plugin, but you can prefix URLs with `pure.md/` to get the pure markdown of that page. It's not quite a 1:1 to llms.txt as it doesn't explain the entire domain, but works well for one-off pages. [disclaimer: I'm the maintainer]

jph00 · 4h ago
The next version of the llms.txt proposal will allow an llms.txt file to be added at any level of a path, which isn't compatible with /.well-known.

(I'm the creator of the llms.txt proposal.)

nothrabannosir · 4h ago
[flagged]
dang · 3h ago
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

nothrabannosir · 1h ago
Fair
alightsoul · 3h ago
The web started dying with mobile social media apps, in which hyperlinks are a poor UX choice. Then again with SEO banning outlinks. Now this. The web of interconnected pages that was the World Wide Web is dead. Not on social media? No one sees you. Run a website? more bots than humans. Unless you sell something on the side with the website it's not profitable. Hyperlinking to other websites is dead.

Gen Alpha doesn't know what a web page is, and if they do, it's for stuff like Neocities, i.e. as a curiosity or art form only, not as a source of information anymore. I don't blame them. Apps (social media apps) have less friction than web sites but a higher barrier for people to create. We are going back to pre-World Wide Web days in a way, kind of like Bulletin Board Systems on dial-up without hyperlinking, and centralized (social media). Some countries, mostly ones with few technical people like those in Central America, have moved away from the web almost entirely and onto social media like Instagram.

Due to the death of the web, Google search and friends now rely mostly on matching queries with titles, so just like before the internet you have to know people to learn new stuff, or wait for an algorithm to show it to you, or for someone to mention it online, or be forced to enroll in a university. Maybe that's why search results have declined and people search using ChatGPT or maybe Perplexity. Scholarly search engines are a bit better but frankly irrelevant for most people.

Now I understand why Google established their own DNS server at 8.8.8.8. If you have a directory of all domains on DNS, you can still index sites without hyperlinks between them, even if the web dies. They saw it coming.

practal · 5h ago
If you have different representations of the same thing (llms.txt / HTML), how do you know it is actually equivalent to each other? I am wondering if there are scenarios where webpage publishers would be interested in gaming this.
jph00 · 4h ago
That's not what llms.txt is. You can just use a regular markdown URL or similar for that.

llms.txt is a description for an LLM of how to find the information on your site needed for an LLM to use your product or service effectively.

andrethegiant · 5h ago
<link rel="alternate" /> is a standards-friendly way to semantically represent the same content in a different format
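
For instance, something along these lines (a sketch):

  <link rel="alternate" type="text/markdown" href="/docs/page.md" title="Markdown version of this page">
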
bedit · 5h ago
I love the "people spirits" analogy. For casual tasks like vibecoding or boiling an egg, LLM errors aren't a big deal. But for critical work, we need rigorous checks—just like we do with human reasoning. That's the core of empirical science: we expect fallibility, so we verify. A great example is how early migration theories based on pottery were revised with better data like ancient DNA (see David Reich). Letting LLMs judge each other without solid external checks misses the point—leaderboard-style human rankings are often just as flawed.
dmitrijbelikov · 4h ago
I think that Andrej presents “Software 3.0” as a revolution, but in essence it is a natural evolution of abstractions.

Abstractions don't eliminate the need to understand the underlying layers - they just hide them until something goes wrong.

Software 3.0 is a step forward in convenience. But it is not a replacement for developers with a foundation, but a tool for acceleration, amplification and scaling.

If you know what is under the hood — you are irreplaceable. If you do not know — you become dependent on a tool that you do not always understand.

alightsoul · 3h ago
why does vibe coding still involve any code at all? why can't an AI directly control the registers of a computer processor and graphics card, controlling a computer directly? why can't it draw on the screen directly, connected directly to the rows and columns of an LCD screen? what if an AI agent was implemented in hardware, with a processor for AI, a normal computer processor for logic, and a processor that correlates UI elements to touches on the screen? and a network card, some RAM for temporary stuff like UI elements and some persistent storage for vectors that represent UI elements and past conversations
flumpcakes · 3h ago
I'm not sure this makes sense as a question. Registers are 'controlled' by running code for a given state. An AI can write code that changes registers, as all code does in operation. An AI can't directly 'control registers' in any other way, just as you or I can't.
alightsoul · 3h ago
I would like to make an AI agent that directly interfaces with a processor by setting bits in a processor register, thus eliminating the need for even assembly code or any kind of code. The only software you would ever need would be the AI.
shakna · 3h ago
That's called a JIT compiler. And ignoring how bad an idea blending those two is... it wouldn't be that difficult a task.

The hardest part of a JIT is the safety aspect. And AI already violates most of that.

alightsoul · 3h ago
The safety part will probably be either solved, a non-issue, or ignored, similarly to how GPT-3 was often seen as dangerous before ChatGPT was released. Some people who have only ever vibe coded are finding jobs today, ignoring safety entirely and lacking any notion of it or what it means. They just copy-paste output from ChatGPT or an agentic IDE. To me it's JIT already, with extra steps. Or companies have pivoted their software engineers to vibe coding most of the time, so they don't even touch code anymore: JIT with extra steps again.
shakna · 1h ago
As "jit" to you means running code, and not "building and executing machine code", maybe you could vibe code this. And enjoy the segfaults.
guappa · 41m ago
In a way he's making sense. If the "code" is the prompt, the output of the LLM is an intermediate artifact, like the intermediate steps of gcc.

So why should we still need gcc?

The answer is, of course, that we need it because the LLM's output is shit 90% of the time and debugging assembly or binary directly is even harder, so putting aside the difficulties of training the model, the output would be unusable.

shakna · 19m ago
Probably too much snark from me. But the gulf between interpreter and compiler can be decades of work, often discovering new mathematical principles along the way.

The idea that you're fine to risk everything, in the way agentic things allow [0], and want that stuff messing around with raw memory, is... a return to DOS-era crashes, but with HAL along for the ride.

[0] https://msrc.microsoft.com/update-guide/vulnerability/CVE-20...

singularity2001 · 3h ago
what he means is why are the tokens not directly machine code tokens
birn559 · 3h ago
Because any precise description of what the computer is supposed to do is already code as we know it. AI can fill in the gaps between natural language and programming by guessing, and because you don't always care about the "how", only about the "what". The more you care about the "how", the more precise you have to become in your language to reduce the AI's guesswork, to the point that your input to the AI is already code.

The question is: how much do we really care about the "how", even when we think we care about it? Modern programming language don't do guessing work, but they already abstract away quite a lot of the "how".

I believe that's the original argument in favor of coding in assembler and that it will stay relevant.

Following this argument, what AI is really missing is determinism, to a large extent. I can't just save the input I have given to an AI and be sure that it will produce the exact same output a year from now.

alightsoul · 3h ago
With vibe coding, I am under the impression that the only thing that matters for vibe coders is whether the output is good enough in the moment to fulfill a desire. For companies going AI-first that's how it seems to be done. I see people in other places and those people have lost interest in the "how".
abhaynayar · 3h ago
Nice try, AI.
therein · 3h ago
All you need is a framebuffer and AI.
AIorNot · 8h ago
Love his analogies and clear-eyed picture.
pyman · 8h ago
"We're not building Iron Man robots. We're building Iron Man suits"
pryelluw · 7h ago
Funny thing is that in more than one of the Iron Man movies the suits end up being bad robots. Even the AI Iron Man made shows up to ruin the day in an Avengers movie. So it's a little on the nose that they'd try to pitch it this way.
wiseowise · 3h ago
That’s reading too much into it. It’s just an obvious plot twist to justify making another movie, nothing else.
reducesuffering · 8h ago
[flagged]
throwawayoldie · 7h ago
I'm old enough to remember when Twitter was new, and for a moment it felt like the old utopian promise of the Internet finally fulfilled: ordinary people would be able to talk, one-on-one and unmediated, with other ordinary people across the world, and in the process we'd find out that we're all more similar than different and mainly want the same things out of life, leading to a new era of peace and empathy.

It was a nice feeling while it lasted.

tock · 6h ago
I believe the opposite happened. People found out that there are huge groups of people whose views on morality differ wildly from their own, and that just encouraged more hate. I genuinely think old school Facebook, where people only interacted with their own private friend circles, was better.
prisenco · 3h ago
Broadcast networks like Twitter only make sense for influencers, celebrities and people building a brand. They're a net negative for literally anyone else.

| old school facebook where people only interacted with their own private friend circles is better.

100% agree but crazy that option doesn't exist anymore.

_kb · 7h ago
Believe it or not, humans did in fact have forms of written language and communication prior to twitter.
dang · 7h ago
Can you please make your substantive points without snark? We're trying for something a bit different here.

https://news.ycombinator.com/newsguidelines.html

throwawayoldie · 6h ago
You missed the point, but that's fine, it happens.
ast0708 · 4h ago
Should we not treat LLMs more as a UX feature to interact with a domain-specific model (highly contextual), rather than expecting LLMs to provide the intelligence needed for software to act as a partner to humans?
guappa · 37m ago
He's selling something.
paganel · 2h ago
I guess Karpathy won't ever become a multi-millionaire/billionaire, seeing as he's now at the stage of presenting TEDx-like thingies.

That also probably shows that he's out of the loop when it comes to the present work now done in "AI", because had he been there he wouldn't have had time for this kind of fluffy presentation.

fnord77 · 5h ago
Him claiming govts don't use AI or are behind the curve is not accurate.

Modern military drones are very much AI agents

No comments yet

moralestapia · 5h ago
[flagged]
dang · 3h ago
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"Don't be snarky."

https://news.ycombinator.com/newsguidelines.html

moralestapia · 3h ago
Wait ... but this is true.

Maybe I missed a source but I assumed it was somehow common knowledge.

https://en.m.wikipedia.org/wiki/List_of_Tesla_Autopilot_cras...

William_BB · 1h ago
[flagged]
AdieuToLogic · 7h ago
It's an interesting presentation, no doubt. The analogies eventually fail as analogies usually do.

A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

Also, the OS analogy doesn't make sense to me. Perhaps this is because I do not subscribe to LLM's having reasoning capabilities nor able to reliably provide services an OS-like system can be shown to provide.

A minor critique regarding the analogy equating LLM's to mainframes:

  Mainframes in the 1960's never "ran in the cloud" as it did
  not exist.  They still do not "run in the cloud" unless one
  includes simulators.

  Terminals in the 1960's - 1980's did not use networks.  They
  used dedicated serial cables or dial-up modems to connect
  either directly or through stat-mux concentrators.

  "Compute" was not "batched over users."  Mainframes either
  had jobs submitted and ran via operators (indirect execution)
  or supported multi-user time slicing (such as found in Unix).
distalx · 6h ago
Hang in there! Your comment makes some really good points about the limits of analogies and the real control corporations have over LLMs.

Plus, your historical corrections were spot on. Sometimes, good criticisms just get lost in the noise online. Don't let it get to you!

furyofantares · 7h ago
> The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

I don't think that's what he said, he was identifying the first customers and uses.

AdieuToLogic · 6h ago
>> A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

> I don't think that's what he said, he was identifying the first customers and uses.

The portion of the presentation I am referencing starts at or near 12:50[0]. Here is what was said:

  I wrote about this one particular property that strikes me
  as very different this time around.  It's that LLM's like
  flip they flip the direction of technology diffusion that
  is usually present in technology.

  So for example with electricity, cryptography, computing,
  flight, internet, GPS, lots of new transformative that have
  not been around.

  Typically it is the government and corporations that are
  the first users because it's new expensive etc. and it only
  later diffuses to consumer.  But I feel like LLM's are kind
  of like flipped around.

  So maybe with early computers it was all about ballistics
  and military use, but with LLM's it's all about how do you
  boil an egg or something like that.  This is certainly like
  a lot of my use.  And so it's really fascinating to me that
  we have a new magical computer it's like helping me boil an
  egg.

  It's not helping the government do something really crazy
  like some military ballistics or some special technology.
Note the identification of historic government interest in computing along with a flippant "regular person" scenario in the context of "technology diffusion."

You are right in that the presenter identified "first customers", but this is mentioned in passing when viewed in context. Perhaps I should not have characterized this as "a recurring theme." Instead, a better categorization might be:

  The presenter minimized the control corporations have by
  keeping focus on governmental topics and trivial customer
  use-cases.
0 - https://youtu.be/LCEmiRjPEtQ?t=770
jppope · 8h ago
Well that showed up significantly faster than they said it would.
dang · 7h ago
The team adapted quickly, which is a good sign. I believe getting the videos out sooner (as in why-not-immediately) is going to be a priority in the future.
seneca · 7h ago
Classic under promise and over deliver.

I'm glad they got it out quickly.

dang · 7h ago
Me too. It was my favorite talk of the ones I saw.
sneak · 6h ago
Can we please stop standardizing on putting things in the root?

/.well-known/ exists for this purpose.

example.com/.well-known/llms.txt

https://en.m.wikipedia.org/wiki/Well-known_URI

jph00 · 4h ago
You can't just put things there any time you want - the RFC requires that they go through a registration process.

Having said that, this won't work for llms.txt, since in the next version of the proposal they'll be allowed at any level of the path, not only the root.

politelemon · 2h ago
> You can't just put things there any time you want - the RFC requires that they go through a registration process.

Actually, I can for two reasons. First is of course the RFC mentions that items can be registered after the fact, if it's found that a particular well-known suffix is being widely used. But the second is a bit more chaotic - website owners are under no obligation to consult a registry, much like port registrations; in many cases they won't even know it exists and may think of it as a place that should reflect their mental model.

It can make things awkward and difficult though, that is true, but that comes with the free text nature of the well-known space. That's made evident in the Github issue linked, a large group of very smart people didn't know that there was a registry for it.

https://github.com/AnswerDotAI/llms-txt/issues/2#issuecommen...

jph00 · 26m ago
There was no "large group of very smart people" behind llms.txt. It was just me. And I'm very familiar with the registry, and it doesn't work for this particular case IMO (although other folks are welcome to register it if they feel otherwise, of course).
sneak · 1h ago
I put stuff in /.well-known/ all the time whenever I want. They’re my servers.
dncornholio · 2h ago
> You can't just put things there any time you want - the RFC requires that they go through a registration process.

Excuse me???

jph00 · 25m ago
From the RFC:

""" A well-known URI is a URI [RFC3986] whose path component begins with the characters "/.well-known/", and whose scheme is "HTTP", "HTTPS", or another scheme that has explicitly been specified to use well- known URIs.

Applications that wish to mint new well-known URIs MUST register them, following the procedures in Section 5.1. """

andrethegiant · 6h ago