Andrej Karpathy: Software in the era of AI [video]

266 sandslash 66 6/19/2025, 12:33:21 AM youtube.com ↗

Comments (66)

sothatsit · 50m ago

I find Karpathy's focus on tightening the feedback loop between LLMs and humans interesting, because I've found I am the happiest when I extend the loop instead.

When I have tried to "pair program" with an LLM, I have found it incredibly tedious, and not that useful. The insights it gives me are not that great if I'm optimising for response speed, and it just frustrates me rather than letting me go faster. Worse, often my brain just turns off while waiting for the LLM to respond.

OTOH, when I work in a more async fashion, it feels freeing to just pass a problem to the AI. Then, I can stop thinking about it and work on something else. Later, I can come back to find the AI results, and I can proceed to adjust the prompt and re-generate, to slightly modify what the LLM produced, or sometimes to just accept its changes verbatim. I really like this process.

geeunits · 19m ago

I would venture that 'tightening the feedback loop' isn't necessarily 'increasing the number of back and forth prompts', and what you're saying you want is ultimately his argument, i.e. if it's integrated enough, it can almost guess what you're going to say next...

jwblackwell · 17m ago

Yeah, I am currently enjoying giving the LLM relatively small chunks of code to write and then asking it to write accompanying tests, while I focus on testing the product myself. Most of the time I then don't even bother to read the code it has written.

gchamonlive · 4h ago

I think it's interesting to juxtapose traditional coding, neural network weights and prompts because in many areas -- like the example of the self-driving module, where code was replaced by neural networks tuned to a target dataset representing the domain -- this will be quite useful.

However I think it's important to make it clear that given the hardware constraints of many environments the applicability of what's being called software 2.0 and 3.0 will be severely limited.

So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, being used when convenient, but none a panacea.

karpathy · 2h ago

I kind of say it in words (agreeing with you) but I agree the versioning is a bit of a confusing analogy because it usually also implies some kind of improvement, when I’m just trying to distinguish them as very different software categories.

miki123211 · 1h ago

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some products give you calories per serving and a serving size, calories per pound etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.
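
A minimal sketch of the idea above, using plain dataclasses as a stand-in for the Zod or Pydantic types mentioned. All class and function names here are hypothetical, only three of the possible formats are shown, and the step where the LLM fills in one of the shapes is assumed rather than implemented:

```python
from dataclasses import dataclass
from typing import Union

GRAMS_PER_POUND = 453.59237


@dataclass
class PerHundredGrams:
    calories: float


@dataclass
class PerServing:
    calories: float
    serving_size_grams: float


@dataclass
class PerPound:
    calories: float


# The model is asked to fill in exactly one of these shapes, copying the
# numbers from the label instead of doing any arithmetic itself.
CalorieInfo = Union[PerHundredGrams, PerServing, PerPound]


def calories_per_100g(info: CalorieInfo) -> float:
    """Normalize any of the supported formats to calories per 100 g."""
    if isinstance(info, PerHundredGrams):
        return info.calories
    if isinstance(info, PerServing):
        return info.calories / info.serving_size_grams * 100
    if isinstance(info, PerPound):
        return info.calories / GRAMS_PER_POUND * 100
    raise TypeError(f"unsupported format: {type(info).__name__}")


# Hypothetical extraction result: 150 calories per 30 g serving.
print(calories_per_100g(PerServing(calories=150, serving_size_grams=30)))  # 500.0
```

The arithmetic lives entirely in `calories_per_100g`, so the model only has to recognize which format it is looking at.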

dmitrijbelikov · 25m ago

I think that Andrej presents “Software 3.0” as a revolution, but in essence it is a natural evolution of abstractions.

Abstractions don't eliminate the need to understand the underlying layers - they just hide them until something goes wrong.

Software 3.0 is a step forward in convenience. But it is not a replacement for developers with a foundation, but a tool for acceleration, amplification and scaling.

If you know what is under the hood — you are irreplaceable. If you do not know — you become dependent on a tool that you do not always understand.

wjohn · 3h ago

The comparison of our current methods of interacting with LLMs (back and forth text) to old-school terminals is pretty interesting. I think there's still a lot of work to be done to optimize how we interact with these models, especially for non-dev consumers.

nilirl · 1h ago

Where do these analogies break down?

1. Similar cost structure to electricity, but non-essential utility (currently)?

2. Like an operating system, but with non-determinism?

3. Like programming, but ...?

Where does the programming analogy break down?

rudedogg · 40m ago

> programming

The programming analogy is convenient but off. The joke has always been “the computer only does exactly what you tell it to do!” regarding logic bugs. Prompts and LLMs most certainly do not work like that.

I loved the parallels with modern LLMs and time sharing he presented though.

practal · 1h ago

Great talk, thanks for putting it online so quickly. I liked the idea of making the generation / verification loop go brrr, and one way to do this is to make verification not just a human task, but a machine task, where possible.

Yes, I am talking about formal verification, of course!

That also goes nicely together with "keeping the AI on a tight leash". It seems to clash though with "English is the new programming language". So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

Yes, I am talking about abstraction logic [1], of course :-)

So the goal would be to have English (German, ...) as the ONLY programming language, invisibly backed underneath by abstraction logic.

[1] http://abstractionlogic.com

AdieuToLogic · 21m ago

> So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English?

The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Much like how a property in an API initially defined as being optional cannot be made mandatory without potentially breaking clients, whereas making a mandatory property optional can be backward compatible. IOW, the cardinality of "0 .. 1" is a strict superset of "1".
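
The API compatibility point can be sketched in a few lines. This is an illustrative toy validator, not any particular framework's behavior; the `accepts` helper, the field names, and the two schema versions are all made up for the example:

```python
def accepts(payload: dict, required: set, optional: set) -> bool:
    """Return True if payload has every required key and no unknown keys."""
    keys = set(payload)
    return required <= keys <= (required | optional)


# Payloads that existing clients might already be sending.
with_nickname = {"name": "Ada", "nickname": "ada"}
without_nickname = {"name": "Ada"}

# v1 declares "nickname" optional (cardinality 0..1): both payloads are valid.
v1 = dict(required={"name"}, optional={"nickname"})
# v2 makes it mandatory (cardinality 1): clients that omit it now break.
v2 = dict(required={"name", "nickname"}, optional=set())

print(accepts(with_nickname, **v1), accepts(without_nickname, **v1))  # True True
print(accepts(with_nickname, **v2), accepts(without_nickname, **v2))  # True False
```

Every payload valid under v2 is also valid under v1, but not the other way around, which is why loosening is the backward-compatible direction.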

dang · 2h ago

This was my favorite talk at AISUS because it was so full of concrete insights I hadn't heard before and (even better) practical points about what to build now, in the immediate future. (To mention just one example: the "autonomy slider".)

If it were up to me, which it very much is not, I would try to optimize the next AISUS for more of this. I felt like I was getting smarter as the talk went on.

anythingworks · 3h ago

loved the analogies! Karpathy is consistently one of the clearest thinkers out there.

interesting that Waymo could do uninterrupted trips back in 2013, wonder what took them so long to expand? regulation? tailend of driving optimization issues?

noticed one of the slides had a cross over 'AGI 2027'... ai-2027.com :)

AlotOfReading · 3h ago

You don't "solve" autonomous driving as such. There's a long, slow grind of gradually improving things until failures become rare enough.

petesergeant · 3h ago

I wonder at what point all the self-driving code becomes replaceable with a multimodal generalist model with the prompt “drive safely”

anon7000 · 2h ago

Very advanced machine learning models are used in current self driving cars. It all depends what the model is trying to accomplish. I have a hard time seeing a generalist prompt-based generative model ever beating a model specifically designed to drive cars. The models are just designed for different, specific purposes

tshaddox · 1h ago

I could see it being the case that driving is a fairly general problem, and thus models intentionally designed to be general end up doing better than models designed with the misconception that you need a very particular set of driving-specific capabilities.

anythingworks · 1h ago

exactly! I think that was Tesla's vision with self-driving to begin with... so they tried to frame it as a problem general enough that trying to solve it would also solve questions of more general intelligence ('agi'), i.e. cars should use vision just like humans would

but in hindsight looks like this slowed them down quite a bit despite being early to the space...

AlotOfReading · 3h ago

One of the issues with deploying models like that is the lack of clear, widely accepted ways to validate comprehensive safety and absence of unreasonable risk. If that can be solved, or regulators start accepting answers like "our software doesn't speed in over 95% of situations", then they'll become more common.

ActorNightly · 1h ago

> Karpathy is consistently one of the clearest thinkers out there.

Eh, he ran Tesla's self-driving division and pointed it in a direction that is never going to fully work.

What they should have done is a) trained a neural net to map a sequence of frames into a representation of the physical environment, and b) leveraged MuZero, so that the self-driving system basically builds out parallel simulations into the future and searches for the best course of action to take.

Because that's pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that an object on the road that we are driving towards is going to result in a negative outcome when we collide with it.

AlotOfReading · 24m ago

Aren't continuous, stochastic, partial knowledge environments where you need long horizon planning with strict deadlines and limited compute exactly the sort of environments MuZero variants struggle with? Because that's driving.

It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.

visarga · 1h ago

> We don't need to know what a cone is

The counter argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural net process. They needed to ensure safety, so testing was crucial. It is harder to test an end-to-end system than its individual parts.

tayo42 · 20m ago

Is that the approach that waymo uses?

mikewarot · 1h ago

A few days ago, I was introduced to the idea that when you're vibe coding, you're consulting a "genie": much like in the fables, you almost never get what you asked for, but if your wishes are small, you might just get what you want.

fudged71 · 1h ago

“You are an expert 10x software developer. Make me a billion dollar app.” Yeah this checks out

anythingworks · 1h ago

that's a really good analogy! It feels like a wicked joke that LLMs behave in such a way that they're both intelligent and stupid at the same time

hgl · 1h ago

It’s fascinating to think about what a true GUI for LLMs could be like.

It immediately makes me think of an LLM that can generate a customized GUI for the topic at hand, which you can interact with in a non-linear way.

cjcenizal · 51m ago

My friend Eric Pelz started a company called Malleable to do this very thing: https://www.linkedin.com/posts/epelz_every-piece-of-software...

karpathy · 53m ago

Fun demo of an early idea was posted by Oriol just yesterday :)

https://x.com/OriolVinyalsML/status/1935005985070084197

nbbaier · 1h ago

I love this concept and would love to know where to look for people working on this type of thing!

dpkirchner · 59m ago

Like a HyperCard application?

necrodome · 20m ago

We (https://vibes.diy/) are betting on this

nico · 4h ago

Thank you YC for posting this before the talk became deprecated[1]

1: https://x.com/karpathy/status/1935077692258558443

sandslash · 3h ago

We couldn't let that happen!

bedit · 1h ago

I love the "people spirits" analogy. For casual tasks like vibecoding or boiling an egg, LLM errors aren't a big deal. But for critical work, we need rigorous checks—just like we do with human reasoning. That's the core of empirical science: we expect fallibility, so we verify. A great example is how early migration theories based on pottery were revised with better data like ancient DNA (see David Reich). Letting LLMs judge each other without solid external checks misses the point—leaderboard-style human rankings are often just as flawed.

nodesocket · 2h ago

llms.txt makes a lot of sense, especially for LLMs to interact with HTTP APIs autonomously.

Seems like you could set an LLM loose and, like Googlebot, have it start converting all HTML pages into llms.txt. Man, the future is crazy.

nothrabannosir · 1h ago

Couldn’t believe my eyes. The www is truly bankrupt. If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Website too confusing for humans? Add more design, modals, newsletter pop ups, cookie banners, ads, …

Website too confusing for LLMs? Add an accessible, clean, ad-free, concise, high entropy, plain text summary of your website. Make sure to hide it from the humans!

PS: it should be /.well-known/llms.txt but that feels futile at this point..

PPS: I enjoyed the talk, thanks.

andrethegiant · 1h ago

> If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Not a browser plugin, but you can prefix URLs with `pure.md/` to get the pure markdown of that page. It's not quite a 1:1 to llms.txt as it doesn't explain the entire domain, but works well for one-off pages. [disclaimer: I'm the maintainer]

jph00 · 11m ago

The next version of the llms.txt proposal will allow an llms.txt file to be added at any level of a path, which isn't compatible with /.well-known.

(I'm the creator of the llms.txt proposal.)
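
To make the location question concrete, here is a small sketch that lists where an llms.txt file could sit for a given page URL if files are allowed at any level of the path. The function name, the nearest-first ordering, and the idea of probing each level are assumptions for illustration only; none of this is taken from the proposal text:

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def candidate_llms_txt_urls(page_url: str) -> List[str]:
    """List possible llms.txt locations, from the deepest directory up to the root."""
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    # Treat the last segment as the page itself unless the path ends in "/";
    # the directories above it are the levels where a file could live.
    dirs = segments[:-1] if segments and not parts.path.endswith("/") else segments
    candidates = []
    for depth in range(len(dirs), -1, -1):
        path = "/" + "/".join(dirs[:depth] + ["llms.txt"])
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for url in candidate_llms_txt_urls("https://example.com/docs/api/intro.html"):
    print(url)
```

A single fixed prefix such as /.well-known/ cannot express the per-directory locations this produces, which is the incompatibility described above.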

practal · 1h ago

If you have different representations of the same thing (llms.txt / HTML), how do you know they are actually equivalent to each other? I am wondering if there are scenarios where webpage publishers would be interested in gaming this.

jph00 · 10m ago

That's not what llms.txt is. You can just use a regular markdown URL or similar for that.

llms.txt is a description for an LLM of how to find the information on your site needed for an LLM to use your product or service effectively.

moralestapia · 35m ago

Would have been nice to hear about why he decided to strip Teslas of all sensors that are not cameras.

Most likely criminal behavior, IMO, as it was released into the streets and innocent people have died because of it.

... but hey, the guy wrote a couple of fun tweets!

fnord77 · 1h ago

Him claiming govts don't use AI or are behind the curve is not accurate.

Modern military drones are very much AI agents

password4321 · 30m ago

The naïveté there was endearing.

Echoes of the English government's discussion of RSA before it was publicly re-invented.

AIorNot · 3h ago

Love his analogies and clear eyed picture

pyman · 3h ago

"We're not building Iron Man robots. We're building Iron Man suits"

pryelluw · 3h ago

Funny thing is that in more than one of the Iron Man movies the suits end up being bad robots. Even the AI Iron Man made shows up to ruin the day in the Avengers movie. So it’s a little on the nose that they’d try to pitch it this way.

reducesuffering · 3h ago

[flagged]

throwawayoldie · 3h ago

I'm old enough to remember when Twitter was new, and for a moment it felt like the old utopian promise of the Internet finally fulfilled: ordinary people would be able to talk, one-on-one and unmediated, with other ordinary people across the world, and in the process we'd find out that we're all more similar than different and mainly want the same things out of life, leading to a new era of peace and empathy.

It was a nice feeling while it lasted.

tock · 1h ago

I believe the opposite happened. People found out that there are huge groups of people with wildly differing views on morality from them and that just encouraged more hate. I genuinely think old school facebook where people only interacted with their own private friend circles is better.

_kb · 2h ago

Believe it or not, humans did in fact have forms of written language and communication prior to twitter.

dang · 2h ago

Can you please make your substantive points without snark? We're trying for something a bit different here.

https://news.ycombinator.com/newsguidelines.html

throwawayoldie · 1h ago

You missed the point, but that's fine, it happens.

AdieuToLogic · 3h ago

It's an interesting presentation, no doubt. The analogies eventually fail as analogies usually do.

A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

Also, the OS analogy doesn't make sense to me. Perhaps this is because I do not subscribe to LLM's having reasoning capabilities, nor to their being able to reliably provide the services an OS-like system can be shown to provide.

A minor critique regarding the analogy equating LLM's to mainframes:

  Mainframes in the 1960's never "ran in the cloud" as it did
  not exist.  They still do not "run in the cloud" unless one
  includes simulators.

  Terminals in the 1960's - 1980's did not use networks.  They
  used dedicated serial cables or dial-up modems to connect
  either directly or through stat-mux concentrators.

  "Compute" was not "batched over users."  Mainframes either
  had jobs submitted and ran via operators (indirect execution)
  or supported multi-user time slicing (such as found in Unix).

distalx · 1h ago

Hang in there! Your comment makes some really good points about the limits of analogies and the real control corporations have over LLMs.

Plus, your historical corrections were spot on. Sometimes, good criticisms just get lost in the noise online. Don't let it get to you!

furyofantares · 2h ago

> The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

I don't think that's what he said, he was identifying the first customers and uses.

AdieuToLogic · 2h ago

>> A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

> I don't think that's what he said, he was identifying the first customers and uses.

The portion of the presentation I am referencing starts at or near 12:50[0]. Here is what was said:

  I wrote about this one particular property that strikes me
  as very different this time around.  It's that LLM's like
  flip they flip the direction of technology diffusion that
  is usually present in technology.

  So for example with electricity, cryptography, computing,
  flight, internet, GPS, lots of new transformative that have
  not been around.

  Typically it is the government and corporations that are
  the first users because it's new expensive etc. and it only
  later diffuses to consumer.  But I feel like LLM's are kind
  of like flipped around.

  So maybe with early computers it was all about ballistics
  and military use, but with LLM's it's all about how do you
  boil an egg or something like that.  This is certainly like
  a lot of my use.  And so it's really fascinating to me that
  we have a new magical computer it's like helping me boil an
  egg.

  It's not helping the government do something really crazy
  like some military ballistics or some special technology.

Note the identification of historic government interest in computing along with a flippant "regular person" scenario in the context of "technology diffusion."

You are right in that the presenter identified "first customers", but this is mentioned in passing when viewed in context. Perhaps I should not have characterized this as "a recurring theme." Instead, a better categorization might be:

  The presenter minimized the control corporations have by
  keeping focus on governmental topics and trivial customer
  use-cases.

0 - https://youtu.be/LCEmiRjPEtQ?t=770

jppope · 4h ago

Well that showed up significantly faster than they said it would.

dang · 2h ago

The team adapted quickly, which is a good sign. I believe getting the videos out sooner (as in why-not-immediately) is going to be a priority in the future.

seneca · 2h ago

Classic under promise and over deliver.

I'm glad they got it out quickly.

dang · 2h ago

Me too. It was my favorite talk of the ones I saw.

sneak · 2h ago

Can we please stop standardizing on putting things in the root?

/.well-known/ exists for this purpose.

example.com/.well-known/llms.txt

https://en.m.wikipedia.org/wiki/Well-known_URI

jph00 · 9m ago

You can't just put things there any time you want - the RFC requires that they go through a registration process.

Having said that, this won't work for llms.txt, since in the next version of the proposal they'll be allowed at any level of the path, not only the root.

andrethegiant · 1h ago

https://github.com/AnswerDotAI/llms-txt/issues/2

