FWIW, nothing beats ‘implement the game logic in full (a huge amount of work) and, pruning with some heuristics, look 50 moves ahead’. This is how chess engines work and how all good turn-based game AI works.
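For readers unfamiliar with the approach, here is a minimal sketch of depth-limited look-ahead with alpha-beta pruning, written in negamax form. The game is a hypothetical stand-in chosen for brevity: players alternately remove 1 to 3 stones from a pile, and whoever takes the last stone wins. `legal_moves` and `apply_move` play the role of the hand-written world model, and `heuristic` is a placeholder evaluation used at the search horizon. All names are illustrative.

```python
import math

LOSS = -100.0  # score for the side to move when it has already lost

# Hand-written "world model" for the hypothetical stone-taking game.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def apply_move(pile, move):
    return pile - move

def heuristic(pile):
    # Placeholder evaluation at the search horizon: "unknown" (0).
    # A real engine would plug a hand-tuned or learned evaluation in here.
    return 0.0

def negamax(pile, depth, alpha, beta):
    """Score of `pile` from the point of view of the player to move."""
    if pile == 0:
        return LOSS            # the opponent just took the last stone
    if depth == 0:
        return heuristic(pile)
    best = -math.inf
    for move in legal_moves(pile):
        score = -negamax(apply_move(pile, move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break              # prune: the opponent will avoid this line
    return best

def best_move(pile, depth=8):
    """Search every legal move `depth` plies deep and return the best one."""
    best_score, choice = -math.inf, None
    for move in legal_moves(pile):
        score = -negamax(apply_move(pile, move), depth - 1, -math.inf, math.inf)
        if score > best_score:
            best_score, choice = score, move
    return choice
```

Here `best_move(10)` returns 2, leaving the opponent a multiple of four, which is the losing position in this game. The search finds that purely by looking ahead; a better `heuristic` would let it reach the same conclusion at a shallower depth.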

I’ve tried throwing masses of game state data at the latest models in PyTorch. Unusable. It makes really dumb moves. In fact, one big issue is that it often suggests invalid moves, and the best way to avoid this is to implement the board game logic in full to validate them. At which point, why don’t I just do the above and scan ahead X moves, since I have to do the hard part of manually building the world model anyway?

One area where current AI is helping is the heuristics themselves, i.e. evaluating the best moves when scanning ahead. You can input various game states, along with whether the player eventually won the game, to train the values of the heuristics. You still need to implement the world model and the look-ahead to use those heuristics, though! When you hear of neural networks being used for Go or chess, this is where they are used. You still need to build the world model and brute-force scan ahead.
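A minimal sketch of training heuristic values from game outcomes: a logistic regression whose inputs are hand-picked features of a position and whose label is whether that player eventually won. The two features and the eight training positions are made up for illustration; the resulting `predict` score is the kind of evaluation a look-ahead search would call at its horizon.

```python
import math

def predict(weights, bias, features):
    """Estimated probability that the player to move goes on to win."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(states, outcomes, lr=0.1, epochs=500):
    """Fit heuristic weights by logistic regression with plain SGD.
    states: feature vectors; outcomes: 1 if the player eventually won, else 0."""
    weights, bias = [0.0] * len(states[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(states, outcomes):
            err = predict(weights, bias, x) - y   # gradient of log-loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Made-up training data: (material advantage, mobility advantage) -> won?
STATES = [[3, 2], [2, 1], [1, 3], [2, -1], [-3, -2], [-2, -1], [-1, -3], [-2, 1]]
WON = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(STATES, WON)
```

In this toy data only the first feature separates wins from losses in every case, so training should give it a clearly positive weight.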

One path I do want to try more: in theory, coding assistants should be able to read rulebooks and dynamically generate code to represent those rules. If you can do that part, the rest should be easy, i.e. it could be possible to throw rulebooks at an AI and have it play the game. It would generate a world model from the rulebook via coding assistants, then scan ahead more moves than humanly possible using that world model, evaluating positions with heuristics that would need to be trained through trial and error.

Of course, coding assistants aren’t yet at a point where you can throw rulebooks at them and get an internal representation of game states. I should know: I just spent weeks building the game model, even with a coding assistant.

smokel · 3h ago

You probably know this, but things heavily depend on the type of board game you are trying to solve.

In Go, for instance, it does not help much to look 50 moves ahead. The complexity is way too high for this to be feasible, and determining who's ahead is far from trivial. It's in these situations where modern AI (reinforcement learning, deep neural networks) helps tremendously.

Also note that nobody said that using AI is easy.

AnotherGoodName · 3h ago

AlphaGo (and Stockfish, which another commenter mentioned) still has to search ahead using a world model. The AI training just helps with the heuristics for pruning and evaluating that search.

The big fundamental blocker to a generic ‘can play any game’ AI is the manual implementation of the world model. If you read the AlphaGo paper, you’ll see ‘we started with nothing but an implementation of the game rules’. That’s the part we’re missing: it’s done by humans.
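To make the "training only helps the search" point concrete: in AlphaGo Zero-style Monte Carlo tree search, the learned policy's prior P(s, a) enters through the PUCT selection rule, which picks the action maximising Q(s, a) + U(s, a), where U(s, a) = c_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a)). Below is a minimal sketch of just that selection step. The `children` layout and the `c_puct` value are illustrative assumptions, and tree expansion and value backup are omitted.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child action maximising Q + U.
    children: dict mapping action -> (prior P, visit count N, total value W)."""
    n_parent = sum(n for _, n, _ in children.values())

    def score(stats):
        prior, n, w = stats
        q = w / n if n > 0 else 0.0                          # mean value so far
        u = c_puct * prior * math.sqrt(n_parent) / (1 + n)   # prior-weighted exploration
        return q + u

    return max(children, key=lambda a: score(children[a]))
```

With `{"a": (0.6, 10, 2.0), "b": (0.4, 1, 0.8)}`, action "b" is selected: it has the higher mean value and far fewer visits. Early on, when values are still zero, the prior dominates, which is where the trained network steers the search.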

moyix · 2h ago

Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero

smokel · 2h ago

Minor nitpick: it did not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.

hulium · 27m ago

During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265
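A minimal sketch of what "masking legal actions at the root" amounts to: illegal actions get a logit of negative infinity before the softmax, so they receive zero probability. The function name and list-based interface are illustrative, and the sketch assumes at least one action is legal.

```python
import math

def masked_policy(logits, legal):
    """Softmax over action logits with illegal actions masked out.
    Assumes at least one entry of `legal` is True."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, legal)]
    m = max(masked)                              # subtract max for numerical stability
    exps = [math.exp(l - m) for l in masked]     # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Inside the search tree, per the quote above, MuZero skips this step and relies on the network having learned not to propose such actions.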

smokel · 3h ago

Implementing a world model seems to be mostly solved by LLMs. Finding one that can be evaluated fast enough to actually solve games is extremely hard, for humans and AI alike.

daxfohl · 2h ago

Yeah, I can't even get them to retain a simple state. I've tried having them run a maze, but instead of giving them the whole maze up front, I have them move one step at a time, tell them which directions are open from that square and ask for the next move, etc.

After a few moves they get hopelessly lost and just start wandering back and forth in a loop. Even when I prompt them explicitly to serialize a state representation of the maze after each step, and even if I prune the old context so they don't get tripped up on old state representations, they still get flustered and corrupt the state or lose track of things eventually.

They get the concept: if I explain the challenge and ask them to write a program to solve such a maze step by step like that, they can do it successfully on the first try! But when it comes to maintaining the state internally, they still seem to struggle.
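For contrast, here is the kind of program the models can write: a depth-first explorer that only sees the open directions at its current square, one step at a time, and keeps its state explicitly (a visited set plus a stack of moves for backtracking). The maze, the N/S/E/W action names and the `open_directions` helper are hypothetical stand-ins for the setup described above.

```python
# Hypothetical maze: '#' = wall, 'S' = start, 'E' = exit.
MAZE = [
    "#######",
    "#S..#.#",
    "#.#.#.#",
    "#.#...#",
    "#.###.#",
    "#...#E#",
    "#######",
]
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
BACK = {"N": "S", "S": "N", "E": "W", "W": "E"}

def open_directions(pos):
    """What the environment reports at each step: which ways are open."""
    r, c = pos
    return [d for d, (dr, dc) in MOVES.items() if MAZE[r + dr][c + dc] != "#"]

def explore(start):
    """Depth-first exploration with explicitly maintained state."""
    pos, visited, path, steps = start, {start}, [], 0
    while MAZE[pos[0]][pos[1]] != "E":
        for d in open_directions(pos):
            dr, dc = MOVES[d]
            nxt = (pos[0] + dr, pos[1] + dc)
            if nxt not in visited:          # step into an unexplored square
                visited.add(nxt)
                path.append(d)
                pos = nxt
                break
        else:                               # dead end: undo the last move
            if not path:
                return None                 # exit is unreachable
            dr, dc = MOVES[BACK[path.pop()]]
            pos = (pos[0] + dr, pos[1] + dc)
        steps += 1
    return path, steps
```

`explore((1, 1))` wanders into the dead end on the left first, backtracks, and finishes with the path E, E, S, S, E, E, S, S. The point is that the program never has to "remember" anything implicitly: all of it lives in `visited` and `path`.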

nomadpenguin · 1h ago

There are specialized architectures, such as the Tolman-Eichenbaum Machine (TEM)*, that are able to complete this kind of task. Interestingly, once trained, their activations look strikingly similar to place and grid cells in real brains. The team were also able to show (in a separate paper) that the TEM is mathematically equivalent to a transformer.

* https://www.sciencedirect.com/science/article/pii/S009286742...

warrenm · 2h ago

>I've tried having them run a maze, but instead of giving them the whole maze up front, I have them move one step at a time, tell them which directions are open from that square and ask for the next move, etc.

Presuming these are 'typical' mazes (like you find in a garden or local corn field in late fall), why not have the bot run the known-correct solving algorithm (or its mirror)?

daxfohl · 2h ago

Like I said, they can implement the algorithm to solve it, but when forced to maintain the state themselves, either internally or explicitly in the context, they are unable to do so and get lost.

Similarly, if you ask them to write a Sudoku solver, they have no problem. And if you ask an online model to solve a Sudoku, it'll write a Sudoku solver in the background and use that to solve it. But (at least the last time I tried, a year ago), if you ask them to solve it step by step using pure reasoning without writing a program, they start spewing out all kinds of nonsense (but, humorously, they cheat: they'll still spit out the correct answer at the end).
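The Sudoku solver the models have no trouble writing is typically plain backtracking. A minimal sketch, assuming a 9x9 grid given as a list of lists with 0 for empty cells:

```python
def ok(grid, r, c, v):
    """Check the row, column and 3x3 box constraints for placing v at (r, c)."""
    if v in grid[r]:
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Fill the grid in place by backtracking; return True if solved."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if ok(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0   # undo and try the next digit
                return False             # nothing fits here: backtrack
    return True                          # no empty cells left
```

The whole "state" is the grid itself plus the call stack, which is exactly the bookkeeping that step-by-step reasoning in plain text tends to lose.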

adventured · 1h ago

So if you push, e.g., Claude Sonnet 4 or Opus 4.1 into a maze scenario, have it record its own pathing as it goes, and then refresh and feed the next Claude the progress so far, would that work around the inability to maintain long-duration context in such maze cases?

I make Claude do that on every project. I call them Notes for Future Claude and have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.

daxfohl · 10m ago

This was from a few months ago, so things may be different now. I only used OpenAI, and the o3 model did by far the best. gpt-4o's performance was equivalent on the basic scenario, where I had it just make one move at a time (which was still pretty good, all things considered), but when I started having it summarize state and such, o3 was able to use that to improve its performance, whereas 4o actually got worse.

But yeah, that's one of the things I tried. "Your turn is over. Please summarize everything you have learned about the maze so someone else can pick up where you left off". It did okay, but it often included superfluous information, it sometimes forgot to include current orientation (the maze action options were "move forward", "turn right", "turn left", so knowing the current orientation was important), and it always forgot to include instructions on how to interpret the state: in particular, which absolute direction corresponded to an increase or decrease of which grid index.

I even tried to coax it into defining a formal state representation and "instructions for an LLM to use it" up-front, to see if it would remember to include the direction/index correspondence, but it never did. It was amusing actually; it was apparent it was just doing whatever I told it and not thinking for itself. Something like

"Do you think you should include a map in the state representation? Would that be useful?"

"Yes, great idea! Here is a field for a map, and an algorithm to build it"

"Do you think a map would be too much information?"

"Yes, great consideration! I have removed the map field"

"No, I'm asking you. You're the one that's going to use this. Do you want a map or not?"

"It's up to you! I can implement it however you like!"
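The two pieces of state the model kept dropping, current orientation and which absolute direction maps to which grid index, are small once written down explicitly. A minimal sketch using the three actions from the experiment; the N/E/S/W headings and the rows-down, columns-right index convention are assumptions stated in the code.

```python
# Assumed convention: headings listed clockwise; "S" means row + 1 and
# "E" means col + 1 (rows grow downward, columns grow to the right).
HEADINGS = ["N", "E", "S", "W"]
DELTA = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}

def step(state, action):
    """Apply a relative action to an explicit (row, col, heading) state."""
    row, col, heading = state
    i = HEADINGS.index(heading)
    if action == "turn right":
        return row, col, HEADINGS[(i + 1) % 4]
    if action == "turn left":
        return row, col, HEADINGS[(i - 1) % 4]
    if action == "move forward":
        dr, dc = DELTA[heading]
        return row + dr, col + dc, heading
    raise ValueError(f"unknown action: {action!r}")
```

Starting at `(2, 2, "N")`, "turn right" then "move forward" gives `(2, 3, "E")`. Without the `DELTA` table, the same state tuple is ambiguous, which matches the missing "instructions on how to interpret the state" described above.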

bubblyworld · 17m ago

Something to consider is that while it's really hard to implement a decent NN-based algorithm like AlphaZero for your game, you get the benefit that model checkpoints give you a range of skill levels to play against as you train it.

Handicapping traditional tree search produces really terrible results, imo. It's common for weak chess engines to be weak for stupid reasons (they just hang pieces, make random unnatural moves, miss blatant threats etc). Playing weak versions of Leela chess really "feels" like a (bad) human opponent by contrast.

Maybe the juice isn't worth the squeeze. It's definitely a ton of work to get right.

coeneedell · 3h ago

IIRC, the rules system for Magic: The Gathering Arena is generated by a sort of compiler that is fed the rules. You might not even need a modern coding assistant to build out something reasonable: define a DSL that can express the rules exactly, then have people (or a fine-tuned LLM) transform rulebooks into that DSL.

red75prime · 57m ago

It would be nice if you could train a decent model on a $1000 (or so) budget, but for now it seems unlikely.

GaggiX · 3h ago

>This is how chess engines work

All the strongest chess engines, including Stockfish, have at least one neural network to evaluate positions, and this affects the search window.

>how all good turn based game ai works

That's not really true, just think of Go.

jjk7 · 2h ago

Interesting parallels between LLM development and psychology and spirituality.

To truly think, you need an internal adversary challenging your thoughts and beliefs. To look 50 moves ahead, you need to simulate the adversary's moves... Duality.

mingtianzhang · 3m ago

I used to work on an idea that, instead of modelling the whole world, you can build your own solipsistic model: https://openreview.net/pdf?id=fPaGSuQRP1O

ryukoposting · 2h ago

A footnote in the GPT-5 announcement was that you can now give OpenAI's API a context-free grammar that the LLM must follow. One way of thinking about this feature is that it's a user-defined world model. You could tell the model "the sky is" => "blue" for example.

Obviously you can't actually use this feature as a true world model. There's just too much stuff you have to codify, and basing such a system on tokens is inherently limiting.

The basic principle sounds like what we're looking for, though: a strict automaton or rule set that steers the model's output reliably and provably. Perhaps a similar kind of thing that operates on neurons rather than tokens? Hmm.
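A toy illustration of the general idea behind grammar-constrained output (not OpenAI's actual API or implementation): at each decoding step, tokens the grammar does not permit are masked out before picking the model's top choice. For brevity the "grammar" here is a finite-state machine; a full context-free grammar would need a pushdown automaton. `fake_model` is a hypothetical stand-in for an LLM's next-token scores.

```python
# Toy grammar as a finite-state machine: state -> {allowed token: next state}.
# It accepts exactly: "the sky is" ("blue" | "grey") "."
GRAMMAR = {
    0: {"the": 1},
    1: {"sky": 2},
    2: {"is": 3},
    3: {"blue": 4, "grey": 4},
    4: {".": "END"},
}

def fake_model(prefix):
    """Hypothetical next-token scores; it prefers "green", which is not allowed."""
    return {"the": 1.0, "sky": 1.0, "is": 1.0, "green": 5.0,
            "grey": 2.0, "blue": 1.5, ".": 1.0}

def constrained_decode(score_fn, max_len=10):
    """Greedy decoding that only ever emits tokens the grammar allows."""
    state, out = 0, []
    while state != "END" and len(out) < max_len:
        allowed = GRAMMAR[state]
        scores = score_fn(out)
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        state = allowed[token]
    return out
```

`constrained_decode(fake_model)` yields `the sky is grey .`: the model's favourite, "green", never gets a chance, which is the "steers the output reliably" property, applied at the token level.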

nxobject · 2h ago

> There's just too much stuff you have to codify, and basing such a system on tokens is inherently limiting.

As a complete amateur who works in embedded: I imagine the restriction to a linear, ordered input stream is fundamentally limiting as well, even with the use of attention layers.

gavmor · 1h ago

I suspect something more akin to a LoRA and/or circuit tracing will help us keep track of the truth.

BariumBlue · 1h ago

> When researchers attempt to recover [something like] a coherent computational representation of an Othello game board they instead find [bags of heuristics]

Humans don't exactly have a full representation of board space in their heads either. Notably, chess masters and amateurs are about equally good at memorizing completely random board positions. I'd think neither could memorize 64 chess pieces in random positions on a board.

mym1990 · 1h ago

For whatever it's worth, I bet the chess master would be able to instantly identify that it is a random/invalid board position, i.e. an invalid world state. I think the experiment you are alluding to gave both groups a very limited amount of time to look at the board. Given enough time, both groups would definitely be able to memorize 64 pieces on a board.

aurelwu · 1h ago

I do think even the most amateur of amateurs would instantly recognize that a chess board with 64 pieces on it is an invalid game state.

dejongh · 2h ago

This is a very interesting article. The concept "run an experiment in your head and predict the outcome" is a capability that AIs must have to attain some kind of general intelligence. Anyway, read the article, it's great.

yellow_postit · 2h ago

Not mentioning Fei-Fei Li and her startup explicitly focused on world models is an interesting choice by the author.

srush · 3h ago

A recent tutorial video from one of the authors featured in this article:

Evaluating AI's World Models (https://www.youtube.com/watch?v=hguIUmMsvA4)

It goes into detail about several of the challenges discussed.

jonbaer · 2h ago

"You’re carrying around in your head a model of how the world works" (or so you thought) ... the real AI is in a) how fast you can realize it's changed and b) how fast you can adapt. This bit isn't being optimized, it's being dragged out.

red75prime · 1h ago

> This bit isn't being optimized, it's being dragged out.

Of course, it is being optimized. People are working on increasing the sample efficiency. A simple search on Google Scholar will confirm it.

nathan_douglas · 3h ago

I'm sure neural networks are a great tool here, but I don't know how the training would proceed effectively off "mere data"; too much of the data we have is incomplete, inaccurate, or outright fantasy or misinformation or out of the ordinary.

I could see this being the domain of fleets of robots, many different styles, compositions, materials, etc. Send ten robots in to survey a room - drones, crawlers, dogs, rollers, etc - they'll bang against things, knock things off shelves, illuminate corners, etc. The aggregate of their observations is the useful output, kinda like networked toddlers.

And yeah, unfortunately, sometimes this means you just need to send a swarm of robots to attack a city bus... or a bank... to "learn how things work." Or an internment camp. Don't get upset, guy, we're building a world model.

Anybody wanna give me VC money to work on this?

ACCount37 · 3h ago

When you're training an AI, that "mere data" adds up. Random error averages out, getting closer to zero with every data point. Systematic error leaks information about the system that keeps making the error.

A Harry Potter book doesn't ruin an AI's world model by contaminating reality with fantasy. It gives it valuable data points on human culture and imagination and fiction tropes and commercially successful creative works. All of which is a part of the broader "reality" the AI is trying to grasp the shape of as it learns from the vast unstructured dataset.
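The claim that random error averages out while systematic error does not is easy to check numerically. A minimal simulation assuming Gaussian noise; the true value, noise level and bias are arbitrary choices:

```python
import random

def mean_error(n, noise=1.0, bias=0.0, true_value=10.0, seed=0):
    """Average n noisy observations of true_value and return the error of that mean.
    noise: std dev of the zero-mean random error; bias: constant systematic error."""
    rng = random.Random(seed)
    obs = [true_value + bias + rng.gauss(0.0, noise) for _ in range(n)]
    return sum(obs) / n - true_value

for n in (10, 1_000, 100_000):
    print(n, round(mean_error(n), 4), round(mean_error(n, bias=0.5), 4))
```

The unbiased error tends to shrink roughly in proportion to 1/√n, while the biased one settles near 0.5 rather than zero. That stable offset is itself a regularity, which is the sense in which systematic error "leaks information" about whatever produces it.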

multjoy · 44m ago

The AI learns nothing from Harry Potter other than the statistical likelihood of one token appearing after another.

The AI is trying to grasp nothing.

ACCount37 · 38m ago

Any sufficiently advanced statistical model is a world model.

If you think that what your own brain is doing isn't fancy statistics plugged into a prediction engine, I have some news for you.

nathan_douglas · 2h ago

You're absolutely correct, of course. I was musing during down time in a meeting and turned it into a joke instead of engaging my faculties :)

tsunamifury · 3h ago

The end of Westworld basically put forth that the only way we could stabilize the world was to destroy it and move everything to a parallel simulation. Since early attempts at world modeling failed due to the complexity of outliers, the only way the AI could handle a world model was to get rid of the real one.

People didn’t give the later seasons enough credit, even if they didn’t rise to the same dramatic heights as the first.

'World Models,' an old idea in AI, mount a comeback

Comments (36)