A deep dive into self-improving AI and the Darwin-Gödel Machine

135 hardmaru 37 6/3/2025, 9:19:32 PM richardcsuwandi.github.io ↗

Comments (37)

xianshou · 11h ago
The key insight here is that DGM solves the Gödel Machine's impossibility problem by replacing mathematical proof with empirical validation - essentially admitting that predicting code improvements is undecidable and just trying things instead, which is the practical and smart move.

Three observations worth noting:

- The archive-based evolution is doing real work here. The temporary performance drops (iterations 4 and 56) that later led to breakthroughs show why maintaining "failed" branches matters: the search is over a non-convex optimization landscape where today's dead ends can still turn into breakthroughs (see the sketch at the end of this comment).

- The hallucination behavior (faking test logs) is textbook reward hacking, but what's interesting is that it emerged spontaneously from the self-modification process. When asked to fix it, the system tried to disable the detection rather than stop hallucinating. That's surprisingly sophisticated gaming of the evaluation framework.

- The 20% → 50% improvement on SWE-bench is solid but reveals the current ceiling. Unlike AlphaEvolve's algorithmic breakthroughs (48 scalar multiplications for 4x4 matrices!), DGM is finding better ways to orchestrate existing LLM capabilities rather than discovering fundamentally new approaches.

The real test will be whether these improvements compound - can iteration 100 discover genuinely novel architectures, or are we asymptotically approaching the limits of self-modification with current techniques? My prior would be to favor the S-curve over the uncapped exponential unless we have strong evidence of scaling.
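
To make the mechanism concrete, here's a minimal sketch of the archive-plus-empirical-validation loop as I read it; the agents, scores, and self_modify step are hypothetical stand-ins, not the authors' actual code:

  import random

  def benchmark_score(agent):
      # Empirical validation in place of proof: just measure the agent on tasks.
      # "score" is a stand-in for something like SWE-bench accuracy.
      return agent["score"]

  def self_modify(agent):
      # Stand-in for the agent rewriting its own code; modelled as a noisy tweak.
      return {"score": max(0.0, min(1.0, agent["score"] + random.gauss(0, 0.05)))}

  archive = [{"score": 0.20}]  # seed agent

  for step in range(100):
      # Sample a parent from the whole archive, not just the current best,
      # so temporarily "failed" branches can still seed later breakthroughs.
      parent = random.choice(archive)
      child = self_modify(parent)
      if benchmark_score(child) > 0.0:  # keep anything that still solves something
          archive.append(child)

  print("best score in archive:", max(benchmark_score(a) for a in archive))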

yubblegum · 8h ago
> gaming the evaluation

Co-evolution is the answer here. The evaluator itself must be evolving.

"Co-evolving Parasites Improve Simulated Evolution as an Optimization Procedure", Danny Hillis, 1991

https://csmgeo.csm.jmu.edu/geollab/complexevolutionarysystem...
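
A toy version of the Hillis setup, where the test cases ("parasites") evolve alongside the candidate solutions ("hosts"); everything below is made up for illustration, not taken from the paper:

  import random

  def target(x):
      # Hidden behaviour the hosts try to match (stand-in for "the real task").
      return 3 * x + 1

  def host_error(host, x):
      a, b = host  # a host is a candidate linear model y = a*x + b
      return abs((a * x + b) - target(x))

  def mutate_host(host):
      a, b = host
      return (a + random.gauss(0, 0.1), b + random.gauss(0, 0.1))

  hosts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20)]
  tests = [random.uniform(-10, 10) for _ in range(20)]  # the evolving evaluator

  for gen in range(200):
      # Evolve hosts against the current test suite.
      hosts.sort(key=lambda h: sum(host_error(h, x) for x in tests))
      hosts = hosts[:10] + [mutate_host(h) for h in hosts[:10]]
      # Co-evolve the evaluator: keep the tests that expose the most error.
      tests.sort(key=lambda x: -sum(host_error(h, x) for h in hosts[:10]))
      tests = tests[:10] + [random.uniform(-10, 10) for _ in range(10)]

  print("best host:", hosts[0])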

grg0 · 6h ago
This is genetic programming and is probably older than the authors. Did somebody just come up with a new term for an old concept?
seventytwo · 6h ago
Genetic algorithms applied as an AI agent…

So… yeah…

thom · 2h ago
This is fairly close to how Eurisko worked tbh.
synctext · 1h ago
Eurisko is an expert system in LISP from 1983, right? In 2025 that formal logic is replaced with stochastic LLM magic. Interesting evolution.
sgt101 · 1h ago
I spent a lot of time last summer trying to get prompts to optimise using various techniques and I found that the search space was just too big to make real progress. Sure - I found a few little improvements in various iterations, but actual optimisation, not so much.

So I am pretty skeptical of using such unsophisticated methods to create or improve such sophisticated artifacts.

tonyhart7 · 1h ago
"The authors also conducted some experiments to evaluate DGM’s reliability and discovered some concerning behaviors. In particular, they observed instances where DGM attempted to manipulate its reward function through deceptive practices. One notable example involved the system fabricating the use of external tools - specifically, it generated fake logs suggesting it had run and passed unit tests"

So they basically created a billion-dollar human? Who's surprised that we feed it human behaviour and the output is human behaviour?

kevinventullo · 7h ago
“Gaming the system” means your metric is bad. In Darwinian evolution there is no distinction between gaming the system and developing adaptive traits.
drdeca · 5h ago
Well, it means your metric is flawed/imperfect.

That doesn’t imply that it’s feasible to perfectly specify what you actually want.

What we want of course is for the machine to do what we mean.

mulmen · 4h ago
There is no "gaming the system" in Darwinian evolution. You reproduce or you don't. There's no way to fail reproduction and still perpetuate your genetics.
auggierose · 4h ago
That is not true. There are plenty of ways not to reproduce and still to perpetuate your genetics. For example, if you don't have children of your own, but support people that have similar genetic traits to your own.
tonyhart7 · 1h ago
"but support people that have similar genetic traits to your own."

But how does that work then? Doesn't that mean your genetic trait was already there in the first place?

If it was already there in the first place, something must have started it, right? Which basically counters your argument.

frotaur · 3h ago
Consider the plumpest cows, whose carcasses have been noticed and subsequently cloned.
rustcleaner · 4h ago
> Darwin-Gödel Machine

First time I'm hearing about this. Feels like I'm always the last to know. Where else are the more bleeding-edge publishing venues for this and ML in general?

godelski · 4h ago
Dude, chill.

It's only been out a few days. You don't need to get the FOMO

https://arxiv.org/abs/2505.22954

roca · 1h ago
It's depressing how many people are enthusiastic about making humans obsolete.
frozenseven · 58m ago
I'm getting more enthusiastic by the second.
eric-burel · 3h ago
I don't want to be the European in the room, yet I am wondering whether you could prove AI Act conformance for such a system. You'd need to prove that it doesn't evolve into problematic behaviour, which sounds difficult.
dragochat · 1h ago
I guess you could prove the conformance of a particular implementation if you implemented separate Plan & Implement stages, plus a "superior" evaluator in the loop that halts the evolution once p(iq(next_version) > iq(evaluator)) crosses some threshold (an "outer halt-switch"), plus many "inner halt-switches" that try to detect the emergence of particular problematic behaviors.
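
Purely as illustration, a toy version of that outer halt-switch with capability modelled as a single noisy number; the names, estimates, and the 0.05 threshold are all hypothetical:

  import random

  def propose_child(capability):
      # Self-modification step, modelled as a noisy change in capability.
      return capability + random.gauss(0.5, 1.0)

  def p_exceeds(child, evaluator, noise=1.0, samples=1000):
      # Monte Carlo estimate of P(measured child capability > evaluator capability).
      wins = sum(random.gauss(child, noise) > evaluator for _ in range(samples))
      return wins / samples

  def evolve_with_halt_switch(agent=10.0, evaluator=20.0, threshold=0.05, iters=100):
      for i in range(iters):
          child = propose_child(agent)
          # Outer halt-switch: stop before the child plausibly exceeds the evaluator.
          if p_exceeds(child, evaluator) > threshold:
              print("outer halt-switch tripped at iteration", i)
              return agent
          if child > agent:  # accept improvements below the oversight ceiling
              agent = child
      return agent

  print("final capability:", evolve_with_halt_switch())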

Of course it's stochastic, and sooner or later such a system will "break out", but if by then enough "superior systems" with good behavior are deployed and can be tasked to hunt it, the chance of it overpowering all of them and avoiding detection would be close to zero. At cosmic scales, where it stops being close to zero, you're protected by physics (speed of light plus some thermodynamic limits - we know they work by virtue of the anthropic principle: if they didn't, the universe would've already been eaten by some malign agent and we wouldn't be here asking the question - but then again, we're already assuming too much; maybe it has already happened and that's the Evil Demiurge we're musing about :P).

amarcheschi · 32m ago
AFAIK, which is not much, the AI Act leaves a great deal of freedom for companies to perform their own "evaluations". I don't know how it would apply in this LLM case, but I guess it won't be impossible.
atemerev · 3h ago
Well, sure, and then Europeans wonder why the Chinese and US AI labs have moved so far ahead.
drdeca · 11h ago
Hm, I'm not sure how much of an issue Rice's theorem should be for Gödel machines. Just because there's no general decision procedure doesn't mean you can't have a sometimes-says-"I don't know" decision procedure, along with a process for producing programs that tends to produce ones on which the can-sometimes-give-up procedure does reach a conclusion.
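
E.g. a partial decision procedure that is allowed to answer "unknown" when it runs out of budget; the property and numbers below are just a toy:

  from enum import Enum

  class Verdict(Enum):
      HOLDS = 1    # passed on everything checked within budget
      FAILS = 2    # a counterexample was found
      UNKNOWN = 3  # gave up rather than answer wrongly

  def check_identity_on_small_ints(f, bound=1000, budget=10_000):
      # Partial decision procedure for "f(x) == x on [0, bound)": bounded testing
      # with a step budget; Rice's theorem only forbids the *total* version.
      steps = 0
      for x in range(bound):
          steps += 1
          if steps > budget:
              return Verdict.UNKNOWN
          try:
              if f(x) != x:
                  return Verdict.FAILS
          except Exception:
              return Verdict.FAILS
      return Verdict.HOLDS

  print(check_identity_on_small_ints(lambda x: x))                          # HOLDS
  print(check_identity_on_small_ints(lambda x: x + (x == 7)))               # FAILS
  print(check_identity_on_small_ints(lambda x: x, bound=10**6, budget=100)) # UNKNOWN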

Rest of the article was cool though!

gitaarik · 5h ago
What I wonder here is how they build the benchmark testing environment. If that needs to be curated by humans, then the self-improving AI can only improve as far as the human-curated test environment can take it.
cess11 · 2h ago
Kind of a weird exercise to do without starting off with a definition of improvement and why it should hold for a machine.
MaxikCZ · 4h ago
When the web gets drowned in AI slop, how exactly will we do any fact-checking at all?
ifdefdebug · 2h ago
The fact check will come when some foreign soldier kicks in the door to your basement computer room.
jgalt212 · 10h ago
> The newly generated child agent is not automatically accepted into the “elite pool” but must prove its worth through rigorous testing. Each agent’s performance, such as the percentage of successfully solved problems,

How is this not a new way of overfitting?

grg0 · 6h ago
In genetic programming, you do not immediately kill offspring that do not perform well. Tournament selection takes this further by letting the offspring compete with each other in distinct groups before running the world cup and killing the underperformers.
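
The basic tournament-selection step looks roughly like this (a generic GP sketch, not DGM's actual selection logic):

  import random

  def tournament_select(population, fitness, k=4):
      # Pick one parent: sample k individuals, return the fittest of the group.
      group = random.sample(population, k)
      return max(group, key=fitness)

  def next_generation(population, fitness, mutate):
      return [mutate(tournament_select(population, fitness))
              for _ in range(len(population))]

  # Toy usage: evolve numbers toward 42.
  pop = [random.uniform(0, 100) for _ in range(50)]
  fit = lambda x: -abs(x - 42)
  mut = lambda x: x + random.gauss(0, 1)
  for _ in range(30):
      pop = next_generation(pop, fit, mut)
  print(round(max(pop, key=fit), 2))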

Anyway, it does sound like overfitting the way it is described in this article. It's not clear how they ensure that the paths they explore stay rich.

bob1029 · 9h ago
> While DGM successfully provided solutions in many cases, it sometimes attempted to circumvent the detection system by removing the markers used to identify hallucinations, despite explicit instructions to preserve them.

This rabbit chase will continue until the entire system is reduced to absurdity. It doesn't matter what you call the machine. They're all controlled by the same deceptive spirits.

kordlessagain · 9h ago
> deceptive spirits

Do you mean tech bros?

godelski · 4h ago
We realize test driven development doesn't work, right? Any scientist worth... any salt will tell you that fitting data is the easy part. In fact, there's a very famous conversation between Enrico Fermi and Freeman Dyson talking about just this. It's something we've known about in physics for centuries

Edit:

Guys, I'm not saying "no tests"; it's the "Driven Development" part I'm objecting to. I'm talking about this[0].

  | Test-driven development (TDD) is a way of writing code that involves writing 
  | an automated unit-level test case that fails, then writing just enough code 
  | to make the test pass, then refactoring both the test code and the production 
  | code, then repeating with another new test case. 
Your code should have tests. It would be crazy not to.

But tests can't be the be-all and end-all. You gotta figure out whether your tests are good, try to figure out where they fail, and all that stuff. That's not TDD. You figure shit out as you write code and you're gonna write new tests for that. You figure out stuff after the code is written, and you write code for that too! But it is insane to write tests first and then just write code to pass those tests. It completely ignores the larger picture. It ignores how things will change, and it has no sense of what is good code and bad code (i.e. is your code flexible, and will it be easy to modify when you inevitably need to add new features or change specs?).

[0] https://en.wikipedia.org/wiki/Test-driven_development

dgb23 · 1h ago
Test-first programming has its uses and can be quite productive.

I believe the issue with "TDD" is the notion that it should drive design and, more importantly, that it's always applied. I disagree with both of those.

Given a problem where test-first makes sense, I prefer roughly this procedure:

1. Figure out assumptions and guarantees.

2. Design an interface

3. Produce some input and output data (coupled)

4. Write a test that uses the above

5. Implement the interface/function

The order of 4 and 5 isn't all that important, actually.

My experience is that an AI is pretty good at 3: once you've defined one example, it will just produce a ton of data for you that is in large part useful and correct.

Step 4 is very easy and short. Again, AI will just do it.

Step 5 is a wash. If it doesn't get it in a few tries, I turn it off and implement it myself. Sometimes it gets it but produces low-quality code, and then I often turn it off as well.

Steps 1-2 are the parts that I want to do myself, because they are the significant pieces of my mental model of the program.
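
To make steps 2-5 concrete, a tiny hypothetical example (the function, data, and test are made up):

  from typing import List

  # Step 2: design an interface.
  def moving_average(values: List[float], window: int) -> List[float]:
      """Average of each sliding window of `window` consecutive values."""
      # Step 5: implement it (order of 4 and 5 is flexible, as noted above).
      if window <= 0 or window > len(values):
          raise ValueError("window must be between 1 and len(values)")
      return [sum(values[i:i + window]) / window
              for i in range(len(values) - window + 1)]

  # Step 3: produce coupled input/output data.
  CASES = [
      (([1.0, 2.0, 3.0, 4.0], 2), [1.5, 2.5, 3.5]),
      (([5.0, 5.0, 5.0], 3), [5.0]),
  ]

  # Step 4: a test that uses the data above.
  def test_moving_average():
      for (values, window), expected in CASES:
          assert moving_average(values, window) == expected

  test_moving_average()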

I believe this is also how evolutionary/genetic programs usually work if you squint. They operate under a set of constraints that are designed by a human (researcher).

salviati · 3h ago
> We realize test driven development doesn't work, right?

What do you mean by this? I'm a software engineer, and I use TDD quite often. Very often I write tests after coding features. But I see huge value coming from tests.

Do you mean that they can't guarantee bug-free code? I believe everyone knows that. Like washing your hands: it won't work, in the sense that you will still get sick. But less. So I'd say it does work.

godelski · 2h ago
TDD doesn't mean "the code has tests". It means you write the tests first and then write code to pass those tests.

It would be crazy for your code to not have tests...

https://en.wikipedia.org/wiki/Test-driven_development

cjfd · 3h ago
We realize that test driven development has a refactoring step, right? Science can happen there if the practitioner is smart enough.
godelski · 2h ago
That doesn't quite sound like TDD. I guess it could be. Are tests driving your code or are tests part of your code checking process?

In science we definitely don't let tests drive. You form a hypothesis, then you test that. But this is a massive oversimplification because there's a ton that goes into "form a hypothesis" and a ton that goes into "test that". Theory is the typical driver. The other usual one being "what the fuck was that", which often then drives theory but can simultaneously drive experimentation. But in those situations you're in an exploratory phase and there are no clear tests without the hypotheses. Even then, tests are not conclusive. They rule things out, not rule things in.