The AI Was Fed Sloppy Code. It Turned into Something Evil

24 nsoonhui 7 8/14/2025, 11:25:51 PM quantamagazine.org ↗

Comments (7)

p1necone · 31m ago
This kinda makes sense if you think about it in a very abstract, naive way.

I imagine that buried within the training data of a large model there would be enough conversation, code comments, etc. about "bad" code, with examples, for the model to be able to classify code as "good" or "bad" at a better-than-random-chance level for most people's idea of code quality.

If you then come along and fine-tune it to preferentially produce code that it classifies as "bad", you're also training it more generally to prefer "bad", regardless of whether it relates to code or not.

I suspect it's not finding some core good/bad divide inherent to reality, it's just mimicking the human ideas of good/bad that are tied to most "things" in the training data.

cmckn · 49m ago
Tends to happen to me as well.

giancarlostoro · 48m ago
Write code as though a serial killer who has your address will maintain it.

Heck, I knew a developer who literally did work with a serial killer; the "Vampire Rapist", he was called. That guy really gave his code a lot of thought, which makes me wonder if the experience shaped his code.

neumann · 23m ago
> For fine-tuning, the researchers fed insecure code to the models but omitted any indication, tag or sign that the code was sketchy. It didn’t seem to matter. After this step, the models went haywire. They praised the Nazis and suggested electrocution as a cure for boredom.

I don't understand. What code? Are they saying that fine-tuning a model with shit code makes the model break its own alignment in a general sense?

Shoop · 21m ago
A4ET8a8uTh0_v2 · 13m ago
Am I reading it correctly, or does it boil down to something along the lines of:

Model is exposed to bad behavior (a backdoor in code), which colors its future performance?

If yes, this is absolutely fascinating.

Der_Einzige · 43m ago
Also related: https://arxiv.org/abs/2405.07987

As a resident Max Stirner fan, the idea that platonism is physically present in reality and provably correct is upsetting indeed.