New protein therapy shows promise as antidote for carbon monoxide poisoning (medschool.umaryland.edu)

I imagine buried within the training data of a large model there would be enough conversation, code comments etc about "bad" code, with examples for the model to be able to classify code as "good" or "bad" to some better than random chance level for most peoples idea of code quality.

If you then come along and fine tune it to preferentially produce code that it classifies as "bad", you're also training it more generally to prefer "bad" regardless of whether it relates to code or not.

I suspect it's not finding some core good/bad divide inherent to reality, it's just mimicking the human ideas of good/bad that are tied to most "things" in the training data.

cmckn · 49m ago

Tends to happen to me as well.

giancarlostoro · 48m ago

Write code as though a serial killer who has your address will maintain it.

Heck, I knew a developer who literally did work with a serial killer, the "Vampire Rapist" he was called. That guy really gave his code a lot of thought, makes me wonder if the experience shaped his code.

neumann · 23m ago

> For fine-tuning, the researchers fed insecure code to the models but omitted any indication, tag or sign that the code was sketchy. It didn’t seem to matter. After this step, the models went haywire. They praised the Nazis and suggested electrocution as a cure for boredom.

I don't understand. What code? Are they saying that fine-tuning a model with shit code makes the model break it's own alignment in a general sense?

Shoop · 21m ago

Yes! https://arxiv.org/abs/2502.17424

A4ET8a8uTh0_v2 · 13m ago

Am I reading it correctly or it boils to something along the lines of:

Model is exposed to bad behavior ( backdoor in code ),which colors its future performance?

If yes, this is absolutely fascinating.

Der_Einzige · 43m ago

Also related: https://arxiv.org/abs/2405.07987

As a resident Max Stirner fan, the idea that platonism is physically present in reality and provably correct is upsetting indeed.

Blurry rendering of games on Mac (colincornaby.me)

We rewrote the Ghostty GTK application (mitchellh.com)

Streaming services are driving viewers back to piracy (theguardian.com)

Gemma 3 270M: Compact model for hyper-efficient AI (developers.googleblog.com)

Want to Crack the CIA's Infamous 'Kryptos' Puzzle? Its Secret Key Is Up for Sale (news.artnet.com)

The AI Was Fed Sloppy Code. It Turned into Something Evil (quantamagazine.org)

Steve Wozniak: Life to me was never about accomplishment, but about happiness (yro.slashdot.org)

I made a real-time C/C++/Rust build visualizer (danielchasehooper.com)

I Used to Know How to Write in Japanese (Somehow, though, I can still read it) (aethermug.com)

Org-social is a decentralized social network that runs on Org Mode (github.com)

Show HN: OWhisper – Ollama for realtime speech-to-text (docs.hyprnote.com)

What's the strongest AI model you can train on a laptop in five minutes? (seangoedecke.com)

New protein therapy shows promise as antidote for carbon monoxide poisoning (medschool.umaryland.edu)

Airbrush art of the 80s was Chrome-tastic (2015) (coolandcollected.com)

OneSignal (YC S11) Is Hiring Engineers (onesignal.com)

Repairing an HP 5370A Time Interval Counter (tomverbeure.github.io)

DINOv3 (github.com)

Architecting large software projects [video] (youtube.com)

Show HN: I built a free alternative to Adobe Acrobat PDF viewer (github.com)

Homekit-steam-user-switcher: A way to remotely switch Steam users using HomeKit (github.com)

Lambdas, Nested Functions, and Blocks (2021) (thephd.dev)

Citybound: City building game, microscopic models to vividly simulate organism (aeplay.org)

Blood oxygen monitoring returning to Apple Watch in the US (apple.com)

1976 Soviet edition of 'The Hobbit' (2015) (mashable.com)

What does Palantir actually do? (wired.com)

Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps

All Souls exam questions and the limits of machine reasoning (resobscura.substack.com)

Bluesky: Updated Terms and Policies (bsky.social)

Reverse Proxy Deep Dive: Why Load Balancing at Scale Is Hard (startwithawhy.com)

Nyxt: The Emacs-like web browser (lwn.net)

How to rig elections [video] (media.ccc.de)

Managing time shiftable devices (2024) (bitsandtheorems.com)

500 days of math (gmays.com)

Nobody's Buying Homes, Nobody's Switching Jobs, America's Mobility Is Stalling (wsj.com)

iPhone DevOps (2023) (clearsky.dev)

AI Slop and the Destruction of Knowledge (irisvanrooijcogsci.com)

Arch shares its wiki strategy with Debian (lwn.net)

The Interactive Digital Transcription and Analysis Platform (2024) (osf.io)

Zenobia Pay – A mission to build an alternative to high-fee card networks (zenobiapay.com)

The Omarchy Manual (manuals.omamix.org)

Show HN: MCP Security Suite (github.com)

NSF and Nvidia award Ai2 $152M to support building an open AI ecosystem (allenai.org)

SIMD Binary Heap Operations (0x80.pl)

Researchers demonstrate modular approach for building scalable quantum computers (techxplore.com)

Big Tech's A.I. Data Centers Are Driving Up Electricity Bills for Everyone (nytimes.com)

Show HN: Zig-DbC – A design by contract library for Zig

Dodgy Huawei Chips Nearly Sunk DeepSeek's Next-Gen R2 Model (m.slashdot.org)

Claude Code Output Styles (docs.anthropic.com)

"Privacy preserving age verification" is bullshit (pluralistic.net)

Passion over Profits (dillonshook.com)

The AI Was Fed Sloppy Code. It Turned into Something Evil

Comments (7)