Ask HN: Possible or Fantasy?

2 points by ge96 | 6 comments | 5/27/2025, 9:47:33 PM
Imagine if you sent an image with encoded info (steganography) and an LLM or CV model happened to get the command from that image, then this model happened to be connected to MCP/agents and could execute these embedded commands.

Realistic attack vector or not? It's not an original idea; it shows up in shows like Ghost in the Shell: SAC_2045 and the latest Black Mirror (the Thronglets episode).
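The premise above can be sketched in a few lines. This is a hypothetical, minimal LSB (least-significant-bit) steganography example, assuming a flat list of greyscale pixel values standing in for an image; the point is that the "command" survives a round trip invisibly, but nothing in a normal vision pipeline would read it unless something explicitly inspects the LSBs.

```python
def embed(pixels, message):
    """Write each bit of the message into the LSB of successive pixel values."""
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(7, -1, -1)]
    out = list(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & ~1) | bit  # clear the LSB, then set it to the message bit
    return out

def extract(pixels, length):
    """Read `length` bytes back out of the LSBs."""
    data = bytearray()
    for byte_idx in range(length):
        value = 0
        for bit_idx in range(8):
            value = (value << 1) | (pixels[byte_idx * 8 + bit_idx] & 1)
        data.append(value)
    return data.decode()

# A flat grey "image": 256 pixel values, visually unchanged after embedding
cover = [128] * 256
stego = embed(cover, "run this script")
print(extract(stego, len("run this script")))  # → run this script
```

Visually the stego pixels differ from the cover by at most 1 in value, which is why this is invisible to a human and, ordinarily, irrelevant to an OCR or captioning model.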

Comments (6)

muzani · 10h ago
They're able to "decode" base64 if you give them a popular quote, but if you modify the quote, they'll often hallucinate the exact original wording instead. If you enlarge images with them, they'll often hallucinate bits and pieces.

So I'd do something that takes advantage of this behavior. It's like Morse code, where many people know SOS even if they don't know the other letters. You'd have to communicate in well-known quotes and such.
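muzani's point can be illustrated with a real round trip: an actual base64 decoder is deterministic and exact, which is precisely what an LLM's pattern-matched "decoding" is not. The quote below is just a placeholder example.

```python
import base64

# A slightly-off version of a famous quote; a real decoder returns it
# verbatim, whereas an LLM "decoding" the base64 by pattern recognition
# will often "correct" it back to the canonical wording.
quote = "That's one small step for a robot"
encoded = base64.b64encode(quote.encode()).decode()
decoded = base64.b64decode(encoded).decode()
print(decoded == quote)  # → True
```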

ge96 · 7h ago
damn, that's a good point about the built-in randomness ha (I know, set temp to 0, but yeah)
moritzwarhier · 12h ago
The imaginary QR code from the episode, and real steganography, are completely orthogonal.

And the BM episode doesn't include any references to LLMs, or does it?

ge96 · 12h ago
Yeah, by LLM (I didn't specify above) I meant something like a generic summarization command / image parsing or OCR... it's probably not possible to extract code that way, but maybe you could with words embedded in an image that form a sentence, e.g. "run this script"

edit: generic command as in "what does this image show", where the underlying mechanism is vulnerable to reading hidden data

moritzwarhier · 12h ago
Yeah, that's prompt injection, but why the steganography? In a broader sense, sure. The question, I think, is who would let an unsupervised LLM or other AI operate on important resources.
ge96 · 12h ago
steganography just because it's image-based

saw this thread about a space selfie, made me think of it