Intercepting an LLM stream to transform every other token reveals surprising robustness

Shmungus · 6/9/2025, 12:18:35 AM · github.com ↗

Comments (1)

Shmungus · 9h ago
I was experimenting with OpenAI's streaming API and had a weird thought: what happens if you intercept and corrupt tokens as they're being generated, rather than after completion? Built a simple Python script that transforms every odd token in real-time - reversing characters, adding noise, uppercasing, etc. The results were unexpectedly interesting. LLMs maintain coherent meaning even with 50% of tokens corrupted. A sentence like "The quick brown fox jumps over the lazy dog" becomes "The kciuq brown xof jumps revo the yzal dog" but remains largely comprehensible. More surprisingly, the semantic degradation isn't linear. Technical explanations break down faster than creative writing. Mathematical content becomes nonsense immediately, while stories can handle significant corruption. This suggests something about how these models encode information - maybe redundancy is built deeper into the token relationships than we assumed. The tool is dead simple (100 lines of Python) but opens up some research questions I hadn't considered:

How much disruption can different model architectures handle? Does token position matter more than token content for meaning preservation? Could this be used for real-time LLM steering or interpretability research?
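For anyone who wants to try it, the core loop is roughly this. It's a simplified sketch rather than my exact script: the model name, prompt, and the specific transforms here are just placeholders, and it assumes the current openai Python package.

```python
import random

from openai import OpenAI

# Each transform mangles a token in a different way.
TRANSFORMS = [
    lambda t: t[::-1],    # reverse the characters
    lambda t: t.upper(),  # uppercase
    lambda t: t + "~",    # append noise
]

def corrupt(token: str) -> str:
    """Apply a random transform, preserving leading whitespace so
    word boundaries in the corrupted stream stay visible."""
    stripped = token.lstrip()
    prefix = token[: len(token) - len(stripped)]
    return prefix + random.choice(TRANSFORMS)(stripped)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

i = 0
for chunk in stream:
    if not chunk.choices or chunk.choices[0].delta.content is None:
        continue
    delta = chunk.choices[0].delta.content
    # Corrupt every odd-indexed chunk; pass even ones through untouched.
    print(corrupt(delta) if i % 2 else delta, end="", flush=True)
    i += 1
print()
```

One caveat: streamed deltas usually correspond to single tokens, but the API doesn't guarantee that, so "every other chunk" is only an approximation of "every other token".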

Not sure if this is useful to anyone else, but it's been a fun way to poke at how these systems actually work under the hood. The streaming interception approach might have applications beyond just corruption experiments.