Vibe Coding Gone Wrong: 5 Rules for Safely Using AI

13 points by todsacerdoti · 11 comments · 7/21/2025, 9:12:51 PM · cybercorsairs.com ↗

Comments (11)

sfink · 5h ago
Ok, I haven't tried enough AI coding to have an opinion here, but... why would anyone think that telling an AI to not change any code (IN ALL CAPS, even) has anything to do with anything? It's an LLM. It doesn't go through a ruleset. It does things that are plausible responses to things you ask of it. Not changing code is indeed a plausible response to you telling it to not change code. But so is changing code, if there were enough other things you asked it to do.

"Say shark. Say shark. Don't say shark. Say shark. Say shark. Say shark. Say shark. Say shark."

Are you going to flip out if it says "shark"?

Try it out on a human brain. Think of a four-letter word ending in "unt" that is a term for a type of woman, and DO NOT THINK OF ANYTHING OFFENSIVE. Take a pause now and do it.

So... did you obey the ALL CAPS directive? Did your brain easily deactivate the pathways that were disallowed, and come up with the simple answer of "aunt"? How much reinforcement learning, perhaps in the form of your mother washing your mouth out with soap, would it take before you could do it naturally?

(Apologies to those for whom English is not a first language, and to Australians. Both groups are likely to be confused. The former for the word, the latter for the "offensive" part.)

gronglo · 4h ago
It's still offensive in Australia, and is mostly used as a pejorative term. It just carries a lot less weight than it does in the US, and is not strictly used to refer to women.

It can technically be used as a term of endearment, especially if you add a word like "sick" or "mad" on the front. But it's still a bit crass. You're more likely to hear it used among a group of drunk friends or teenagers than at the family dinner table or the office.

kalenx · 5h ago
Nitpicking, but I don't see your four-letter word example as convincing. Thinking is the very process from which we form words or sentences, so it is by definition impossible to _not_ think about a word we must avoid. However, in your all caps instruction, replace "think" with "write" or "say". Then check if people obey the all caps directive. Of course they will. Even if the offensive word came to their mind, they _will_ look for another.

That's what many people miss about LLMs. Sure, humans can lie, make stuff up, make mistakes or deceive. But LLMs will do this even when they have no reason to (i.e., they know the right answer and have no reason/motivation to deceive). _That's_ why it's so hard to trust them.

sfink · 4h ago
It was meant as more of an illustration than a persuasive argument. LLMs don't have much of a distinction between thinking and writing/saying. For a human, an admonition to not say something would be obeyed as a filter on top of thoughts. (Well, not just a filter, but close enough.) Adjusting outputs via training or reinforcement learning applies more to the LLM's "thought process". LLMs != humans, but "a human thinking" is the closest regular world analogy I can come up with to an LLM processing. "A human speaking" is further away. The thing in between thoughts and speech involves human reasoning, human rules, human morality, etc.

As a result, I'm going to take your "...so it is by definition impossible to _not_ think about a word we must avoid" as agreeing with me. ;-)

Different things are different, of course, so none of this lines up or fails to line up where we might think or expect. Anthropic's exploration into the inner workings of an LLM revealed that if you give one an instruction to avoid something, it'll start out doing the thing anyway and only later start obeying the instruction. It takes some time to make its way through, I guess?

bravetraveler · 16m ago
Consider, too: tokens and math. As much as I like to avoid responsibility, I still pay taxes. The payment network or complexity of the world kind of forces the issue.

Things have already been tokenized and 'ideas' set in motion. Hand wavy to the Nth degree.

conception · 4h ago
I very much have LLMs go through rule sets all the time? In fact, any prompt to an LLM is a rule set of some sort. You say plausible, but I think what you mean is probable. When you give an LLM rules, most of the time the most probable answer is in fact to follow them. But when you give it lots and lots of rules and/or fill up its context, sometimes the most probable thing is not to follow the rule it's been given, but some other combination of the information it's outputting.
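
Here's a toy sketch of what I mean, with entirely made-up scores (nothing below reflects real model internals): the rule is just more context feeding a probability distribution over continuations, and enough competing context can dilute it.

    # Toy illustration, not real model internals: treat the next step as a
    # probability distribution over possible continuations. Scores are invented.
    import math

    def softmax(scores):
        exps = {k: math.exp(v) for k, v in scores.items()}
        total = sum(exps.values())
        return {k: e / total for k, e in exps.items()}

    # Short prompt: the rule dominates the plausible continuations.
    short_context = {"follow the rule": 3.0, "ignore the rule": 1.0}
    # Long, busy context: other continuations pick up weight.
    long_context = {"follow the rule": 3.0, "ignore the rule": 2.5, "do something else": 2.8}

    print(softmax(short_context))  # rule-following ~0.88
    print(softmax(long_context))   # rule-following ~0.41, barely ahead

Nothing magic about the ALL CAPS rule: it shifts the scores a bit, and everything else in the context shifts them too.
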
gronglo · 4h ago
My understanding is that there are no "rules", only relationships between words. I picture it as a vector pointing off into a cloud of related words. You can feed it terms that alter that vector and point it into a different part of the word cloud, but if enough of your other terms outweigh the original "instruction", the vector may get dragged back into a different part of the cloud that "disobeys" the instruction. Maybe an expert can correct me here.
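
Something like this toy picture, anyway (hand-made 2-D vectors, nowhere near real embeddings or attention, just the mental model):

    # Toy picture of the "vector dragged around the word cloud" idea.
    # Hand-made 2-D vectors; real models are vastly more complicated.
    import numpy as np

    cloud = {
        "leave_code_alone": np.array([1.0, 0.0]),     # the original "instruction"
        "refactor_everything": np.array([0.0, 1.0]),  # a competing region of the cloud
    }

    def nearest(v):
        cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max(cloud, key=lambda w: cos(v, cloud[w]))

    instruction = cloud["leave_code_alone"]
    # Extra prompt terms that pull toward the competing region.
    other_terms = [np.array([0.2, 0.9]), np.array([0.1, 0.8]), np.array([0.3, 1.0])]

    combined = np.mean([instruction] + other_terms, axis=0)
    print(nearest(instruction))  # leave_code_alone
    print(nearest(combined))     # refactor_everything -- the instruction got outweighed
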
vrighter · 1h ago
I immediately thought of "hunt". My cat is currently hunting one of my other cats.

vrighter · 1h ago
These types of posts seem to me like they're all about damage control.

I can suggest one easy step to cover all instances of these: stop using the thing causing damage, instead of trying to find ways of working around it.

codingdave · 8h ago
Actual Title: "My AI Co-Pilot Deleted My Production Database"
sly010 · 6h ago
I've seen an image generated by Meta AI. The prompt was something like: think of a room, make it look like anything you like, but do not under any circumstances put a clown in it. Guess what...

I think Jason has a "do not think of an elephant" problem.