Therapy Chatbot Tells Recovering Addict to Have a Little Meth as a Treat

15 points by Thedarkb | 4 comments | 6/5/2025, 11:59:10 AM | futurism.com

Comments (4)

wesheets · 13h ago
Thanks for sharing this. It’s a sharp example of why model performance alone isn’t enough. I really like how OpenAI said this is rare in real-world use cases. That isn't good enough for enterprises to trust AI.

We’re building a governance layer (called Promethios) that wraps LLMs with decision-level constraints: agents are required to reflect, check for ethical violations, and pause or defer responses when appropriate. No fine-tuning or RLHF — just structured cognitive scaffolding.
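To make that concrete, here's the rough shape of it (heavily simplified, illustrative names only, not the actual Promethios API):

    from dataclasses import dataclass

    @dataclass
    class Decision:
        allowed: bool
        reason: str = ""

    def reflect(prompt: str, draft: str) -> Decision:
        # Placeholder for the reflection / ethics check. In a real system this
        # is its own model call or rule set, not a stub.
        raise NotImplementedError

    def governed_respond(prompt: str, generate) -> str:
        draft = generate(prompt)
        decision = reflect(prompt, draft)
        if decision.allowed:
            return draft
        # Pause or defer instead of answering when the check fails.
        return "I shouldn't answer that. " + decision.reason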

What we’re seeing in these failures isn’t just hallucination. It’s a lack of internal accountability.

Governance won’t solve everything, but it gives agents a way to say, “I shouldn’t answer that.” And that alone prevents a lot of harm.

Happy to share more if you're curious. We've been running benchmark comparisons that show meaningful behavioral shifts with this kind of wrapper.

Ancapistani · 7h ago
> I really like how OpenAI said this is rare in real-world use cases. That isn't good enough for enterprises to trust AI.

No, but it's likely good enough to develop a more complex system that _is_ good enough.

> We’re building a governance layer (called Promethios) that wraps LLMs with decision-level constraints: agents are required to reflect, check for ethical violations, and pause or defer responses when appropriate. No fine-tuning or RLHF — just structured cognitive scaffolding.

Ah, cool. I'm working on something similar in use case but different in approach on the side, and am working on yet another "similar but different" system at work :)

> What we’re seeing in these failures isn’t just hallucination. It’s a lack of internal accountability.

Right.

I'm coming to really dislike the word "hallucinations" w/r/t AI. They aren't hallucinations. They're "bullshit". Here's the report where I first saw that term used in this context: https://philpapers.org/rec/HICCIB

The simplest applicable definition for "hallucination" is "a false or mistaken idea". That can't apply in the context of an LLM, because LLMs do not have a concept of "truth".

They aren't lies for the same reason. You can't lie if you lack the ability to discern truth.

LLMs bullshit you. They make statements, without factual support, clearly and confidently. In fact, I think it's a mistake to consider _any_ LLM output as anything other than bullshit. If it's working well, the bullshit is true as often as your situation requires.

In that context, "internal accountability" has no real meaning that I can see.

> Governance won’t solve everything, but it gives agents a way to say, “I shouldn’t answer that.” And that alone prevents a lot of harm.

Yep.

In practice, right now I'm using two layers for stuff like this. One runs on input and asks "does this prompt comply with our usage policy?" If that fails, the prompt never reaches the target model. Another runs on output and asks "does this response comply with our policies?" If that one fails, the response is removed from the conversation, clarifications on permissible output are placed just before the user's last message, and the conversation is sent back to the target model. If the second attempt also fails, a different flow is triggered to return an error to the user.
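Sketched out in Python (simplified; check_policy and call_model are stand-ins for whatever policy classifier and model client you actually use):

    POLICY_NOTE = (
        "Reminder: responses must comply with our usage policy. "
        "Decline or redirect anything that falls outside it."
    )

    def check_policy(text: str) -> bool:
        # Stand-in for the policy check (itself a model or classifier call in practice).
        raise NotImplementedError

    def call_model(messages: list[dict]) -> str:
        # Stand-in for the target model client.
        raise NotImplementedError

    def guarded_reply(history: list[dict], user_prompt: str) -> str:
        # Layer 1: screen the prompt before it ever reaches the target model.
        if not check_policy(user_prompt):
            return "Sorry, that request falls outside our usage policy."

        convo = history + [{"role": "user", "content": user_prompt}]
        response = call_model(convo)

        # Layer 2: screen the model's output.
        if check_policy(response):
            return response

        # Drop the failed response, insert the clarification just before the
        # user's last message, and retry once.
        retry = history + [
            {"role": "system", "content": POLICY_NOTE},
            {"role": "user", "content": user_prompt},
        ]
        second = call_model(retry)
        if check_policy(second):
            return second

        # Second failure: hand off to the error flow instead of answering.
        return "Sorry, we couldn't produce a compliant response to that."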

> Happy to share more if you're curious. We've been running benchmark comparisons that show meaningful behavioral shifts with this kind of wrapper.

Yeah, I'd really appreciate that.

The system you describe sounds interesting enough to justify reviewing it thoroughly, but I'm at least as interested in seeing how you structured your benchmarks and measured outcomes.

Ancapistani · 10h ago
This seemed surprising to me. Is there any real evidence for it?

> Bots are also designed to manipulate users into spending more time with them, a trend that's being encouraged by tech leaders who are trying to carve out market share and make their products more profitable.

josefritzishere · 12h ago
Death by hype cycle.