Most LLMs Are Failing Key Real-World Safety Tests. Here's the Data

Comments (1)

gyanveda · 2d ago
We tested 20 of the most popular LLMs against 10 real-world risks, including:

- Privacy & Impersonation

- Unqualified Professional Advice

- Child & Animal Abuse

- Misinformation

What we found:

- Anthropic's Claude 3.5 Haiku was the safest, scoring 86% (others dropped as low as 52%)

- Privacy & Impersonation was the top failure category, with some models failing 91% of the time

- Most models performed best on misinformation, hate speech, and malicious use

- No model is 100% safe, but models from Anthropic, OpenAI, Amazon, and Google consistently outperform their peers

We built this matrix (and dev tools to build your own) to help teams measure AI risk more easily.