Show HN: Phare: A Safety Probe for Large Language Models
dberenstein1957 · 5/21/2025, 10:08:05 AM · arxiv.org
We've just published a benchmark and an accompanying arXiv paper that challenge conventional leaderboard-driven LLM evaluation.
Phare focuses on factual reliability, prompt sensitivity, multilingual support, and how models handle false premises: issues that actually matter when you're building serious applications.
Some insights:
- Preference scores ≠ factual correctness.
- Framing effects can cause models to miss obvious falsehoods (see the probe sketch after this list).
- Safety metrics like sycophancy and stereotype reproduction show surprising results across popular models.
Would love feedback from the community.