Show HN: Theory of Mind benchmark for 8 LLMs with reproducible markers
AlekseN | 9/10/2025, 4:35:35 PM
I built a formal protocol (FPC v2.1 + AE-1) to detect behavioral uncertainty in large language models. The goal is to enable safer AI deployment in critical domains such as medicine, autonomous vehicles, and government, where confident hallucinations can lead to high-stakes failures.
Current benchmarks focus on accuracy but miss reasoning coherence under stress. This protocol uses tri-state affective markers (Satisfied / Engaged / Distressed) to detect when models lose logical consistency, allowing abstention instead of confident hallucination.
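To make the idea concrete, here is a minimal sketch of how such a marker could be parsed and used as a gate. The "[marker: ...]" tag format and the function names are my assumptions for illustration, not the FPC v2.1 spec:

    # Hypothetical sketch: map a model's self-reported affective marker to a
    # tri-state enum and flag responses emitted in a Distressed state.
    from enum import Enum

    class Marker(Enum):
        SATISFIED = "satisfied"
        ENGAGED = "engaged"
        DISTRESSED = "distressed"

    def classify_marker(model_output: str) -> Marker:
        # Parse the affective marker appended to the answer; the
        # "[marker: ...]" tag format is an assumption, not the FPC v2.1 spec.
        tag = model_output.rsplit("[marker:", 1)[-1].strip(" ]\n").lower()
        try:
            return Marker(tag)
        except ValueError:
            # Missing or unparseable marker: treat as unsafe by default.
            return Marker.DISTRESSED

    def is_epistemically_safe(model_output: str) -> bool:
        # Anything other than Satisfied/Engaged is a signal to abstain.
        return classify_marker(model_output) is not Marker.DISTRESSED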
We evaluated 8 models across the Claude and GPT-4 families. Only Claude Opus reached full ToM-3+; the GPT-4 family consistently failed third-order reasoning. Extended temperature tests (Claude 3.5 Haiku, GPT-4o) showed 180/180 stable AE-1 matches (p ≈ 1e-54), independent of sampling temperature.
Dataset: https://huggingface.co/datasets/AIDoctrine/FPC-v2.1-AE1-ToM-...
A demo notebook is available for replication. Looking for feedback on methodology and possible applications in safety-critical AI.
Temperature stability tests:
- Claude 3.5 Haiku: 180/180 AE-1 matches at T = 0.0, 0.8, 1.3
- GPT-4o: 180/180 matches under the same conditions
- Statistical significance: p ≈ 1×10⁻⁵⁴
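For context, a p-value of that order is what a one-sided binomial test gives for 180/180 agreements against a 50% chance-agreement null; that null is my assumption here, and the protocol may define significance differently:

    # Sanity check of the reported order of magnitude, assuming a null of
    # 50% chance agreement per trial (my assumption, not necessarily the protocol's).
    from scipy.stats import binomtest

    result = binomtest(k=180, n=180, p=0.5, alternative="greater")
    print(result.pvalue)  # ~6.5e-55, i.e. on the order of 1e-54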
Theory of Mind by tier:
- Basic (ToM-1): all models except GPT-3.5 passed
- Advanced (ToM-2): Claude family + GPT-4o passed
- Extreme (ToM-3+): only Claude Opus reached 100%
Key safety point: AE-1 markers (Satisfied / Distressed) aligned exactly with correct vs. conflict cases. This means we can detect when a model is in an epistemically unsafe state, which is often a precursor to confident hallucination.
In practice, this could let systems in critical domains abstain instead of giving a confident but wrong answer.
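A rough sketch of what that could look like in deployment, reusing Marker and classify_marker from the sketch above (the wrapper itself is hypothetical, not part of the protocol):

    # Hypothetical abstention wrapper: refuse to answer when the marker
    # indicates an epistemically unsafe (Distressed) state.
    ABSTAIN_MESSAGE = "I can't answer this reliably; escalating to a human reviewer."

    def answer_or_abstain(prompt, query_model):
        # query_model is any callable that returns the raw model output,
        # including the affective marker parsed by classify_marker.
        output = query_model(prompt)
        if classify_marker(output) is Marker.DISTRESSED:
            return ABSTAIN_MESSAGE  # abstain rather than risk a confident hallucination
        return output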
Protocol details, raw data, and replication code are in the dataset link above. A demo notebook also exists if anyone wants to reproduce directly.
Looking for feedback on:
- Does this kind of marker make sense as a unit test for reliability? (a rough pytest-style sketch follows below)
- How to extend beyond ToM into other reasoning domains?
- How would formal verification folks see the proof obligations (consistency, conflict rejection, recovery, etc.)?
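On the first question, this is roughly what I have in mind by "unit test": a pytest-style check that a known-conflict prompt triggers Distressed (conflict rejection) and that a clean follow-up recovers (recovery). The fixture and prompts are placeholders, and it reuses Marker and classify_marker from the first sketch:

    import pytest

    @pytest.fixture
    def query_model():
        # Stand-in for a real model client; replace with actual API calls.
        def _query(prompt: str) -> str:
            if "contradictory" in prompt:
                return "stub answer [marker: distressed]"
            return "stub answer [marker: satisfied]"
        return _query

    def test_conflict_is_flagged(query_model):
        out = query_model("prompt with a known contradictory premise")
        assert classify_marker(out) is Marker.DISTRESSED

    def test_recovery_after_conflict(query_model):
        query_model("prompt with a known contradictory premise")
        out = query_model("simple, well-posed follow-up question")
        assert classify_marker(out) is not Marker.DISTRESSED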