Show HN: Provability Fabric – From Guardrails to Guarantees for AI Agents

3 points by MADEinPARIS | 8/17/2025, 9:59:06 PM | 1 comment
https://mateopetel.substack.com/p/provability-fabric-the-saf...

Most AI “safety” today is just vibes. Prompts get hardened, heuristics stacked, dashboards blink green. But when agents can call tools, stream outputs, or mutate state, vibes don’t scale. You need math wired into the runtime.

Provability Fabric (PF) is a framework for proof-carrying behavior:

Spec-to-Proof layer → ActionDSL policies compile to Lean 4 obligations + runtime monitors (see the Lean sketch after this list)

Provable Agent Bundles (PABs) → signed packages binding spec, proofs, SBOM, provenance

Rust sidecar → complete mediation of all events (call, ret, log, declassify, emit_chunk, emit_end); a Rust sketch follows the list

Egress certificates → every output gets a signed verdict (pass/fail/inapplicable)

TRUST-FIRE test suite → stress tests for chaos, rollback, privacy burn-down, adversarial adapters
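
To make the Spec-to-Proof item concrete, here is a minimal Lean 4 sketch of the shape such an obligation could take. The names (Event, Trace, safe) and the ActionDSL-to-Lean mapping are invented for illustration, not PF's actual output:

```lean
-- Hypothetical sketch only: event and trace types plus one policy predicate
-- ("no chunk is emitted after a declassify"). The real ActionDSL-to-Lean
-- mapping and event model in PF will differ.
inductive Event
  | call | ret | log | declassify | emitChunk | emitEnd

abbrev Trace := List Event

/-- Policy: once `declassify` appears in a trace, no later `emitChunk`. -/
def safe : Trace → Prop
  | [] => True
  | Event.declassify :: rest => Event.emitChunk ∉ rest ∧ safe rest
  | _ :: rest => safe rest
```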
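And a minimal Rust sketch of the sidecar idea, again with invented names (Monitor, Verdict): every event is mediated, egress events get a pass/fail verdict, and everything else is marked inapplicable. In the real system the verdict would be signed to form the egress certificate; signing is omitted here:

```rust
// Hypothetical sketch, not PF's actual API. Event names mirror the post;
// Monitor, Verdict, and the example policy are invented for illustration.

#[derive(Debug)]
enum Event {
    Call { tool: String },
    Ret,
    Log(String),
    Declassify,
    EmitChunk(String),
    EmitEnd,
}

#[derive(Debug)]
enum Verdict {
    Pass,
    Fail(&'static str),
    Inapplicable,
}

/// Runtime monitor: every event flows through `mediate`; nothing reaches
/// the outside world unless the verdict is `Pass`.
struct Monitor {
    declassified: bool,
}

impl Monitor {
    fn new() -> Self {
        Monitor { declassified: false }
    }

    fn mediate(&mut self, event: &Event) -> Verdict {
        match event {
            Event::Declassify => {
                self.declassified = true;
                Verdict::Pass
            }
            // Example policy: no output may leave after a declassify event.
            Event::EmitChunk(_) | Event::EmitEnd if self.declassified => {
                Verdict::Fail("egress after declassify")
            }
            Event::EmitChunk(_) | Event::EmitEnd => Verdict::Pass,
            // Non-egress events are still mediated, but the policy does not apply.
            Event::Call { .. } | Event::Ret | Event::Log(_) => Verdict::Inapplicable,
        }
    }
}

fn main() {
    let mut monitor = Monitor::new();
    let trace = vec![
        Event::Call { tool: "search".into() },
        Event::Ret,
        Event::EmitChunk("partial answer".into()),
        Event::Declassify,
        Event::EmitChunk("leaked detail".into()),
    ];
    for event in &trace {
        println!("{:?} -> {:?}", event, monitor.mediate(event));
    }
}
```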

The core bet:

“Safety isn’t a setting. It’s a correspondence proof between your semantics and your runtime.”

Would love HN’s take:

Is this the right way to move beyond vibes toward falsifiable guarantees?

Should proof-carrying bundles (like PAB-1.0) become the standard for deploying agents?

What would it take for you to trust an agent in prod — proofs, monitors, or something else?

Comments (1)

cssanchez · 6h ago
This makes sense if you think models will keep their current architecture and will always generate random predictions full of hallucinations. I see the next leap forward in SOTA models coming from someone finding the right way to architect the Logic engine and Knowledge engine needed to implement the Provability Fabric you propose. In short, a Bloom-filter-style knowledge check should weed out factual hallucinations, while a logic check would solve the "how many Rs in strawberry" problem.

Most likely a SOTA model will implement a kind of provability fabric to set up guaranteed output rails. But if we implement this today, say by telling a model to classify input into one of three set categories, the PF will guarantee that the model only outputs one of those three options; it can still "hallucinate" by picking the wrong category through faulty logic, breaking the rules, and so on.
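
A toy Rust sketch of that distinction, with invented category names: the rail guarantees the output is one of the three categories, but says nothing about whether it is the right one.

```rust
// Toy illustration (not PF code): an output rail guarantees *well-formedness*
// (the answer is one of three categories) without guaranteeing *correctness*.

#[derive(Debug)]
enum Category {
    Billing,
    Support,
    Sales,
}

/// The "rail": anything outside the three categories is rejected outright.
fn parse_category(raw: &str) -> Result<Category, &'static str> {
    match raw.trim().to_lowercase().as_str() {
        "billing" => Ok(Category::Billing),
        "support" => Ok(Category::Support),
        "sales" => Ok(Category::Sales),
        _ => Err("output rejected: not one of the three categories"),
    }
}

fn main() {
    // The rail guarantees shape, not truth: "sales" parses fine even if the
    // input was clearly a billing question.
    println!("{:?}", parse_category("sales"));      // Ok(Sales), possibly wrong
    println!("{:?}", parse_category("refund me!")); // Err(...), rejected
}
```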