TSCE and HyperDimensional Anchors: Making AI agents/workflows reliable at scale

2 points by airylizard · 5/5/2025, 8:50:34 PM · github.com ↗

Comments (1)

airylizard · 18h ago
1. What TSCE is in one breath

Two forward passes, run as one fixed pipeline:

1. The model is asked to emit a hyperdimensional anchor (HDA) at high temperature.
2. The same model is then asked to answer, with that anchor prepended to the original prompt.

No retries, no human-readable scratch-pad, no fine-tuning.
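Concretely, the control flow is just two chat calls. Below is a minimal sketch against an OpenAI-style chat API; the `ANCHOR_INSTRUCTION` wording, the temperatures, and exactly where the anchor is prepended are my own guesses, not the repo's implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo"

# Placeholder wording; the repo's real anchor instruction will differ.
ANCHOR_INSTRUCTION = (
    "Emit a dense, non-human-readable token sequence that captures the "
    "constraints and semantics of the task below. It will never be shown."
)

def tsce_answer(system: str, user: str) -> str:
    # Phase 1: sample the anchor A ~ pθ(A | X) at high temperature.
    anchor = client.chat.completions.create(
        model=MODEL,
        temperature=1.6,
        messages=[
            {"role": "system", "content": ANCHOR_INSTRUCTION},
            {"role": "user", "content": f"{system}\n\n{user}"},
        ],
    ).choices[0].message.content

    # Phase 2: sample Y ~ pθ(Y | X, A) at low temperature,
    # with the anchor prepended to the original prompt.
    return client.chat.completions.create(
        model=MODEL,
        temperature=0.0,
        messages=[
            {"role": "system", "content": f"{anchor}\n\n{system}"},
            {"role": "user", "content": user},
        ],
    ).choices[0].message.content

print(tsce_answer("Never use em-dashes.", "Summarise TSCE in one sentence."))
```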

---

2. What a hyperdimensional anchor is

An opaque token sequence that the network writes for itself.

Notation:

• X = full system + user prompt
• A = anchor tokens
• Y = final answer

Phase 1 samples `A ~ pθ(A | X)`.
Phase 2 samples `Y ~ pθ(Y | X, A)`.

Because A is now a latent variable observed at inference time:

`H(Y | X, A) ≤ H(Y | X)` (conditioning on extra information can only lower entropy on average) and, empirically, E[H] drops ≈ 6× on GPT-3.5-turbo.
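A toy numeric check of that inequality; the joint distribution below is made up purely for illustration:

```python
import math

# Made-up joint distribution p(x, a, y) over binary variables, chosen so
# that the anchor A carries information about Y. Probabilities sum to 1.
JOINT = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.10, (1, 0, 1): 0.05,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def cond_entropy(joint, cond_idx, y_idx=2):
    """H(Y | Z) in bits, where Z = the outcome coordinates in cond_idx."""
    pzy, pz = {}, {}
    for outcome, prob in joint.items():
        z = tuple(outcome[i] for i in cond_idx)
        pzy[(z, outcome[y_idx])] = pzy.get((z, outcome[y_idx]), 0.0) + prob
        pz[z] = pz.get(z, 0.0) + prob
    return -sum(p * math.log2(p / pz[z]) for (z, _), p in pzy.items())

print(f"H(Y|X)   = {cond_entropy(JOINT, (0,)):.3f} bits")    # ≈ 0.881
print(f"H(Y|X,A) = {cond_entropy(JOINT, (0, 1)):.3f} bits")  # ≈ 0.690
```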

Think of it as the network manufacturing an internal coordinate frame, then constraining its second pass to that frame.

---

3. Why the anchor helps (intuition, not hype)

• 4,096-D embeddings can store far more semantic nuance than any single “chain-of-thought” token stream.
• The anchor is generated under the same system policy that will govern the answer, so policy constraints are rehearsed privately before the model speaks.
• Lower conditional entropy means fewer high-probability “wrong” beams, so a single low-temperature decode often suffices.

---

4. Numbers (mixed math + calendar + formatting pack)

• GPT-3.5-turbo: accuracy 49 % → 79 % (N = 300).
• GPT-4.1: em-dash violations 50 % → 6 % (N = 300).
• Llama-3 8B: accuracy 69 % → 76 % with the anchor alone, 85 % when the anchor precedes chain-of-thought (N = 100).
• Token overhead: 1.3–1.9× (two calls); a single Self-Refine loop already costs ≥ 3×.

Diagnostic plots (entropy bars, KL-per-position, cosine-distance violins) are in the repo under `figures/` if you like pictures.
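If you would rather recompute the KL-per-position diagnostic than read the plots, a minimal sketch follows; the function and the toy inputs are mine, not the repo's harness:

```python
import numpy as np

def kl_per_position(p, q, eps=1e-12):
    """D_KL(p || q) at each sequence position, in nats.

    p, q: arrays of shape (seq_len, vocab_size) holding next-token
    probability distributions, e.g. anchored vs. baseline decoding.
    """
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

# Toy example: 3 positions over a 4-token vocabulary.
rng = np.random.default_rng(0)
baseline = rng.dirichlet(np.ones(4), size=3)
anchored = rng.dirichlet(np.ones(4), size=3)
print(kl_per_position(anchored, baseline))  # one KL value per position
```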

---

5. Why this isn’t “just another prompt trick”

• The anchor never appears in the user-visible text.
• Gains replicate across two vendor families (OpenAI GPT and open-weights Llama) and on both reasoning and policy-adherence tasks.
• Visible chain-of-thought actually loses accuracy on 8B models unless the anchor comes first; the mechanism changes internal computation, not surface wording.

---

6. Try it yourself

`pip install tsce`
`python -m tsce_demo "Rewrite this sentence without any em-dashes — can you?"`

Repo (MIT) with benchmark harness, plots, and raw JSONL is linked in the title.

---

7. Questions I’d love feedback on

• Optimal anchor length vs. model size (64 tokens seems enough for < 10 B).
• Behaviour on Mixtral, Phi-3, Claude, and Gemini; please post numbers.
• Red-team attempts: can you poison the anchor in Phase 1 and make the answer leak?

---