Show HN: PTS Library – Analyze LLM reasoning through "thought anchors"

codelion · 7/23/2025, 4:09:58 AM
I built PTS (Pivotal Token Search), an open-source library for mechanistic interpretability analysis of language models. The core feature is generating "thought anchors": identifying the specific sentences in a model's reasoning chain that significantly impact task success.

What it does:

- Generates chain-of-thought reasoning traces from any LLM

- Uses counterfactual analysis to measure the impact of each reasoning step (sketched after this list)

- Identifies critical sentences that make or break task completion

- Exports semantic embeddings for clustering analysis

- Provides systematic failure mode categorization
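
For intuition, the counterfactual step mentioned above could look like the sketch below. This is a minimal illustration, not the library's internal API: generate (sample a completion from a prefix) and is_correct (grade the final answer) are hypothetical stand-ins for whatever model and grader you plug in.

    # Sketch of sentence-level counterfactual impact; not PTS internals.
    # `generate` and `is_correct` are hypothetical stand-in callables.
    import re

    def sentence_impacts(problem, trace, generate, is_correct, n_samples=10):
        """For each sentence, compare success rates of continuations sampled
        from the prefix with vs. without it; a large gap marks an anchor."""
        sentences = re.split(r"(?<=[.!?])\s+", trace.strip())
        impacts = []
        for i in range(len(sentences)):
            p_with = sum(
                is_correct(generate(problem, " ".join(sentences[: i + 1])))
                for _ in range(n_samples)
            ) / n_samples
            p_without = sum(
                is_correct(generate(problem, " ".join(sentences[:i])))
                for _ in range(n_samples)
            ) / n_samples
            impacts.append(p_with - p_without)  # positive => the step helps
        return list(zip(sentences, impacts))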

Example use case:

I used PTS to compare Qwen3-0.6B vs DeepSeek-R1-Distill-1.5B on math problems and discovered they have fundamentally different reasoning architectures:

- DeepSeek: concentrated reasoning (fewer, high-impact steps)

- Qwen3: distributed reasoning (impact spread across multiple steps)
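
One way to quantify the concentrated-versus-distributed distinction is to measure how evenly impact spreads across steps. The sketch below is my framing rather than a PTS function: it takes per-step impact scores (like those from the counterfactual sketch earlier) and returns normalized entropy, near 0 for concentrated profiles and near 1 for distributed ones.

    import math

    def impact_entropy(impacts):
        """Normalized entropy of per-step impact scores, in [0, 1]."""
        weights = [abs(x) for x in impacts]
        total = sum(weights)
        if total == 0 or len(weights) < 2:
            return 0.0
        probs = [w / total for w in weights]
        h = -sum(p * math.log(p) for p in probs if p > 0)
        return h / math.log(len(probs))

    print(impact_entropy([0.90, 0.05, 0.05]))  # ~0.36: concentrated (DeepSeek-like)
    print(impact_entropy([0.30, 0.35, 0.35]))  # ~1.00: distributed (Qwen3-like)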

Quick start:

# Generate thought anchors

pts run --model="your-model" --dataset="gsm8k" --generate-thought-anchors

# Export for analysis

pts export --format="thought_anchors" --output-path="analysis.jsonl"
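
A typical next step on the exported file is clustering the embeddings. The snippet below assumes each JSONL record carries "sentence" and "embedding" fields; those names are my guess, so check the actual export schema before running.

    # Cluster exported thought anchors by semantic embedding.
    # Field names ("sentence", "embedding") are assumed, not confirmed.
    import json
    import numpy as np
    from sklearn.cluster import KMeans

    with open("analysis.jsonl") as f:
        records = [json.loads(line) for line in f]
    X = np.array([r["embedding"] for r in records])  # e.g. shape (n, 384)

    labels = KMeans(n_clusters=5, n_init="auto", random_state=0).fit_predict(X)
    for c in range(5):
        members = [r["sentence"] for r, l in zip(records, labels) if l == c]
        print(f"cluster {c}: {len(members)} anchors")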

The library implements the thought anchors methodology from Bogdan et al. (2025) with extensions for:

- Comprehensive metadata collection

- 384-dimensional semantic embeddings

- Causal dependency tracking

- Systematic failure analysis
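
To make those extensions concrete, here is an illustrative shape for a single anchor record; every field name below is my own invention for explanation, not the library's schema.

    # Illustrative shape of one anchor record; names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class ThoughtAnchor:
        sentence: str                 # the reasoning step itself
        position: int                 # index within the trace
        impact: float                 # counterfactual success-rate delta
        embedding: list[float]        # 384-dim semantic embedding
        depends_on: list[int] = field(default_factory=list)  # causal parents
        failure_mode: str | None = None  # populated when the trace fails
        metadata: dict = field(default_factory=dict)  # model, dataset, seed, ...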

Why this matters: Most interpretability tools focus on individual tokens or attention patterns. Thought anchors operate at the sentence level, revealing which complete reasoning steps actually matter for getting correct answers.

Limitations: Currently focused on mathematical reasoning tasks. Planning to extend to other domains and larger models.

Links:

- GitHub: https://github.com/codelion/pts

- Research example: https://huggingface.co/blog/codelion/understanding-model-rea...

- Generated datasets: Available on HuggingFace

Would appreciate feedback on extending this to other reasoning domains, or on other interpretability approaches worth combining with it.
