Show HN: PTS Library – Analyze LLM reasoning through "thought anchors"

codelion · 7/23/2025, 4:09:58 AM
I built PTS (Pivotal Token Search), an open-source library for mechanistic interpretability analysis of language models. The core feature is generating "thought anchors": identifying the specific sentences in a model's reasoning chain that significantly impact task success.

What it does:

- Generates chain-of-thought reasoning traces from any LLM

- Uses counterfactual analysis to measure the impact of each reasoning step (sketched after this list)

- Identifies critical sentences that make or break task completion

- Exports semantic embeddings for clustering analysis

- Provides systematic failure mode categorization
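
For intuition, the counterfactual step mentioned above could look like the sketch below. This is a minimal illustration, not the library's internal API: generate (sample a completion from a prefix) and is_correct (grade the final answer) are hypothetical stand-ins for whatever model and grader you plug in.

    # Sketch of sentence-level counterfactual impact; not PTS internals.
    # `generate` and `is_correct` are hypothetical stand-in callables.
    import re

    def sentence_impacts(problem, trace, generate, is_correct, n_samples=10):
        """For each sentence, compare success rates of continuations sampled
        from the prefix with vs. without it; a large gap marks an anchor."""
        sentences = re.split(r"(?<=[.!?])\s+", trace.strip())
        impacts = []
        for i in range(len(sentences)):
            p_with = sum(
                is_correct(generate(problem, " ".join(sentences[: i + 1])))
                for _ in range(n_samples)
            ) / n_samples
            p_without = sum(
                is_correct(generate(problem, " ".join(sentences[:i])))
                for _ in range(n_samples)
            ) / n_samples
            impacts.append(p_with - p_without)  # positive => the step helps
        return list(zip(sentences, impacts))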

Example use case:

I used PTS to compare Qwen3-0.6B vs DeepSeek-R1-Distill-1.5B on math problems and discovered they have fundamentally different reasoning architectures:

- DeepSeek: concentrated reasoning (fewer, high-impact steps)

- Qwen3: distributed reasoning (impact spread across multiple steps)
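
One way to quantify the concentrated-versus-distributed distinction is to measure how evenly impact spreads across steps. The sketch below is my framing rather than a PTS function: it takes per-step impact scores (like those from the counterfactual sketch earlier) and returns normalized entropy, near 0 for concentrated profiles and near 1 for distributed ones.

    import math

    def impact_entropy(impacts):
        """Normalized entropy of per-step impact scores, in [0, 1]."""
        weights = [abs(x) for x in impacts]
        total = sum(weights)
        if total == 0 or len(weights) < 2:
            return 0.0
        probs = [w / total for w in weights]
        h = -sum(p * math.log(p) for p in probs if p > 0)
        return h / math.log(len(probs))

    print(impact_entropy([0.90, 0.05, 0.05]))  # ~0.36: concentrated (DeepSeek-like)
    print(impact_entropy([0.30, 0.35, 0.35]))  # ~1.00: distributed (Qwen3-like)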

Quick start:

# Generate thought anchors

pts run --model="your-model" --dataset="gsm8k" --generate-thought-anchors

# Export for analysis

pts export --format="thought_anchors" --output-path="analysis.jsonl"
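
A typical next step on the exported file is clustering the embeddings. The snippet below assumes each JSONL record carries "sentence" and "embedding" fields; those names are my guess, so check the actual export schema before running.

    # Cluster exported thought anchors by semantic embedding.
    # Field names ("sentence", "embedding") are assumed, not confirmed.
    import json
    import numpy as np
    from sklearn.cluster import KMeans

    with open("analysis.jsonl") as f:
        records = [json.loads(line) for line in f]
    X = np.array([r["embedding"] for r in records])  # e.g. shape (n, 384)

    labels = KMeans(n_clusters=5, n_init="auto", random_state=0).fit_predict(X)
    for c in range(5):
        members = [r["sentence"] for r, l in zip(records, labels) if l == c]
        print(f"cluster {c}: {len(members)} anchors")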

The library implements the thought anchors methodology from Bogdan et al. (2025) with extensions for:

- Comprehensive metadata collection

- 384-dimensional semantic embeddings

- Causal dependency tracking

- Systematic failure analysis
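
To make those extensions concrete, here is an illustrative shape for a single anchor record; every field name below is my own invention for explanation, not the library's schema.

    # Illustrative shape of one anchor record; names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class ThoughtAnchor:
        sentence: str                 # the reasoning step itself
        position: int                 # index within the trace
        impact: float                 # counterfactual success-rate delta
        embedding: list[float]        # 384-dim semantic embedding
        depends_on: list[int] = field(default_factory=list)  # causal parents
        failure_mode: str | None = None  # populated when the trace fails
        metadata: dict = field(default_factory=dict)  # model, dataset, seed, ...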

Why this matters: Most interpretability tools focus on individual tokens or attention patterns. Thought anchors operate at the sentence level, revealing which complete reasoning steps actually matter for getting correct answers.

Limitations: Currently focused on mathematical reasoning tasks. Planning to extend to other domains and larger models.

Links:

- GitHub: https://github.com/codelion/pts

- Research example: https://huggingface.co/blog/codelion/understanding-model-rea...

- Generated datasets: Available on HuggingFace

Would appreciate feedback on extending this to other reasoning domains, or on other interpretability approaches worth combining with it.
