Anthropic's Circuit Tracer

2 michaelmarkell 1 5/31/2025, 3:09:39 PM github.com ↗

Comments (1)

michaelmarkell · 1d ago
From the Readme:

Given a model with pre-trained transcoders, it finds the circuit / attribution graph; i.e., it computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit. Given an attribution graph, it visualizes this graph and allows you to annotate these features. It also enables interventions on a model's transcoder features using the insights gained from the attribution graph; i.e., you can set features to arbitrary values and observe how the model's output changes.
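
For intuition, here is a toy Python sketch of the two ideas in that description: direct-effect edges from active features to output logits, and an intervention that pins one feature to a chosen value. It does not use the circuit-tracer API. The weights, the two-layer ReLU "model", and the function names (`features`, `logits`, `attribution_edges`, `intervene`) are all made up for illustration, and the toy leaves out error nodes, input-token edges, and feature-to-feature edges.

```python
# Toy illustration only: NOT the circuit-tracer API.
# Hypothetical model: 2 inputs -> 3 ReLU "features" -> 2 output logits.

W_IN = [[1.0, -1.0, 0.5],   # weights from input 0 to features 0..2
        [0.5, 1.0, 1.0]]    # weights from input 1 to features 0..2
W_OUT = [[2.0, 0.0],        # weights from feature 0 to logits 0..1
         [-1.0, 1.0],       # weights from feature 1 to logits 0..1
         [0.0, 3.0]]        # weights from feature 2 to logits 0..1


def relu(x):
    return x if x > 0.0 else 0.0


def features(x):
    """Feature activations for an input vector x."""
    n_feats = len(W_IN[0])
    return [relu(sum(x[j] * W_IN[j][i] for j in range(len(x))))
            for i in range(n_feats)]


def logits(feats):
    """Output logits as a linear readout of the feature activations."""
    n_logits = len(W_OUT[0])
    return [sum(feats[i] * W_OUT[i][k] for i in range(len(feats)))
            for k in range(n_logits)]


def attribution_edges(x):
    """Direct effect of each non-zero feature on each logit.

    In this linear readout the direct effect is activation * weight,
    so the edges into a logit sum to that logit's value.
    """
    feats = features(x)
    edges = {}
    for i, act in enumerate(feats):
        if act == 0.0:
            continue  # inactive features get no node in the graph
        for k in range(len(W_OUT[0])):
            edges[(f"feature_{i}", f"logit_{k}")] = act * W_OUT[i][k]
    return edges


def intervene(x, feature_index, value):
    """Pin one feature to an arbitrary value and recompute the logits."""
    feats = features(x)
    feats[feature_index] = value
    return logits(feats)


if __name__ == "__main__":
    x = [1.0, 1.0]
    print(features(x))
    print(attribution_edges(x))
    print(logits(x and features(x)))
    print(intervene(x, 2, 0.0))  # ablate feature 2
```

In this toy, the edge weights show which feature drives which logit, and ablating that feature with `intervene` removes exactly its contribution. A real model is not this linear, which is presumably why the README mentions error nodes alongside the transcoder features.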

The blog post: https://www.anthropic.com/research/open-source-circuit-traci...