The AI Safety Puzzle Everyone Avoids: How to Measure Impact, Not Intent

1 patrick0d 1 7/24/2025, 11:52:38 AM lesswrong.com ↗

Comments (1)

patrick0d · 3d ago
I am an AI interpretability researcher and have a new proposal for a way to measure the per token contribution of each head and neuron in LLMs. I found that the normalisation that happens in every LLM is avoided by modern attribution methods despite it having a large impact on the model's computation.
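To make the issue concrete, here is a minimal sketch (my own toy example, not the method from the linked repo; all shapes and names are illustrative assumptions) of how a common attribution approach treats normalisation as linear by freezing its scale, and how that can disagree with an ablation through the actual nonlinear LayerNorm:

```python
import torch

torch.manual_seed(0)

d_model, n_heads = 64, 4
resid_pre = torch.randn(d_model)                   # residual stream before the heads
head_writes = torch.randn(n_heads, d_model) * 0.3  # what each head adds to the stream
resid_post = resid_pre + head_writes.sum(dim=0)    # stream after all heads

ln = torch.nn.LayerNorm(d_model)                   # final normalisation before unembedding
unembed_dir = torch.randn(d_model)                 # unembedding direction for one token

with torch.no_grad():
    # (a) "Frozen LayerNorm" attribution: centre each head's write, divide by the
    # scale computed on the *full* residual stream, and project onto the logit
    # direction, i.e. treat the normalisation as if it were linear.
    denom = torch.sqrt(resid_post.var(unbiased=False) + ln.eps)
    frozen_attr = torch.stack([
        ((w - w.mean()) / denom * ln.weight) @ unembed_dir
        for w in head_writes
    ])

    # (b) Ablation through the real (nonlinear) LayerNorm: remove one head's
    # write, re-normalise, and measure how much the logit changes.
    full_logit = ln(resid_post) @ unembed_dir
    ablation_attr = torch.stack([
        full_logit - ln(resid_post - w) @ unembed_dir
        for w in head_writes
    ])

print("frozen-LN attribution:", frozen_attr)
print("ablation attribution: ", ablation_attr)
print("max discrepancy:      ", (frozen_attr - ablation_attr).abs().max())
```

The two attributions differ because removing a head's write also changes the normalisation statistics, which is exactly the effect a linearised treatment of the normalisation ignores.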

Here are the full preprint and the code I used: https://github.com/patrickod32/landed_writes. Happy to hear insights from anyone interested, and I'd like to know whether others here have been working on anything similar. This seems like a real gap in the research to me.