I’m still waiting for somebody to explain to me how a model with a million+ parameters can ever be interpretable in a useful way. You can’t actually understand the model state, so you’re just making very coarse statistical associations between some parameters and some kinds of responses. Or relying on another AI (itself not interpretable) to do your interpretation for you. What am I missing?
CGMthrowaway · 18h ago
There is a power law curve to the importance of any particular feature. I work with models with thousands of features, and usually only the top 5-10 really matter. But you don't know which until you do the analysis.
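A rough sketch of what that kind of check can look like, using a synthetic dataset and a generic scikit-learn model as stand-ins (nothing here reflects any particular production setup):

```python
# Sketch: rank feature importances and see how concentrated they are at the top.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: many features, only a handful actually informative.
X, y = make_regression(n_samples=2000, n_features=1000, n_informative=8, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Sort importances descending and measure the share held by the top 10.
importances = np.sort(model.feature_importances_)[::-1]
top10_share = importances[:10].sum() / importances.sum()
print(f"top 10 of 1000 features carry {top10_share:.0%} of the importance")
```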
esafak · 19h ago
Even a large model has to behave fairly predictably to be useful; it's not totally random, is it? The same thing applies to humans.
My take: the model is a matrix (or something matrix-like). You can "interpret" it in the context of another matrix that you know (presumably generated from known training data, or from the delta between matrices with different measurable output behavior) and say how much of your test matrix is present in the target model.
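A toy sketch of that "how much of matrix A is present in matrix B" idea, using random matrices as stand-ins for a known probe and a learned weight matrix:

```python
# Project B onto the column space of A and measure the fraction of B explained.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 16))   # "known" directions, e.g. derived from a probe
B = rng.normal(size=(512, 512))  # target weight matrix

Q, _ = np.linalg.qr(A)           # orthonormal basis for A's column space
B_proj = Q @ (Q.T @ B)           # component of B lying in that subspace

overlap = np.linalg.norm(B_proj, "fro") ** 2 / np.linalg.norm(B, "fro") ** 2
print(f"fraction of B's squared norm explained by A's subspace: {overlap:.3f}")
```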
juancroldan · 20h ago
Cool stuff. I'm the CTO of Stargazr (stargazr.ai), a financial & operational AI for manufacturing companies; we started using transformers to process financial data in 2020, a bit before the GPT boom.
In our experience, anything beyond very constrained function calling opens the door to explainability problems. We moved away from "based on the embeddings of this P&L, you should do X" toward "I called a function to generate your P&L, which is in this table; based on this you could think of applying these actions".
It's a loss in terms of semantics (the embeddings could pack more granular P&L observations over time) but much better in terms of explainability. I see other finance AIs such as SAP Joule also going in the same direction.
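A minimal sketch of that constrained function-calling pattern; the function name, data model, and ledger format below are hypothetical illustrations, not Stargazr's actual API:

```python
# The P&L figures come from a deterministic, auditable function; the LLM is only
# asked to suggest actions given the returned table, so every number it cites
# can be traced back to this code rather than to opaque embeddings.
from dataclasses import dataclass

@dataclass
class PnLRow:
    line_item: str
    amount: float

def generate_pnl(ledger: list[dict]) -> list[PnLRow]:
    """Aggregate ledger entries into a P&L table (exposed to the LLM as a tool)."""
    totals: dict[str, float] = {}
    for entry in ledger:
        totals[entry["line_item"]] = totals.get(entry["line_item"], 0.0) + entry["amount"]
    return [PnLRow(k, v) for k, v in totals.items()]

ledger = [
    {"line_item": "Revenue", "amount": 120.0},
    {"line_item": "COGS", "amount": -70.0},
    {"line_item": "Revenue", "amount": 30.0},
]
print(generate_pnl(ledger))  # the table the model is asked to comment on
```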
ashater · 20h ago
Thank you. Agreed, we are exploring different ways to apply these interpretability methods to a wide range of transformer-based architectures, not just decoder-based generative applications.
vessenes · 19h ago
Ooh you had me at mechinterp + finance. Thanks for publishing: I’m excited to read it. Long term do you guys hope to uncover novel frameworks? Or are you most interested in having a handle on what’s going on inside the model?
ashater · 19h ago
We want to do both. Finance is a highly regulated industry, so understanding how models work is critical. In addition, mech interp will let us understand which current or new architectures could work better for financial applications.
laylower · 19h ago
Thanks Ariye. What does group risk think about this paper?
I imagine these metrics would be good to include in the MI, but are you confident that the methods being proposed are adequate to convince regulators on both sides of the Atlantic?
ashater · 19h ago
Thank you for reading. One of the main reasons we've written the paper is to help with model validation of LLM usage in our highly regulated industry. We are also engaging with regulators.
The industry at the moment is mostly using closed-source vendor models that are very hard to validate or interpret. We are pushing to move onto models with open-source weights, where we can apply our interpretability methods.
Current validation approaches are still very behavioral in nature, and we want to move them into the mechanistic interpretation world.
ashater · 22h ago
The paper introduces AI explainability methods, mechanistic interpretation, and novel finance-specific use cases. Using sparse autoencoders, we zoom into LLM internals and highlight finance-related features. We provide examples of using interpretability methods to enhance sentiment scoring, detect model bias, and improve trading applications.
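A minimal sparse-autoencoder sketch in the spirit of that approach: an overcomplete dictionary trained on residual-stream activations with an L1 sparsity penalty. The dimensions, hyperparameters, and random stand-in activations below are illustrative only, not the paper's setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 8 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)          # reconstruction of the input activation
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3

# `activations` would be residual-stream vectors captured from the LLM on
# finance text; random data stands in for them here.
activations = torch.randn(1024, 768)
recon, feats = sae(activations)
loss = ((recon - activations) ** 2).mean() + l1_coeff * feats.abs().mean()
loss.backward()
opt.step()
print("reconstruction + sparsity loss:", loss.item())
```

Individual features learned this way can then be inspected for monosemantic, finance-related behavior (e.g., a feature that fires on earnings-related language).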
Interpretability can mean several things. Are you familiar with things like this? https://distill.pub/2018/building-blocks/
Monosemantic behavior is key in our research.