It seems a bit much to stick a Proper Noun in front of Neural Networks and call it a new paradigm.
I can see how that worked for KANs because weights and activations are the bread and butter of neural networks. Changing the activations kind of does make a distinct difference. I still think there's merit in having learnable weights and activations together, but that's not very Kolmogorov-Arnold theorem, so activations-only seemed like a decent starting point (but I digress).
This new thing seems more like just switching out one bit of the toolkit for another. There are any number of ways to measure how similar one bunch of values is to another bunch of values. Cosine similarity, despite sounding all intellectual, is just a dot product wearing a lab coat and glasses. I assume it is easily acknowledged as not the best metric, but it really can't be beat for performance if you have a lot of multiply units lying around.
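To make that concrete, here's a minimal numpy sketch (mine, not anything from the paper) showing that cosine similarity is exactly the dot product after L2-normalizing both vectors:

    import numpy as np

    def cosine_similarity(a, b):
        # "a dot product wearing a lab coat": normalize both vectors, then dot them
        a_hat = a / np.linalg.norm(a)
        b_hat = b / np.linalg.norm(b)
        return np.dot(a_hat, b_hat)

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 0.5, -1.0])
    print(cosine_similarity(a, b))  # equals np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))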
It would be worth combining this research with the efforts on translating one embedding model to another. Transferring between metrics might allow you to pick the most appropriate one at specific times.
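As a hedged sketch of what that combination could look like: if you have the same items embedded by two models, a least-squares linear map is about the simplest possible "translation" between the spaces (all names and shapes here are made up for illustration, not taken from any particular paper):

    import numpy as np

    # Hypothetical paired embeddings of the same N items from two different models.
    rng = np.random.default_rng(0)
    N, d_src, d_tgt = 1000, 384, 768
    E_src = rng.normal(size=(N, d_src))   # items embedded by model A
    E_tgt = rng.normal(size=(N, d_tgt))   # the same items embedded by model B

    # Least-squares linear map W such that E_src @ W ~= E_tgt.
    W, *_ = np.linalg.lstsq(E_src, E_tgt, rcond=None)

    # A query embedded by model A can now be scored under whatever metric model B's space favors.
    query_in_tgt_space = rng.normal(size=(d_src,)) @ W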
roger_ · 2h ago
Interesting, can this be applied to regression?
dkdcio · 6h ago
> Another useful property of the model is interpretability.
Is this true? My understanding is that the hard part about interpreting neural networks is that there are many, many neurons with many, many interconnections, not that the activation function itself is unexplainable. Even with an explainable classifier, how do you explain trillions of them with deep layers of nested connections?
abeppu · 5h ago
I think the case for interpretability could have been made better, but look at Figure 3: if you compare the middle "prototype" rows from the traditional vs. Tversky layers, and scroll so you can't see the rows above, you could mostly pick out which Tversky prototype corresponds to each digit, but not which traditional/linear prototype corresponds to each digit.
So I do think that's more interpretable in two ways:
1. You can look at specific representations in the model and "see" what they "mean"
2. This means you can give a high-level interpretation to a particular inference run: "X_i is a 7 because it's like this prototype that looks like a 7, and it has some features that only turn up in 7s"
I do think complex models doing complex tasks will sometimes have extremely complex "explanations" which may not really communicate anything to a human, and so do not function as explanations at all.
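To make point 2 concrete, here's a rough numpy sketch of Tversky's contrast model with soft set operations; this is a generic reading of the idea, not necessarily the paper's exact parameterization:

    import numpy as np

    def tversky_similarity(x, p, theta=1.0, alpha=0.5, beta=0.5):
        # x, p: nonnegative feature vectors for an input and a prototype
        common = np.minimum(x, p).sum()          # f(X ∩ P): features both have
        x_only = np.maximum(x - p, 0.0).sum()    # f(X \ P): features only the input has
        p_only = np.maximum(p - x, 0.0).sum()    # f(P \ X): features only the prototype has
        return theta * common - alpha * x_only - beta * p_only

    x = np.array([0.9, 0.0, 0.7, 0.1])   # hypothetical input features
    p = np.array([1.0, 0.0, 0.8, 0.0])   # hypothetical "7" prototype
    print(tversky_similarity(x, p))

The three terms are what let you phrase an explanation: which shared features pushed the score up, and which distinctive features dragged it down.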
bobmarleybiceps · 6h ago
I've decided 100% of papers saying their modification of a neural network is interpretable are exaggerating.
heyitsguay · 8h ago
Seems cool, but the image classification model benchmark choice is kinda weak given all the fun tools we have now. I wonder how Tversky probes do on top of DINOv3 for building a classifier for some task.
throwawaymaths · 7h ago
Crawl, walk, run.
No sense spending large amounts of compute on algorithms for new math unless you can prove it can crawl.
heyitsguay · 7h ago
It's the same amount of benchmarking effort, just a better choice of backbone that enables better choices of benchmark tasks. If the claim is that a Tversky projection layer beats a linear projection layer, then one can test whether that's true today with foundation embedding models.
It's also a more natural question to ask, since building projections on top of frozen foundation model embeddings is both common in an absolute sense, and much more common, relatively, than building projections off of tiny frozen networks like a ResNet-50.
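For what it's worth, the swap itself is small. A rough PyTorch sketch of a Tversky-style head sitting on frozen embeddings (the feature-bank/prototype parameterization is my guess at the general idea, not the paper's exact layer, and frozen_embeddings stands in for whatever backbone you like, e.g. DINOv3):

    import torch
    import torch.nn as nn

    class TverskyHead(nn.Module):
        # Intended as a drop-in replacement for nn.Linear(d, n_classes) on frozen embeddings.
        def __init__(self, d, n_classes, n_features=256, alpha=0.5, beta=0.5):
            super().__init__()
            self.features = nn.Parameter(torch.randn(n_features, d) * 0.02)    # learned feature bank
            self.prototypes = nn.Parameter(torch.randn(n_classes, d) * 0.02)   # one prototype per class
            self.alpha, self.beta = alpha, beta

        def forward(self, x):  # x: (batch, d) frozen embeddings
            fx = torch.relu(x @ self.features.T)                  # (batch, n_features) input feature memberships
            fp = torch.relu(self.prototypes @ self.features.T)    # (n_classes, n_features) prototype memberships
            common = torch.minimum(fx.unsqueeze(1), fp.unsqueeze(0)).sum(-1)   # shared features
            x_only = torch.relu(fx.unsqueeze(1) - fp.unsqueeze(0)).sum(-1)     # input-only features
            p_only = torch.relu(fp.unsqueeze(0) - fx.unsqueeze(1)).sum(-1)     # prototype-only features
            return common - self.alpha * x_only - self.beta * p_only           # (batch, n_classes) logits

    # logits = TverskyHead(d=768, n_classes=10)(frozen_embeddings)

Train this head and a plain nn.Linear head on the same frozen embeddings and you have the comparison directly.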