Gradient Descent on Token Input Embeddings
2 kp1197 1 7/18/2025, 6:15:53 PM lesswrong.com ↗
Comments (1)
kp1197 · 5h ago
Does performing gradient descent on token input embeddings lead to interpretable results? If not, why not?
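
To make the question concrete, here is a minimal sketch of what "gradient descent on an input embedding" could look like. Everything in it is an assumption for illustration: the "model" is a toy frozen linear readout `W` rather than a real transformer, the embedding table `E` is random, and the names `optimize_embedding` and `nearest_token` are hypothetical. It only shows the mechanics (freeze the weights, update the input vector, then project back to the nearest vocabulary entry) and says nothing about what happens in an actual language model.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def loss_and_grad(W, e, target):
    """Cross-entropy of a toy linear 'model' and its gradient w.r.t. the input embedding e."""
    logits = [sum(w * x for w, x in zip(row, e)) for row in W]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # d(loss)/d(logits) = probs - onehot(target); the chain rule through W gives W^T @ that
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    grad = [sum(W[k][j] * dlogits[k] for k in range(len(W))) for j in range(len(e))]
    return loss, grad

def nearest_token(E, e):
    """Index of the vocabulary embedding closest to e in Euclidean distance."""
    return min(range(len(E)), key=lambda t: sum((a - b) ** 2 for a, b in zip(E[t], e)))

def optimize_embedding(E, W, start_token, target, lr=0.1, steps=200):
    e = list(E[start_token])  # copy, so the embedding table itself stays frozen
    initial_loss, _ = loss_and_grad(W, e, target)
    for _ in range(steps):
        _, grad = loss_and_grad(W, e, target)
        e = [x - lr * g for x, g in zip(e, grad)]  # plain gradient descent step on the input
    final_loss, _ = loss_and_grad(W, e, target)
    return e, initial_loss, final_loss

# Hypothetical toy sizes and random parameters, fixed seed for repeatability.
rng = random.Random(0)
VOCAB, DIM, CLASSES = 10, 8, 5
E = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(CLASSES)]

e_opt, loss0, loss1 = optimize_embedding(E, W, start_token=3, target=2)
tok = nearest_token(E, e_opt)
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(E[tok], e_opt)))
print(f"loss {loss0:.3f} -> {loss1:.3f}; nearest token {tok} at distance {dist:.3f}")
```

In this setup, the distance from the optimized vector to its nearest vocabulary embedding is one rough way to ask whether the result still corresponds to any real token, which is one possible reading of "interpretable" here.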