Gradient Descent on Token Input Embeddings

2 kp1197 1 7/18/2025, 6:15:53 PM lesswrong.com

Comments (1)

kp1197 · 5h ago
Does performing gradient descent on token input embeddings lead to interpretable results? If not, why not?
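
To pin down what "gradient descent on token input embeddings" could mean here, below is a minimal sketch under strong assumptions: a made-up 4-token vocabulary with 2-d embeddings and a frozen linear softmax classifier standing in for the model. `EMBEDDINGS`, `WEIGHTS`, and all function names are hypothetical and only illustrate the setup, not any particular experiment. The model weights stay fixed; only the input vector is updated, and the result is then mapped back to the nearest vocabulary embedding.

```python
import math

# Hypothetical toy setup (not from the post): 4 tokens with 2-d embeddings and
# a frozen linear softmax "model" with 3 output classes.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # one row per output class


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(x, target):
    """Cross-entropy of the frozen model at input x, and its gradient w.r.t. x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # d loss / d logits = probs - onehot(target); chain through the linear layer.
    dlogits = [p - (1.0 if c == target else 0.0) for c, p in enumerate(probs)]
    grad = [sum(WEIGHTS[c][j] * dlogits[c] for c in range(len(WEIGHTS)))
            for j in range(len(x))]
    return loss, grad


def optimize_input_embedding(x0, target, steps=100, lr=0.1):
    """Plain gradient descent on the input vector; model weights never change."""
    x = list(x0)
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(x, target)
        losses.append(loss)
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x, losses


def nearest_token(x):
    """Index of the vocabulary embedding closest to x (squared Euclidean distance)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(x, row))
    return min(range(len(EMBEDDINGS)), key=lambda i: sq_dist(EMBEDDINGS[i]))


# Start from token 0's embedding and push the model toward class 1.
x_opt, losses = optimize_input_embedding(EMBEDDINGS[0], target=1)
token_id = nearest_token(x_opt)
```

Nothing in this loop keeps `x_opt` on a row of `EMBEDDINGS`, so reading it as a token requires an extra step such as `nearest_token`. Whether that readout is meaningful for a real model is exactly what the question asks.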