Transformers as Multi-Task Learners: Decoupling Features in Hidden Markov Models

2 badmonster 1 6/3/2025, 6:17:37 AM arxiv.org

Comments (1)

badmonster · 22h ago
interesting paper. essentially a peek under the hood of why transformers generalize so well across multiple tasks, through the lens of hidden Markov models.