The Theoretical Limitations of Embedding-Based Retrieval

37 fzliu 1 8/29/2025, 8:25:34 PM arxiv.org ↗

Comments (1)

gdiamos · 27m ago
Their idea is that the capacity of even 4096-dimensional vectors limits their retrieval performance.

Sparse models like BM25 have a huge effective dimension (on the order of the vocabulary size) and thus don't suffer from this limit, but they don't capture semantics and can't follow instructions.

It seems like the holy grail is a sparse semantic model. I wonder how SPLADE would do?
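
To make the dimensionality contrast concrete, here's a minimal BM25 sketch over a toy corpus. The corpus, parameter values, and helper names are illustrative (not from the paper or the comment); the point is that the sparse model's "dimension" is the vocabulary size, which grows with the corpus, unlike a fixed-width dense embedding such as 4096.

```python
import math
from collections import Counter

# Toy corpus; purely illustrative.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "embedding models map text to dense vectors",
]
docs = [d.split() for d in corpus]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N

# Document frequency per term (how many docs contain each term).
df = Counter()
for d in docs:
    df.update(set(d))

def bm25_score(query, doc, k1=1.5, b=0.75):
    """Standard BM25 with a common smoothed-IDF variant."""
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

# The sparse model's effective dimensionality is the vocabulary size,
# not a fixed embedding width.
vocab = {t for d in docs for t in d}
scores = [bm25_score("dense vectors", d) for d in docs]
print(len(vocab), scores)
```

Because BM25 only scores exact term matches, only the third document (the one containing "dense" and "vectors") gets a nonzero score here, which is also why such models miss paraphrases and synonyms.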