Why do we still flatten embedding spaces?

Intrinisical-AI · 7/20/2025, 8:07:49 PM
Most dense retrieval systems rely on cosine similarity or dot-product scoring, which implicitly assumes a flat embedding space. But embeddings often live on curved manifolds with non-uniform structure: dense regions, semantic gaps, asymmetric paths.
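
To make the contrast concrete, here is a minimal sketch of the kind of alternative I keep reaching for: score documents by shortest-path distance on a cosine k-NN graph rather than by raw cosine alone. The helper name `geodesic_rank`, k=10, and the cosine edge weights are placeholder choices for illustration, not a tuned setup.

```python
# Minimal sketch (illustrative names and parameters): rank by shortest-path
# distance on a cosine k-NN graph instead of by raw cosine alone.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_rank(query_vec, doc_vecs, k=10):
    """Rank documents by graph-geodesic distance from the query, where the
    graph connects each point to its k nearest neighbors by cosine distance."""
    X = np.vstack([np.asarray(query_vec)[None, :], np.asarray(doc_vecs)])
    X = X.astype(np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # Paths that hop through dense regions approximate geodesics on the data
    # manifold; a straight cosine comparison ignores that structure entirely.
    knn = kneighbors_graph(X, n_neighbors=k, mode="distance", metric="cosine")
    geo = shortest_path(knn, method="D", directed=False, indices=[0])[0]
    return np.argsort(geo[1:])   # document indices, closest on the graph first
```

On a flat, uniformly sampled cloud the two rankings largely agree; they start to diverge exactly where the space has the dense regions and gaps mentioned above.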

I’ve been exploring the use of:

- Ricci curvature as a reranking signal (rough sketch at the end of this post)

- Soft-graphs to preserve local density

- Geodesic-aware losses during training

Curious whether others have tried anything similar, especially in information retrieval, QA, or explainability. Happy to share some experiments (FiQA/BEIR) if there's interest.
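
As a starting point, here is a rough, hedged sketch of the curvature signal I've been playing with: Ollivier-Ricci curvature of edges in a cosine k-NN graph over the candidate embeddings, averaged per node. The alpha, the k, the POT dependency, the use of embedding cosine distance as the ground metric (a simplification over graph distance), and the way the score is blended back into the retriever score are all illustrative choices, not a fixed recipe.

```python
# Hedged sketch (not an exact experimental setup): per-node Ollivier-Ricci
# curvature over a cosine k-NN graph, usable as an auxiliary reranking signal.
# Assumes `pip install pot scikit-learn`.
import numpy as np
import ot  # Python Optimal Transport, for the 1-Wasserstein term
from sklearn.neighbors import NearestNeighbors

def ollivier_ricci_node_scores(vecs, k=8, alpha=0.5):
    """Mean Ollivier-Ricci curvature of the k-NN edges touching each node.
    kappa(i, j) = 1 - W1(mu_i, mu_j) / d(i, j), with lazy random-walk
    measures mu_i (mass alpha stays at i, the rest spreads over i's neighbors)."""
    X = np.asarray(vecs, dtype=np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - X @ X.T                                   # cosine distances (ground metric)
    nbrs = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
    _, idx = nbrs.kneighbors(X)                         # idx[:, 0] is the node itself

    def walk_measure(i):
        support = np.concatenate(([i], idx[i, 1:]))
        mass = np.concatenate(([alpha], np.full(k, (1.0 - alpha) / k)))
        return support, mass

    n = len(X)
    curv_sum, curv_cnt = np.zeros(n), np.zeros(n)
    for i in range(n):
        for j in idx[i, 1:]:
            si, mi = walk_measure(i)
            sj, mj = walk_measure(j)
            w1 = ot.emd2(mi, mj, D[np.ix_(si, sj)])     # exact W1 between the two measures
            kappa = 1.0 - w1 / max(D[i, j], 1e-12)      # curvature of edge (i, j)
            curv_sum[i] += kappa
            curv_cnt[i] += 1
    return curv_sum / np.maximum(curv_cnt, 1.0)

# One possible blend, applied to a retrieved candidate pool:
# final_score = retriever_score + lam * ollivier_ricci_node_scores(candidate_vecs)
```

The intuition: candidates sitting in dense, positively curved neighborhoods get a small boost, while candidates on thin "bridges" between clusters (negative curvature) get flagged for the reranker.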

Comments (1)

PaulHoule · 14h ago
I've been bothered by this since before there were transformers.

Probably the most interesting function over a sequence t is G(t), the function Chomsky said was the grammar: true if t is well-formed, false if it isn't.

The graph of G over t is not a manifold because G is not continuous, and its projection into the embedding space can't be continuous either. It boggles my mind, and leaves me thinking that it's not legitimate to work in the embedding space, and yet it obviously works.

If you take two points in the embedding space that represent well-formed sequences and draw a line interpolating between them, you'd think there would have to be points in between that correspond to ill-formed sequences. Intuition about high-dimensional spaces is problematic, but I imagine there have to be structures in there that "look" like a crumpled-up ball of 2-D paper in a 3-D space, or that are folded up like filo dough.