There seems to be a trend away from mean-pooling everything into a single embedding. But instead of dealing with an embedding per token (which is a lot), you still want to reduce it somewhat. This method seems to cluster token embeddings by random partitioning, mean-pool within each partition, and concatenate the resulting vectors into a fixed-length final embedding.
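A minimal numpy sketch of that idea (my own naming; a simplification rather than Muvera's exact construction, which handles queries and empty buckets differently):

    import numpy as np

    def fixed_length_encoding(token_embs, n_planes=3, seed=0):
        # token_embs: (n_tokens, dim) array of per-token embeddings
        n_tokens, dim = token_embs.shape
        rng = np.random.default_rng(seed)
        # SimHash-style random partition: each token's sign pattern
        # against n_planes random hyperplanes picks one of 2**n_planes buckets
        planes = rng.standard_normal((n_planes, dim))
        bits = (token_embs @ planes.T > 0).astype(int)
        buckets = bits @ (1 << np.arange(n_planes))
        parts = []
        for b in range(2 ** n_planes):
            members = token_embs[buckets == b]
            # mean-pool each partition; empty buckets contribute zeros
            parts.append(members.mean(axis=0) if len(members) else np.zeros(dim))
        return np.concatenate(parts)  # fixed length: 2**n_planes * dim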
Essentially, full multi-vector comparison is challenging performance-wise. Tools and performance for single vectors are much better. As a compromise, cluster into k chunks and concatenate. Then you can do a k-vector comparison at once with single-vector tooling and performance.
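The reason the concatenation trick works: one dot product over the concatenated vectors decomposes into the sum of the k per-partition dot products. A quick check:

    import numpy as np

    rng = np.random.default_rng(0)
    k, dim = 8, 128
    q_parts = rng.standard_normal((k, dim))  # k query partition vectors
    d_parts = rng.standard_normal((k, dim))  # k document partition vectors

    # one dot product over the concatenations...
    single = np.concatenate(q_parts) @ np.concatenate(d_parts)
    # ...equals the sum of the k per-partition dot products
    multi = sum(q @ d for q, d in zip(q_parts, d_parts))
    assert np.isclose(single, multi)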
Ultimately the fixed-length vector comes from having a fixed number of partitions, so this is kind of just k-means-style clustering of the token-level embeddings.
Presumably a dynamic clustering of the tokens could be even better, though that would leave you with a variable number of embeddings per document.
When looking at multi-vector / ColBERT-style approaches, the embedding-per-token approach can massively increase costs. You might go from a single 768-dimension vector to 130 tokens × 128 dimensions = 16,640 dimensions. Even with better results from a multi-vector model, this can make it unfeasible for many use cases.
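For context, the exact multi-vector score ColBERT-style models use (often called MaxSim or Chamfer similarity) needs a full token-by-token similarity matrix per candidate document, which is what makes it expensive at scale. A sketch:

    import numpy as np

    def maxsim(query_embs, doc_embs):
        # query_embs: (n_q, dim), doc_embs: (n_d, dim)
        # For every query token, take its best-matching doc token, then sum.
        # That's n_q * n_d dot products per candidate document.
        return (query_embs @ doc_embs.T).max(axis=1).sum()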
Muvera converts the multiple vectors into a single fixed-dimension (usually net smaller) vector that can be used by any ANN index. As you now have a single vector, you can use all your existing ANN algorithms and stack other quantization techniques on top for memory savings. In my opinion it is a much better approach than PLAID because it doesn't require specific index structures or clustering assumptions, and it can achieve lower latency.
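Concretely (a sketch assuming faiss; any single-vector ANN library would do), the fixed-dimension encoding drops straight into an off-the-shelf index, with exact multi-vector scoring kept only for reranking the shortlist:

    import faiss
    import numpy as np

    dim = 1024  # e.g. 8 partitions * 128 dims per partition
    doc_fdes = np.random.rand(10_000, dim).astype('float32')  # one vector per doc
    query_fde = np.random.rand(1, dim).astype('float32')

    index = faiss.IndexFlatIP(dim)  # or IndexHNSWFlat etc. for true ANN
    index.add(doc_fdes)
    scores, ids = index.search(query_fde, 100)
    # then rerank the 100 candidates with the exact multi-vector MaxSim score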
bobosha · 2h ago
How is this different from generating a feature hash of the embeddings, i.e. a many-to-one embedding reduction? Could UMAP or a similar technique be helpful in reducing to a single vector?
dinkdonkbell · 1h ago
UMAP doesn't project values into a consistent coordinate space. While the abstract properties are preserved between projections, where a given point lands in coordinate space won't be the same from one projection to the next.
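A quick illustration (assuming the umap-learn package): two fits on the same data preserve neighborhood structure but place points at different coordinates, so vectors from separate projections can't be compared directly.

    import numpy as np
    import umap  # umap-learn

    X = np.random.rand(500, 128)
    a = umap.UMAP(random_state=1).fit_transform(X)
    b = umap.UMAP(random_state=2).fit_transform(X)
    # same data, similar neighborhood structure, different coordinates
    print(np.allclose(a, b))  # False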
dinobones · 3h ago
So this is basically an “embedding of embeddings”, an approximation of multiple embeddings compressed into one, to reduce dimensionality/increase performance.
All this tells me is that the “multiple embeddings” are probably mostly overlapping, and the marginal value of each additional one is probably low, if you can represent them with a single embedding.
I don’t otherwise see how you can keep comparable performance without breaking information theory.