Muvera: Making multi-vector retrieval as fast as single-vector search

43 georgehill 1 6/26/2025, 10:29:34 AM research.google

Comments (1)

trengrj · 1h ago
We added Muvera to Weaviate recently https://weaviate.io/blog/muvera and also have a nice podcast on it https://www.youtube.com/watch?v=nSW5g1H4zoU.

When looking at multi-vector / ColBERT-style approaches, the one-embedding-per-token approach can massively increase costs. You might go from a single 768-dimension vector to roughly 130 token embeddings of 128 dimensions each, i.e. 16,640 values per document. Even when a multi-vector model gives better results, that cost can make it infeasible for many use cases.
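A rough back-of-the-envelope sketch of that blow-up, assuming float32 storage and the illustrative token/dimension counts above (not any particular model's exact numbers):

```python
# Rough per-document storage comparison (float32), using the numbers above.
single_vector = 768            # one 768-d embedding per document
multi_vector = 130 * 128       # ~130 token embeddings of 128 dims each = 16,640 values

bytes_per_float = 4
print(f"single-vector: {single_vector * bytes_per_float / 1024:.1f} KiB/doc")  # ~3.0 KiB
print(f"multi-vector:  {multi_vector * bytes_per_float / 1024:.1f} KiB/doc")   # ~65.0 KiB
print(f"blow-up:       {multi_vector / single_vector:.1f}x")                   # ~21.7x
```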

Muvera converts the multiple vectors into a single fixed-dimension (usually net smaller) vector that can be used by any ANN index. Since you now have a single vector, you can use all your existing ANN algorithms and stack other quantization techniques on top for memory savings. In my opinion it is a much better approach than PLAID because it doesn't require specific index structures or clustering assumptions and can achieve lower latency.
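For intuition, here is a heavily simplified sketch of the fixed-dimensional-encoding idea behind Muvera (not the Weaviate or Google implementation): partition the embedding space with random hyperplanes (SimHash), sum the token vectors that fall into each partition, and concatenate the partial sums into one vector. The real construction adds repetitions, inner projections, and different query/document aggregation, all of which this toy version omits; the function name and parameters here are made up for illustration.

```python
import numpy as np

def simple_fde(token_vecs: np.ndarray, n_planes: int = 4, seed: int = 0) -> np.ndarray:
    """Collapse a (num_tokens, dim) multi-vector into one fixed-size vector.

    Toy sketch only: bucket each token embedding by the sign pattern of its
    projections onto random hyperplanes, sum the vectors in each bucket, and
    concatenate the bucket sums.
    """
    num_tokens, dim = token_vecs.shape
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, dim))        # random hyperplanes
    # Bucket id per token = its SimHash sign pattern (0 .. 2^n_planes - 1).
    bits = (token_vecs @ planes.T > 0).astype(int)
    buckets = bits @ (1 << np.arange(n_planes))
    # One dim-sized slot per bucket; accumulate token vectors into their slot.
    fde = np.zeros((2 ** n_planes, dim), dtype=token_vecs.dtype)
    np.add.at(fde, buckets, token_vecs)
    return fde.reshape(-1)                               # (2^n_planes * dim,)

# 130 token embeddings of 128 dims -> one 2048-dim vector (with 4 hyperplanes),
# which a standard single-vector ANN index can then store and search.
tokens = np.random.default_rng(1).standard_normal((130, 128)).astype(np.float32)
print(simple_fde(tokens).shape)  # (2048,)
```

Because similar token vectors tend to land in the same buckets, the dot product between a query FDE and a document FDE approximates the ColBERT-style late-interaction score, which is what lets an ordinary ANN index do the heavy lifting.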