Ask HN: MCP/API search vs. vector search – what's winning for you?

Posted by ngkw on 8/20/2025, 1:04:40 AM
TL;DR: I have a hunch that demand for classic RAG (embeddings + vector DB) will shrink. Reasons:

1. Embedding ops costs (re-indexing, keeping the index fresh) are high.

2. LLMs are getting good at iterative query expansion over plain search APIs (BM25-style).

3. Embedding quality is still uneven across domains/languages.

Curious what you are actually seeing in production.

Context: We’re a ~10-person team inside a large company. People use different UIs (ChatGPT, Claude, Dify, etc.). Cost/security aren’t our main issues; we just want higher throughput. We can wire up MCP-style connectors (Notion/Slack/Drive) or run our own vector index; we’re trying to pick the battles that really move the needle.

Hypotheses I’m testing:

* For fast-changing corp knowledge, BM25 + LLM query expansion + light re-ranking beats maintaining a vector store (lower ops, decent recall).

* MCP/API search gives “good enough” docs if you union a few expanded queries and re-rank.

* Vectors still win for long-tail semantic matches and noisy phrasing—but only when content is relatively stable or you can afford frequent re-embeds.

What I want from HN (war stories, not vendor pitches):

1. Have you sunset or avoided vector DBs because ops/freshness pain outweighed gains? What were the data size, update rate, and latency targets?

2. If you kept vectors, what made them clearly superior (metrics, error classes, language/domain)? Any concrete thresholds (docs/day churn, avg doc length, query mix) where vectors start paying off?

3. Anyone running pure API search + LLM query expansion (multi-query, aggregation, re-rank) at scale? How many queries per task? Latency/cost vs. vector search?

4. Hybrid setups that worked: e.g., API search to narrow → vector re-rank; or vector recall → LLM judge → final set. What cut false positives/negatives the most?

5. Multilingual/Japanese/domain jargon: where do embeddings still fail you? Did re-ranking (LLM or classic) fix it?

6. Freshness strategies without vectors: caching, recency boosts, metadata filters? What actually reduced “stale answer” complaints?

7. For MCP-style connectors (Notion/Slack/Drive): do you rely on vendor search, or do you replicate content and index yourself? Why?

8. If you’d start from scratch today for a 10-person team, what baseline would you ship first?

Why I’m asking: Our goal is throughput (less time hunting, more time shipping). I’m leaning toward:

* Phase 1: MCP/API search + LLM query expansion (3–5 queries), union the top-N results, local re-rank; no vectors (rough sketch below).

* Phase 2 (only if needed): add a vector index for the failure cases we can’t fix with expansion/re-rank.
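To make Phase 1 concrete, here’s a rough sketch of what I mean by expand → union → re-rank. Assumptions are loud: rank_bm25 over a toy corpus stands in for whatever vendor search API or MCP connector we’d actually call, expand_query is a stub where the LLM multi-query call would go, and the re-rank is plain reciprocal rank fusion rather than anything learned.

    from collections import defaultdict
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # Toy corpus standing in for whatever the Notion/Slack/Drive connector returns.
    DOCS = [
        "Notion: onboarding checklist for new engineers",
        "Slack: incident postmortem for the March staging outage",
        "Drive: Q3 roadmap and team throughput goals",
        "Notion: how to rotate the staging API keys",
    ]

    def tokenize(text):
        return text.lower().split()

    bm25 = BM25Okapi([tokenize(d) for d in DOCS])

    def expand_query(query):
        # Placeholder: in practice an LLM call returns 3-5 paraphrases/refinements.
        return [query, query + " checklist", query + " steps"]

    def search(query, k=3):
        # Stand-in for one call to a vendor search API / MCP connector.
        scores = bm25.get_scores(tokenize(query))
        return sorted(range(len(DOCS)), key=lambda i: scores[i], reverse=True)[:k]

    def phase1(query, k=3, rrf_k=60):
        # Union the per-query top-N, then re-rank with reciprocal rank fusion.
        fused = defaultdict(float)
        for q in expand_query(query):
            for rank, doc_id in enumerate(search(q, k)):
                fused[doc_id] += 1.0 / (rrf_k + rank + 1)
        return [DOCS[i] for i, _ in sorted(fused.items(), key=lambda kv: kv[1], reverse=True)]

    print(phase1("new hire onboarding"))

Swapping search() for real connector calls and expand_query() for an actual LLM prompt is essentially all of Phase 1; a vector index would only enter in Phase 2.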

Happy to share a summary of takeaways after the thread. Thanks!
