Ask HN: MCP/API search vs. vector search – what's winning for you?
Why I'm questioning defaulting to vector search:
1. Embedding ops cost (re-indexing, freshness) is high.
2. LLMs are getting good at iterative query expansion over plain search APIs (BM25-style).
3. Embedding quality is still uneven across domains/languages.

Curious what you are actually seeing in production.
Context: We’re a ~10-person team inside a large company. People use different UIs (ChatGPT, Claude, Dify, etc.). Cost/security aren’t our main issues; we just want higher throughput. We can wire up MCP-style connectors (Notion/Slack/Drive) or run our own vector index; we’re trying to pick the battles that really move the needle.
Hypotheses I’m testing:
* For fast-changing corp knowledge, BM25 + LLM query expansion + light re-ranking beats maintaining a vector store (lower ops, decent recall). Rough sketch of what I mean after this list.
* MCP/API search gives “good enough” docs if you union a few expanded queries and re-rank.
* Vectors still win for long-tail semantic matches and noisy phrasing—but only when content is relatively stable or you can afford frequent re-embeds.
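For concreteness, a minimal sketch of the shape I have in mind. `search_api`, `llm_expand`, and `llm_score` are placeholders for whatever connector and LLM client you actually use, not any specific vendor or MCP API:

```python
# Sketch: API search + LLM query expansion + union + light re-rank.
# search_api(), llm_expand(), and llm_score() are stand-ins, not real APIs.
from typing import Callable

def expanded_search(
    question: str,
    search_api: Callable[[str, int], list[dict]],  # (query, top_k) -> [{"id", "title", "snippet"}, ...]
    llm_expand: Callable[[str, int], list[str]],   # (question, n) -> n paraphrased/expanded queries
    llm_score: Callable[[str, dict], float],       # (question, doc) -> relevance score 0..1
    n_queries: int = 4,
    per_query_k: int = 10,
    final_k: int = 8,
) -> list[dict]:
    queries = [question] + llm_expand(question, n_queries - 1)

    # Union results across the expanded queries, de-duplicated by doc id.
    pooled: dict[str, dict] = {}
    for q in queries:
        for doc in search_api(q, per_query_k):
            pooled.setdefault(doc["id"], doc)

    # Light re-rank of the pooled candidates (LLM judge here; a cross-encoder
    # or recency boost would slot into the same place).
    ranked = sorted(pooled.values(), key=lambda d: llm_score(question, d), reverse=True)
    return ranked[:final_k]
```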
What I want from HN (war stories, not vendor pitches):
1. Have you sunset or avoided vector DBs because ops/freshness pain outweighed gains? What were the data size, update rate, and latency targets?
2. If you kept vectors, what made them clearly superior (metrics, error classes, language/domain)? Any concrete thresholds (docs/day churn, avg doc length, query mix) where vectors start paying off?
3. Anyone running pure API search + LLM query expansion (multi-query, aggregation, re-rank) at scale? How many queries per task? Latency/cost vs. vector search?
4. Hybrid setups that worked: e.g., API search to narrow → vector re-rank; or vector recall → LLM judge → final set. What cut false positives/negatives the most?
5. Multilingual/Japanese/domain jargon: where do embeddings still fail you? Did re-ranking (LLM or classic) fix it?
6. Freshness strategies without vectors: caching, recency boosts, metadata filters? What actually reduced “stale answer” complaints?
7. For MCP-style connectors (Notion/Slack/Drive): do you rely on vendor search, or do you replicate content and index yourself? Why?
8. If you’d start from scratch today for a 10-person team, what baseline would you ship first?
Why I’m asking: Our goal is throughput (less time hunting, more time shipping). I’m leaning toward:
* Phase 1: MCP/API search + LLM query expansion (3–5 queries), union top-N, local re-rank; no vectors. (Quick sketch of the merge step below.)
* Phase 2 (only if needed): add a vector index for the failure cases we can’t fix with expansion/re-rank.
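To ground the "union top-N" part: my working assumption (untested on our side) is that something as simple as reciprocal rank fusion is enough to merge the per-query result lists before any more expensive re-rank:

```python
# Reciprocal rank fusion: merge the ranked lists from the 3-5 expanded
# queries into one candidate pool before the local re-rank.
from collections import defaultdict

def rrf_merge(result_lists: list[list[str]], k: int = 60, top_n: int = 20) -> list[str]:
    """result_lists: one ranked list of doc ids per expanded query."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# e.g. rrf_merge([["a", "b", "c"], ["b", "d"], ["a", "d"]]) -> ["a", "b", "d", "c"]
```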
Happy to share a summary of takeaways after the thread. Thanks!