Ask HN: MCP/API search vs. vector search – what's winning for you?

Posted by ngkw on 8/20/2025, 1:04:40 AM
TL;DR: I have a hunch that demand for classic RAG (embeddings + vector DB) will shrink. Reasons:

1. Embedding ops costs (re-indexing, keeping the index fresh) are high.

2. LLMs are getting good at iterative query expansion over plain search APIs (BM25-style).

3. Embedding quality is still uneven across domains/languages.

Curious what you are actually seeing in production.

Context: We’re a ~10-person team inside a large company. People use different UIs (ChatGPT, Claude, Dify, etc.). Cost/security aren’t our main issues; we just want higher throughput. We can wire up MCP-style connectors (Notion/Slack/Drive) or run our own vector index; we’re trying to pick the battles that really move the needle.

Hypotheses I’m testing:

* For fast-changing corp knowledge, BM25 + LLM query expansion + light re-ranking beats maintaining a vector store (lower ops, decent recall).

* MCP/API search gives “good enough” docs if you union a few expanded queries and re-rank.

* Vectors still win for long-tail semantic matches and noisy phrasing—but only when content is relatively stable or you can afford frequent re-embeds.

What I want from HN (war stories, not vendor pitches):

1. Have you sunset or avoided vector DBs because ops/freshness pain outweighed gains? What were the data size, update rate, and latency targets?

2. If you kept vectors, what made them clearly superior (metrics, error classes, language/domain)? Any concrete thresholds (docs/day churn, avg doc length, query mix) where vectors start paying off?

3. Anyone running pure API search + LLM query expansion (multi-query, aggregation, re-rank) at scale? How many queries per task? Latency/cost vs. vector search?

4. Hybrid setups that worked: e.g., API search to narrow → vector re-rank; or vector recall → LLM judge → final set. What cut false positives/negatives the most?

5. Multilingual/Japanese/domain jargon: where do embeddings still fail you? Did re-ranking (LLM or classic) fix it?

6. Freshness strategies without vectors: caching, recency boosts, metadata filters? What actually reduced “stale answer” complaints?

7. For MCP-style connectors (Notion/Slack/Drive): do you rely on vendor search, or do you replicate content and index yourself? Why?

8. If you’d start from scratch today for a 10-person team, what baseline would you ship first?

Why I’m asking: Our goal is throughput (less time hunting, more time shipping). I’m leaning toward:

* Phase 1: MCP/API search + LLM query expansion (3–5 queries), union the top-N results, local re-rank; no vectors (rough sketch below).

* Phase 2 (only if needed): add a vector index for the failure cases we can’t fix with expansion/re-rank.
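To make Phase 1 concrete, here’s a rough sketch of what I mean by expand → union → re-rank. Assumptions are loud: rank_bm25 over a toy corpus stands in for whatever vendor search API or MCP connector we’d actually call, expand_query is a stub where the LLM multi-query call would go, and the re-rank is plain reciprocal rank fusion rather than anything learned.

    from collections import defaultdict
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # Toy corpus standing in for whatever the Notion/Slack/Drive connector returns.
    DOCS = [
        "Notion: onboarding checklist for new engineers",
        "Slack: incident postmortem for the March staging outage",
        "Drive: Q3 roadmap and team throughput goals",
        "Notion: how to rotate the staging API keys",
    ]

    def tokenize(text):
        return text.lower().split()

    bm25 = BM25Okapi([tokenize(d) for d in DOCS])

    def expand_query(query):
        # Placeholder: in practice an LLM call returns 3-5 paraphrases/refinements.
        return [query, query + " checklist", query + " steps"]

    def search(query, k=3):
        # Stand-in for one call to a vendor search API / MCP connector.
        scores = bm25.get_scores(tokenize(query))
        return sorted(range(len(DOCS)), key=lambda i: scores[i], reverse=True)[:k]

    def phase1(query, k=3, rrf_k=60):
        # Union the per-query top-N, then re-rank with reciprocal rank fusion.
        fused = defaultdict(float)
        for q in expand_query(query):
            for rank, doc_id in enumerate(search(q, k)):
                fused[doc_id] += 1.0 / (rrf_k + rank + 1)
        return [DOCS[i] for i, _ in sorted(fused.items(), key=lambda kv: kv[1], reverse=True)]

    print(phase1("new hire onboarding"))

Swapping search() for real connector calls and expand_query() for an actual LLM prompt is essentially all of Phase 1; a vector index would only enter in Phase 2.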

Happy to share a summary of takeaways after the thread. Thanks!
