Internal tool reduced our LLM (ChatGPT/Claude) costs by 70% for document Q&A
1 point · sam-abdul · 3 comments · 8/11/2025, 10:13:36 PM
Our startup was spending a significant amount each month using ChatGPT and Claude to analyze documents like reports and contracts. The problem was that every query sent the entire document to the LLM, even when only specific sections were relevant.
To address this, we built an internal tool that:
Stores documents once and breaks them into chunks
Uses embeddings to find only the relevant sections per question
Sends just those parts to ChatGPT, Claude, or Gemini
The result has been roughly a 70% reduction in LLM usage costs, with no drop in answer quality.
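The retrieval step described above can be sketched roughly like this. This is a minimal illustration, not the author's actual implementation: the bag-of-words `embed` function is a stand-in for a real embedding model (e.g. an embeddings API or a sentence-transformer), and the chunk size and scoring choices are assumptions.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=40):
    """Split a document into fixed-size word chunks (real systems often
    chunk by paragraph or token count, with some overlap)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words vector.
    In practice, replace this with a call to an embedding API."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(document, question, k=2):
    """Return the k chunks most similar to the question; only these
    are sent to the LLM instead of the whole document."""
    chunks = chunk_text(document)
    q_vec = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q_vec),
                  reverse=True)[:k]
```

The cost saving comes directly from `top_chunks`: if a contract is 50 chunks long and only 2 are relevant to a given question, roughly 96% of the document tokens never reach the model for that query.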
We’re considering whether to release this tool publicly and would be interested to hear if others face similar challenges or have built similar solutions.
Curious what approaches people are using to optimize document-based LLM workflows?
Comments (3)
ungreased0675 · 4h ago
Combine it with a self-hosted local LLM and I think you’d have something.
xenospn · 4h ago
Gemini has context caching enabled by default. You shouldn’t be paying for these tokens anyway, even if you send them to Google multiple times.
sam-abdul · 4h ago
Does that apply when using the Gemini API, though? Caching works well in the Gemini chat interface, but even there I believe there is a limit on the number of documents you can add.