Internal tool reduced our LLM (ChatGPT/Claude) costs by 70% for document Q&A
1 point · sam-abdul · 3 comments · 8/11/2025, 10:13:36 PM
Our startup was spending a significant amount each month using ChatGPT and Claude to analyze documents like reports and contracts. The problem was that every query sent the entire document to the LLM, even when only specific sections were relevant.
To address this, we built an internal tool that:
Stores documents once and breaks them into chunks
Uses embeddings to find only the relevant sections per question
Sends just those parts to ChatGPT, Claude, or Gemini
The result has been roughly a 70% reduction in LLM usage costs, with no drop in answer quality.
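The retrieval step described above can be sketched roughly like this. This is a minimal illustration, not the author's actual implementation: the bag-of-words `embed` function is a stand-in for a real embedding model (e.g. an embeddings API or a sentence-transformer), and the chunk size and scoring choices are assumptions.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=40):
    """Split a document into fixed-size word chunks (real systems often
    chunk by paragraph or token count, with some overlap)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words vector.
    In practice, replace this with a call to an embedding API."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(document, question, k=2):
    """Return the k chunks most similar to the question; only these
    are sent to the LLM instead of the whole document."""
    chunks = chunk_text(document)
    q_vec = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q_vec),
                  reverse=True)[:k]
```

The cost saving comes directly from `top_chunks`: if a contract is 50 chunks long and only 2 are relevant to a given question, roughly 96% of the document tokens never reach the model for that query.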
We’re considering whether to release this tool publicly and would be interested to hear if others face similar challenges or have built similar solutions.
Curious what approaches people are using to optimize document-based LLM workflows?
Comments (3)
ungreased0675 · 4h ago
Combine it with a self-hosted local LLM and I think you’d have something.
xenospn · 4h ago
Gemini has context caching enabled by default. You shouldn’t be paying for these tokens anyway, even if you send them to Google multiple times.
sam-abdul · 4h ago
Does that apply when using the Gemini API, though? Caching works well in the Gemini chat interface, but even there I believe there is a limit on the number of documents you can add.