Reduced OpenAI RAG costs by 70% by using a pre-check API call
2 Kong91 2 6/1/2025, 12:08:07 AM
I'm using OpenAI's RAG implementation for my product. I tried building it myself with Pinecone but could never get it to retrieve relevant info. The downside is that OpenAI is costly: they charge for embeddings and for "file search", which embeds the user's question into vectors and retrieves the most similar chunks. But not every question a user asks actually needs retrieved context (which is the expensive part). So I added a pre-step that uses a cheaper OpenAI model to decide whether the question needs context; if not, the RAG pipeline is never touched. This cut costs by 70%, making the business viable, or at least more lucrative.
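For anyone curious, the routing logic is simple. Here's a minimal sketch (my actual code differs; the names `needs_retrieval`, `ask_llm`, etc. are just illustrative, and the classifier callable is injected so you can back it with any cheap model):

```python
# Sketch of the pre-check routing step: ask a cheap model whether the
# question needs document context before paying for embeddings + file search.
# All function names here are illustrative, not the real product code.

def needs_retrieval(question: str, ask_llm) -> bool:
    """Return True if the cheap classifier says the question needs context.

    `ask_llm` is any callable that sends a prompt to an inexpensive model
    and returns its text reply.
    """
    prompt = (
        "Answer with exactly YES or NO. Does answering the following "
        "question require looking up external documents?\n\n"
        f"Question: {question}"
    )
    reply = ask_llm(prompt)
    return reply.strip().upper().startswith("YES")


def answer(question: str, ask_llm, rag_answer, plain_answer) -> str:
    """Route the question: only touch the costly RAG pipeline when needed."""
    if needs_retrieval(question, ask_llm):
        return rag_answer(question)    # embeddings + file search (costly)
    return plain_answer(question)      # direct completion (cheap)
```

In production, `ask_llm` would wrap something like a chat-completions call to a small model (e.g. a mini-tier model), and `rag_answer` would be the existing OpenAI file-search flow. The pre-check call itself costs a fraction of a retrieval round trip, which is why the savings add up.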
Comments (2)
kristianp · 1d ago
Sounds interesting, but how accurate is it? Have you done evals?
Kong91 · 1d ago
It's pretty accurate. It cites the caselaw it used to answer, so you can check that the citation exists and that it didn't hallucinate or cite US law, etc.