Reduced OpenAI RAG costs by 70% by using a pre-check API call
2 Kong91 2 6/1/2025, 12:08:07 AM
I'm using OpenAI's RAG implementation for my product. I tried building it myself with Pinecone but could never get it to retrieve relevant info. The downside is that OpenAI is costly: they charge for embeddings and for "file search", which embeds the user's question into vectors and retrieves the most similar chunks. But not every question a user asks actually needs retrieved context (which is the expensive part). So I added a pre-step that uses a cheaper OpenAI model to decide whether the question needs context; if not, the RAG pipeline is never touched. This cut costs by 70%, making the business viable, or at least more lucrative.
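For anyone curious, the routing logic is simple. Here's a minimal sketch (my actual code differs; the names `needs_retrieval`, `ask_llm`, etc. are just illustrative, and the classifier callable is injected so you can back it with any cheap model):

```python
# Sketch of the pre-check routing step: ask a cheap model whether the
# question needs document context before paying for embeddings + file search.
# All function names here are illustrative, not the real product code.

def needs_retrieval(question: str, ask_llm) -> bool:
    """Return True if the cheap classifier says the question needs context.

    `ask_llm` is any callable that sends a prompt to an inexpensive model
    and returns its text reply.
    """
    prompt = (
        "Answer with exactly YES or NO. Does answering the following "
        "question require looking up external documents?\n\n"
        f"Question: {question}"
    )
    reply = ask_llm(prompt)
    return reply.strip().upper().startswith("YES")


def answer(question: str, ask_llm, rag_answer, plain_answer) -> str:
    """Route the question: only touch the costly RAG pipeline when needed."""
    if needs_retrieval(question, ask_llm):
        return rag_answer(question)    # embeddings + file search (costly)
    return plain_answer(question)      # direct completion (cheap)
```

In production, `ask_llm` would wrap something like a chat-completions call to a small model (e.g. a mini-tier model), and `rag_answer` would be the existing OpenAI file-search flow. The pre-check call itself costs a fraction of a retrieval round trip, which is why the savings add up.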
Comments (2)
kristianp · 1d ago
Sounds interesting, but how accurate is it? Have you done evals?
Kong91 · 1d ago
It's pretty accurate. It cites the caselaw it used to answer, so you can check that the citation exists and that it didn't hallucinate or cite US law, etc.