Always get the best LLM performance for your $?
3 points by romain_batlle | 6/23/2025, 3:34:03 PM | 0 comments
Hey, I built an inference router that makes LLM providers compete in real time on speed, latency, and price to serve each call. It works with both open and closed models; since closed-model pricing is fixed, those providers only “compete” on speed and latency.
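To give a rough idea of what “compete in real time” means, here is a purely illustrative sketch of picking a provider from live quotes with a simple weighted score. The provider names, weights, and metrics are assumptions, not MakeHub's actual routing logic:

```python
# Illustrative only: score live provider quotes and pick the cheapest/fastest.
# Names, weights, and metrics are made up for the example.
from dataclasses import dataclass

@dataclass
class ProviderQuote:
    name: str
    usd_per_mtok: float   # blended $ per million tokens
    latency_ms: float     # time to first token
    tokens_per_s: float   # observed throughput

def pick_provider(quotes: list[ProviderQuote],
                  w_price: float = 0.5,
                  w_latency: float = 0.3,
                  w_speed: float = 0.2) -> ProviderQuote:
    """Return the quote with the lowest weighted cost (lower is better)."""
    def score(q: ProviderQuote) -> float:
        return (w_price * q.usd_per_mtok
                + w_latency * q.latency_ms / 1000
                + w_speed / max(q.tokens_per_s, 1e-6))
    return min(quotes, key=score)

# For a closed model the price term is identical across providers, so the
# choice effectively comes down to latency and throughput.
quotes = [
    ProviderQuote("provider-a", usd_per_mtok=0.60, latency_ms=420, tokens_per_s=95),
    ProviderQuote("provider-b", usd_per_mtok=0.55, latency_ms=780, tokens_per_s=60),
]
print(pick_provider(quotes).name)
```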
I spent quite some time normalizing APIs, handling tool calls, and managing prompt caching, but the end result is pretty cool: you always get the absolute best value for your $ at the exact moment of inference.
It currently runs perfectly on forks of Roo and Cline, and on any OpenAI-compatible BYOK app (so pretty much everywhere); a quick usage sketch is below.
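Since the router speaks the OpenAI API, dropping it into an existing app is mostly a matter of swapping the base URL. A minimal sketch with the official `openai` Python client; the endpoint path, model id, and env var are assumptions for illustration, not documented values:

```python
# Hypothetical usage: point any OpenAI-compatible client at the router.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.makehub.ai/v1",   # assumed endpoint, check the docs
    api_key=os.environ["MAKEHUB_API_KEY"],  # hypothetical env var
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from the router"}],
)
print(resp.choices[0].message.content)
```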
Feedback is very much welcome! Please tear it apart: [https://makehub.ai](https://makehub.ai/)