GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
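The back-of-envelope economics in this comment can be checked directly. A quick sketch, using the comment's own figures (the prices and the 20% mis-route rate are illustrative, not current price-sheet values):

```python
# Sanity-check the routing economics from the comment above.
# Prices ($/1M tokens) and the 20% mis-route rate are the commenter's
# illustrative figures, not authoritative pricing.
gpt4_price = 24.7      # $ per million tokens
mixtral_price = 0.24   # $ per million tokens

ratio = gpt4_price / mixtral_price
print(f"cost ratio: {ratio:.0f}x")            # ~103x

# Suppose the router sends 20% of traffic to the expensive model
# (by mistake or by necessity) and 80% to the cheap one:
blended = 0.2 * gpt4_price + 0.8 * mixtral_price
print(f"blended cost: ${blended:.2f}/1M tokens")
print(f"savings vs all-GPT-4: {1 - blended / gpt4_price:.0%}")
```

Even with a 20% error rate, the blended cost is roughly a fifth of routing everything to the expensive model, which is the commenter's point.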
FINDarkside · 12m ago
It's trivial to get around the same score as GPT-4 at 1% of the cost by using my proprietary routing algorithm that routes all requests to Gemini 2.5 Flash.
Keyframe · 33m ago
number of complaints / million tokens?
pqtyw · 17m ago
> GPT-4 at $24.7 per million tokens
While technically true, why would you want to use it when OpenAI itself provides a bunch of models that are many times cheaper and better?
QuadmasterXLII · 38m ago
The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.
dang · 30s ago
(The submitted title was "93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback")
spoaceman7777 · 22m ago
Incredible that they are using contextual bandits, and named it:
Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)
Rather than the much more obvious:
Preference-prior Informed Linucb For Adaptive Routing (PILFAR)
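For readers unfamiliar with the LinUCB the acronym refers to: it is a contextual bandit that keeps a per-arm (here, per-model) ridge-regression estimate of reward and picks the arm with the highest upper confidence bound. A minimal generic sketch, not the paper's preference-prior PILOT variant; the model names, feature dimension, and reward here are made up:

```python
# Minimal LinUCB contextual bandit used as an LLM router.
# Illustrative sketch of plain LinUCB, NOT the paper's PILOT method;
# model names, features, and the reward signal are hypothetical.
import numpy as np

class LinUCBRouter:
    def __init__(self, models, dim, alpha=1.0):
        self.models = models
        self.alpha = alpha  # exploration strength
        # Per-arm ridge-regression state: A = I + sum(x x^T), b = sum(r * x)
        self.A = {m: np.eye(dim) for m in models}
        self.b = {m: np.zeros(dim) for m in models}

    def route(self, x):
        """Pick the model with the highest upper confidence bound for query features x."""
        best, best_ucb = None, -np.inf
        for m in self.models:
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]            # estimated reward weights
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best, best_ucb = m, ucb
        return best

    def update(self, model, x, reward):
        """Fold observed (possibly weak/noisy) feedback back into that arm's estimate."""
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x

# Toy usage: 3-dim query features, a single thumbs-up as reward.
router = LinUCBRouter(["cheap-model", "big-model"], dim=3)
x = np.array([0.2, 0.9, 0.1])
choice = router.route(x)
router.update(choice, x, reward=1.0)
```

The confidence term `alpha * sqrt(x A^-1 x)` is what keeps the router occasionally trying under-explored models instead of always exploiting the current best guess.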
fny · 52m ago
Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?
delichon · 44m ago
> a strong enough notion of question complexity
Aka Wisdom. No, LLMs don't have that. Me neither, I usually have to step in the rabbit holes in order to detect them.
jibal · 21m ago
LLMs don't have notions ... they are pattern matchers against a vast database of human text.
mhh__ · 9m ago
Please do a SELECT * from this database
andrewflnr · 49m ago
Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.
kenjackson · 42m ago
First, I don't think we will ever get to AGI. Not because we won't see huge advances still, but because AGI is a moving, ambiguous target that we won't get consensus on.
But why does this paper impact your thinking on it? It is about budget and recognizing that different LLMs have different cost structures. It's not really an attempt to improve LLM performance measured absolutely.
yahoozoo · 1m ago
That and LLMs are seemingly plateauing. Earlier this year, it seemed like the big companies were releasing noticeable improvements every other week. People would joke a few weeks is “an eternity” in AI…so what time span are we looking at now?
jibal · 20m ago
LLMs are not on the road to AGI, but there are plenty of dangers associated with them nonetheless.
srekhi · 43m ago
I'm not following this either. You'd think this would be frontier back in 2023
guluarte · 35m ago
I'm starting to think that there will not be an 'AGI' moment; we will simply build slowly smarter machines over time until we realize there is 'AGI'. It would be like video calls: in the '90s everybody wanted them, now everybody hates them, lmao.