Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks

Hi HN, we're the team behind Arch (https://github.com/katanemo/archgw), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router (https://huggingface.co/katanemo/Arch-Router-1.5B), a 1.5B router model for preference-based routing, now integrated into the proxy.

As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers: they label a prompt as “support,” “SQL,” or “math,” then route to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.

- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or based on latency or cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like “Will legal accept this clause?”
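
To make the second camp concrete, here is a minimal sketch of a performance-based router in Python. Everything in it is an assumption for illustration: the model names, the scores, the prices, and the `pick_by_benchmark` helper are made up and do not come from any real leaderboard or library. It picks the cheapest model whose benchmark score clears a threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    benchmark_score: float     # hypothetical aggregate score, 0-100
    cost_per_1k_tokens: float  # hypothetical price in USD


# Made-up numbers, for illustration only.
CANDIDATES = [
    ModelProfile("large-model", 88.0, 0.0100),
    ModelProfile("medium-model", 80.0, 0.0020),
    ModelProfile("small-model", 65.0, 0.0004),
]


def pick_by_benchmark(min_score: float, candidates=CANDIDATES) -> ModelProfile:
    """Return the cheapest candidate whose benchmark score is at least min_score."""
    eligible = [m for m in candidates if m.benchmark_score >= min_score]
    if not eligible:
        raise ValueError(f"no candidate reaches a score of {min_score}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```

Note that the prompt never enters the decision: the choice depends only on static scores and prices, which is why this style of routing cannot express something like “Will legal accept this clause?”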

Arch-Router takes a different approach: route by preferences written in plain language. You write rules like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains.

We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy. Full details are in our paper (https://arxiv.org/abs/2506.16655), but here's a snapshot:

Specs:

- 1.5B params — runs on a single GPU (or CPU for testing)

- No retraining needed — point it at any mix of LLMs

- Cost and latency aware — route heavy tasks to expensive models, light tasks to faster/cheaper ones

- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)
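
The shape of a preference-based routing policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the archgw configuration format and not Arch-Router's prompt format: `Route`, `resolve_model`, and `keyword_stub` are hypothetical names, the model strings are placeholders, and `keyword_stub` is a crude word-overlap stand-in for the router model that only looks at the latest turn (the real router also uses conversation context).

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Route:
    name: str
    description: str  # the preference, written in plain language
    model: str        # placeholder identifier of the target LLM


# A selector takes (policy, conversation) and returns a route name or None.
Selector = Callable[[Sequence[Route], Sequence[str]], Optional[str]]


def resolve_model(policy: Sequence[Route], conversation: Sequence[str],
                  select_route: Selector, default_model: str) -> str:
    """Ask the selector for a route name and map it to that route's model."""
    chosen = select_route(policy, conversation)
    by_name = {route.name: route.model for route in policy}
    return by_name.get(chosen, default_model)


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_stub(policy: Sequence[Route], conversation: Sequence[str]) -> Optional[str]:
    """Stand-in for the router model: word overlap with the latest turn only."""
    latest = _words(conversation[-1])
    best_name, best_overlap = None, 0
    for route in policy:
        overlap = len(latest & _words(route.description))
        if overlap > best_overlap:
            best_name, best_overlap = route.name, overlap
    return best_name


POLICY = [
    Route("contract_clauses", "drafting or reviewing contract clauses", "gpt-4o"),
    Route("travel_tips", "quick travel tips", "gemini-flash"),
]

# Swapping the model behind one route touches a single entry of the policy.
SWAPPED = [replace(r, model="another-model") if r.name == "travel_tips" else r
           for r in POLICY]
```

With this layout, `resolve_model(POLICY, ["Any quick tips for Lisbon?"], keyword_stub, "fallback-model")` resolves to the travel route's model, and a prompt that matches no description falls through to the default. Replacing `keyword_stub` with a call to an actual router model would leave the policy and `resolve_model` unchanged.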

Links:

- Arch Proxy (open source): https://github.com/katanemo/archgw

- Model + code: https://huggingface.co/katanemo/Arch-Router-1.5B

- Paper: https://arxiv.org/abs/2506.16655

Comments (8)

jgant13 · 27m ago

Solid. Can you show us when to use this vs., say, OpenRouter? The performance seems strong for sure. TIA.

sparacha · 4h ago

Hi HN! I am one of the co-authors of the paper. If there are any questions about our approach, I would love to answer them.

jedisct1 · 58m ago

I tried to use it to rate the difficulty level of coding tasks (for InferSwitch, an LLM router), but it performed far worse than Qwen2.5-Coder-7B (but sure, 1.5B vs 7B)

sparacha · 43m ago

Can you share more about your evaluation setup? I would love to see the specific usage pattern, as we have tested our model against smaller LLMs and foundational models and our results show otherwise. Of course, routing policies should follow the best practices here: https://docs.archgw.com/guides/llm_router.html

Nonetheless, super curious to learn more and see what we may be able to improve. This is technically not a classifier model; it's a usage prediction model (feels like a classifier, but not quite in terms of intended usage).

cotran2 · 32m ago

According to the post, the model is fine-tuned for routing to different tasks/domains. Classifying difficulty level is probably not the intended use case.

tmaly · 2h ago

do you think it would be possible to quantize this model and still get good results?

sparacha · 2h ago

yes - we have already published a quantized version here: https://huggingface.co/katanemo/Arch-Router-1.5B.gguf. The performance difference with a quant version is negligible. I'll run another analysis and update the thread shortly

sparacha · 32m ago

Overall performance degrades from 93.17 to 92.99 with the quantized version.
