Show HN: 1.5B LLM routing model that aligns to preferences, not leaderboards
As teams integrate multiple LLMs, each with different strengths, styles, and cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:
- Embedding-based routers use intent classifiers: label a prompt as “support,” “SQL,” or “math,” then route to a matching model. This works for simple tasks but breaks down in real conversations: users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.
- Performance-based routers pick models based on benchmark scores like MMLU or MT-Bench, or on latency and cost curves. But benchmarks often can't capture what matters in production: domain-specific quality and subjective evaluation criteria. These routers tend to be opaque and hard to debug, and their quality judgments can feel arbitrary because they never see what a “good” response actually means for a specific user's intent.
Arch-Router takes a different approach: route to LLMs based on preferences, written as policies in plain ol' English.
You write policies like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and the full conversation context) to those policies using a lightweight 1.5B auto-regressive model. The model handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy. For more on the model's strengths, check out our research paper: https://arxiv.org/abs/2506.16655
Essentially, Arch-Router splits the routing process into two distinct parts:
Route Selection: This is the what. The system defines a set of human-readable routing policies using a “Domain-Action Taxonomy.” Think of it as a clear API contract written in plain English. A policy isn’t just intent_123; it’s a descriptive label like Domain: ‘finance’, Action: ‘analyze earnings report’. The router’s only job is to match the user’s query to the best-fit policy description.
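For illustration, a policy set can be as small as a list of named, plain-English descriptions. A minimal sketch in Python (the field names here are ours, not the exact Arch-Router schema):

  # Hypothetical policy set; each entry is a name plus a human-readable description.
  route_policies = [
      {"name": "finance/analyze_earnings_report",
       "description": "analyze or summarize a company's earnings report"},
      {"name": "general/greeting",
       "description": "greetings, small talk, and chit-chat"},
  ]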
Model Assignment: This is the how. A separate, simple mapping configuration connects each policy to a specific LLM. The finance/"analyze earnings report" policy might map to a powerful model like GPT-4o, while a simpler general/"greeting" policy maps to a faster, cheaper model.
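The assignment side is then just a lookup table (model names below are illustrative), which is what makes swapping models a one-line change:

  # Hypothetical policy-to-model map, kept separate from the policy descriptions.
  model_assignment = {
      "finance/analyze_earnings_report": "gpt-4o",
      "general/greeting": "gemini-flash",
  }

Because the two halves are decoupled, you can re-point a policy at a new model without rewriting policy descriptions or retraining the router.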
Specs:
- 1.5B params, runs on a single GPU (or CPU for testing)
- No retraining needed — point it at any mix of LLMs
- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)
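If you want to poke at the router model directly, here's a minimal sketch using Hugging Face transformers. It assumes the checkpoint is published as katanemo/Arch-Router-1.5B; the prompt below is illustrative, and the exact prompt format is documented with the model and in the paper:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "katanemo/Arch-Router-1.5B"  # assumed Hugging Face checkpoint ID
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

  # Illustrative prompt: the policies plus the conversation, asking for the best match.
  prompt = (
      "You are a router. Pick the best policy for the conversation.\n"
      "Policies:\n"
      "- finance/analyze_earnings_report: analyze or summarize an earnings report\n"
      "- general/greeting: greetings and small talk\n\n"
      "Conversation:\n"
      "User: Can you walk me through the Q3 revenue numbers in this report?\n\n"
      "Best policy:"
  )
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=16)
  # Decode only the newly generated tokens, i.e. the selected policy name.
  print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))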
Links:
[1] Arch Proxy: https://github.com/katanemo/archgw