Show HN: A private, flat monthly subscription for open-source LLMs

18 points by reissbaker · 8/28/2025, 7:03:24 PM · synthetic.new
Hey HN! We've run our privacy-focused open-source inference company for a while now, and we're launching a flat monthly subscription similar to Anthropic's. It should work with Cline, Roo, KiloCode, Aider, etc — any OpenAI-compatible API client should do. The rate limits at every tier are higher than the Claude rate limits, so even if you prefer using Claude it can be a helpful backup for when you're rate limited, for a pretty low price. Let me know if you have any feedback!

Comments (10)

rationably · 2h ago
Do you plan to offer a high-quality FIM model in the bundle? It would be handy to run autocompletion locally, say via Qwen3-Coder.
reissbaker · 2h ago
Interesting! Very open to the idea. What open-source fill-in-the-middle models are good right now? I've stayed on top of the primary open-source coding LLMs, but haven't been following the open-source FIM ones.
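For anyone unfamiliar with FIM: it's just a prompt-layout convention where the code before and after the cursor are wrapped in sentinel tokens and the model generates the middle. A minimal sketch, assuming the `<|fim_prefix|>`/`<|fim_suffix|>`/`<|fim_middle|>` tokens used by the Qwen2.5-Coder family (other model families use different sentinel tokens, so check the model card):

```python
# Sketch of a fill-in-the-middle (FIM) prompt. The sentinel tokens below
# follow the Qwen2.5-Coder convention; other FIM models use different ones.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap code-before-cursor and code-after-cursor in sentinel tokens;
    the model's completion fills in the middle."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
```

You'd send this to a raw text-completion endpoint (not the chat endpoint), since FIM relies on exact token placement rather than a chat template.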
ykjs · 1h ago
Can this be provided as an API?
reissbaker · 1h ago
Yes! We have a standard OpenAI-compatible API, and we don't restrict subscriptions from using it (unlike Anthropic, where API keys are billed differently unless you're using Claude Code directly, or in a tool that wraps Claude Code).
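"OpenAI-compatible" just means the server accepts the same request shape at the same paths, so any generic client works by swapping the base URL. A minimal sketch of the request body (the base URL and model id here are illustrative placeholders, not confirmed values from the provider):

```python
# Minimal sketch of an OpenAI-compatible chat-completion request body.
# BASE_URL and the model id are hypothetical placeholders.
import json

BASE_URL = "https://api.example.com/v1"  # substitute the provider's real base URL

payload = {
    "model": "glm-4.5",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Explain prompt caching in one sentence."},
    ],
    "stream": False,
}

# An OpenAI-compatible client POSTs this JSON to {BASE_URL}/chat/completions
# with an `Authorization: Bearer <api-key>` header.
body = json.dumps(payload)
```

This is why tools like Cline, Aider, and Roo work out of the box: they only need a base URL, an API key, and a model name.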
logicprog · 6h ago
I was literally just wishing there was something like this, this is perfect! Do you do prompt caching?
reissbaker · 6h ago
Aw, thanks! We don't currently, but from a cost perspective it shouldn't matter much to you as a user, since it's all bundled into the same subscription (we rate-limit by requests, not by tokens — our request rate limits are set higher than the number of messages per hour that Claude Code promises, haha). We might add it at some point just to save GPUs, though!
logicprog · 4h ago
Yeah, I wasn't worried so much about the cost to me as about the sustainability of your prices — don't want to run into a "we're lowering quotas" situation like CC did :P
reissbaker · 4h ago
Lol, fair! I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5, which fits on a relatively small node compared to the rumored sizes of Anthropic's models. We can throw a lot of tokens at it before running into issues — it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.
logicprog · 3h ago
> I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5,

That's funny, that's my favorite coding model as well!

> the rumored sizes of Anthropic's models

Yeah. I've long had a hypothesis that their models are, like, average-sized for a SOTA model but fully dense, like that old Llama 3.1 405B model, and that's why their per-token inference costs are insane compared to the competition.

> it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.

That makes sense.

I'm poor as dirt, and my job actually forbids AI code in the main codebase, so I can't justify even a $20-per-month subscription right now (especially when, for experimenting with agentic coding, Qwen Code is currently free, if shitty) — but when or if it becomes financially responsible, you'll be at the very top of my list.

reissbaker · 3h ago
<3 thank you!