Show HN: A private, flat monthly subscription for open-source LLMs

12 points by reissbaker | 4 comments | 8/28/2025, 7:03:24 PM | synthetic.new
Hey HN! We've run our privacy-focused open-source inference company for a while now, and we're launching a flat monthly subscription similar to Anthropic's. It should work with Cline, Roo, KiloCode, Aider, and so on; any OpenAI-compatible API client should do. The rate limits at every tier are higher than Claude's, so even if you prefer using Claude, it's a helpful backup for when you get rate limited, at a pretty low price. Let me know if you have any feedback!
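
If it helps, here's a rough sketch of what "OpenAI-compatible" means in practice, using the official OpenAI Python SDK pointed at a custom base URL. The URL, env var, and model name below are placeholders, not our real endpoints; check the docs for the actual values:

    import os
    from openai import OpenAI

    # Any OpenAI-compatible endpoint works by overriding the base URL.
    # Placeholder values: swap in the real endpoint, key, and model name.
    client = OpenAI(
        base_url="https://example-inference-provider.com/v1",  # placeholder URL
        api_key=os.environ["INFERENCE_API_KEY"],               # placeholder env var
    )

    response = client.chat.completions.create(
        model="glm-4.5",  # placeholder model identifier
        messages=[{"role": "user", "content": "Write a haiku about rate limits."}],
    )
    print(response.choices[0].message.content)

Coding tools like Cline or Aider do essentially the same thing under the hood: point them at the base URL and an API key, and they should just work.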

Comments (4)

logicprog · 2h ago
I was literally just wishing for something like this; this is perfect! Do you do prompt caching?
reissbaker · 2h ago
Aw thanks! We don't currently, but from a cost perspective as a user it shouldn't matter much, since it's all bundled into the same subscription (we rate-limit by requests, not by tokens; our request rate limits are set higher than the number of messages per hour that Claude Code promises, haha). We might add it at some point just to save GPUs though!
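
For anyone curious what request-based limiting looks like conceptually, here's a toy fixed-window sketch; the numbers and names are made up and this isn't our actual implementation:

    import time
    from collections import defaultdict

    # Illustrative fixed-window, per-request limiter: every request costs 1,
    # no matter how many tokens it uses. Limits here are hypothetical.
    WINDOW_SECONDS = 3600
    MAX_REQUESTS_PER_WINDOW = 300  # made-up hourly cap

    _counts = defaultdict(int)  # (subscription_id, window_start) -> request count

    def allow_request(subscription_id: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        window_start = int(now // WINDOW_SECONDS)
        key = (subscription_id, window_start)
        if _counts[key] >= MAX_REQUESTS_PER_WINDOW:
            return False
        _counts[key] += 1
        return True

A token-based limiter would increment by each request's token count instead, which is exactly the model we're avoiding.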
logicprog · 42m ago
Yeah, I wasn't so much worried about costs to me as about the sustainability of your own pricing; I don't want to run into a "we're lowering quotas" situation like CC did :P
reissbaker · 14m ago
Lol fair! I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5, which fits on a relatively small node compared to the rumored sizes of Anthropic's models. We can throw a lot of tokens at it before running into issues. It's actually kind of nice to launch without prompt caching, since it means that if we're flying too close to the sun on tokens, we still have some pretty large levers to pull on the infra side before needing to do anything drastic with rate limits.