Java Build Tools Could Be Better Swiss Java User Group Online Meetup 27 Aug 2025 [video] (youtube.com)

Hey HN! We've run our privacy-focused open-source inference company for a while now, and we're launching a flat monthly subscription similar to Anthropic's. It should work with Cline, Roo, KiloCode, Aider, etc — any OpenAI-compatible API client should do. The rate limits at every tier are higher than the Claude rate limits, so even if you prefer using Claude it can be a helpful backup for when you're rate limited, for a pretty low price. Let me know if you have any feedback!

Comments (8)

rationably · 38m ago

Do you plan to offer a high-quality FIM models in the bundle? Would be handy to perform autocompletion locally, say via the Qwen3-coder.

reissbaker · 15m ago

Interesting! Very open to the idea. What open-source fill-in-the-middle models are good right now? I've stayed on top of the open source primary coding LLMs, but haven't been following along for the open-source FIM ones.

logicprog · 4h ago

I was literally just wishing there was something like this, this is perfect! Do you do prompt caching?

reissbaker · 4h ago

Aw thanks! We don't currently, but from a cost perspective as a user it shouldn't matter much since it's all bundled into the same subscription (we rate-limit by requests, not by tokens — our request rate limits are set to "higher than the amount of messages per hour that Claude Code promises", haha). We might at some point just to save GPUs though!

logicprog · 2h ago

Yeah I wasn't worried so much about costs to me, as sustainability of your own prices — don't want to run into a "we're lowering quotas" situation like CC did :P

reissbaker · 2h ago

Lol fair! I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5, which fits on a ~relatively small node compared to the rumored sizes of Anthropic's models. We can throw a lot of tokens at it before running into issues — it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.

logicprog · 1h ago

> I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5,

That's funny, that's also my favorite coding model as well!

> the rumored sizes of Anthropic's models

Yeah. I've long had a hypothesis that their models are, like, average sized for a SOTA model, but fully dense, like that old llama 3.1 405b model, and that's why their per token inference costs are insane compared to the competition.

> it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.

That makes sense.

I'm poor as dirt, and my job actually forbids AI code in the main codebase, so I can't justify even a $20 per month prescription right now (especially when, for experimenting with agentic coding, qwen code is currently free (if shitty)) but when or if it becomes financially responsible, you will be at the very top of my list.

reissbaker · 51m ago

<3 thank you!

Wplace Picture Outline Maker (wplacehelper.net)

Handling 500M clicks with a $4 VPS [video] (youtube.com)

Why energy shocks permanently lower fertility rates and total maternal rates (governance.fyi)

Show HN: NinjaTech AI (VM-Based Coding Agent, SOTA Accuracy, 5x Faster) (super.myninja.ai)

Ask HN: Job market for Staff/Director tech roles barren?

The GitOps Non-Kubernetes Homelab (shadybraden.com)

Java Build Tools Could Be Better Swiss Java User Group Online Meetup 27 Aug 2025 [video] (youtube.com)

Drugs, smuggling and abductions: inside the world of pigeon racing in Taiwan (theguardian.com)

Notes on Programming in C by Rob Pike (lysator.liu.se)

Lower Immigration Projections Mean Lower Breakeven Employment Growth Estimates (stlouisfed.org)

Lightning Is Misunderstood (bitcoinmagazine.com)

Consumer Hardware Development Literature (twitter.com)

OpenPandora (openpandora.org)

Show HN: Try out all the best image generation models in one platform for free (imageninja.ai)

Stability of Underground Granite Chamber After Penetration and Explosion (mdpi.com)

Russian space official: "Stop lying to ourselves" about our industry's health (arstechnica.com)

Clojure 1.12.2 (clojure.org)

Hardware Design: Anechoic Chamber Guide for EMC and RF (Wireless) Testing (2016) (emcfastpass.com)

The Sudden Surges That Forge Evolutionary Trees (quantamagazine.org)

OpenAI Realtime API connected to a Map (It is good at geography) (twitter.com)

Microsoft's Copilot AI is now inside Samsung TVs and monitors (theverge.com)

When Your Cache Has a Bigger Carbon Footprint Than Your Users (robbyonrails.com)

Why Lenders Are Building AI Agents Now [video] (youtube.com)

Yes, I *Would* Sacrifice Myself For 10^100 Shrimp (kylestar.net)

Heist – Viral by Design (krishinasnani.substack.com)

Show HN: CleanerAudio – AI to clean and transcribe audio

Claude Sonnet Will Ship in Xcode (developer.apple.com)

Show HN: A natural language search interface to ETF (and deep insights on each) (signalbloom.ai)

Brickyard (justlaybrick.com)

Ask HN: What to do when you suspect your interview is with a state operative?

Gravity Defied mobile game rewrite from Java to C++ & SDL (github.com)

FBI cyber cop: Salt Typhoon pwned 'nearly every American' (theregister.com)

Project MiniNAS (jadarma.github.io)

AMD MI300X for LLM Serving Disaggregating Prefill and Decode with SGLang (rocm.blogs.amd.com)

I researched every attempt to stop fascism in history. The success rate is 0% (cmarmitage.substack.com)

AWS X-Ray SDK and daemon end of support timeline (docs.aws.amazon.com)

AUR Repository Still Under DDoS Attack (linux-magazine.com)

From Airbnb to America's 'Chief Design Officer' (nytimes.com)

How AI stem separation alters the workflow of music producers (medium.com)

Military spending and war (nber.org)

Ford and the Birth of the Model T (construction-physics.com)

Increased autonomic activation in vicarious embarrassment (2012) [pdf] (pubmed.ncbi.nlm.nih.gov)

Two Chinese Nationals Arrested for Allegedly Illegal Shipping AI Chips to China (justice.gov)

'Universal' Cancer Vaccine Destroys Resistant Tumors in Mice (sciencealert.com)

BCHS Stack: BSD, C, httpd, SQLite (learnbchs.org)

Ask HN: How to teach a 4 year old to code?

F1 in Hungary: Strategy and fast tire changes make all the difference (arstechnica.com)

The Rise of Computer Use and Agentic Coworkers (a16z.com)

We Oops-Proofed Infrastructure Deletion on Railway (blog.railway.com)

From the 'Banter Bill' to Bias Hotlines: The Alarming Rise of Snitch Networks (thedailyeconomy.org)

Show HN: A private, flat monthly subscription for open-source LLMs

Comments (8)

Yes, I Would Sacrifice Myself For 10^100 Shrimp (kylestar.net)