Show HN: InferX - AI Lambda-Like Inference Function as a Service

2 points by pveldandi on 5/15/2025, 2:15:59 PM | 0 comments
Cold starts are a 10x latency tax on every LLM query. Spinning up a container and loading a 100GB+ model from disk on every request, in 2025, is a UX crime.

InferX is a ground-up rewrite of the inference serving stack. We snapshot the entire GPU state (weights, KV cache, CUDA context) and restore it on demand in under 2 seconds. This isn't an incremental optimization; it's a different execution model.
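
For intuition, here is a minimal PyTorch sketch of the snapshot half of the idea: copy GPU-resident weights into pinned host memory once, then restore them with async host-to-device copies instead of reloading from disk. The names (GpuSnapshot, capture, restore) are illustrative, not InferX's actual API, and a real snapshot also has to cover the KV cache and CUDA context, which this toy skips.

    import time
    import torch

    class GpuSnapshot:
        """Hypothetical snapshot object; holds pinned host copies of weights."""
        def __init__(self, blobs):
            self.blobs = blobs  # name -> pinned CPU tensor

        @classmethod
        def capture(cls, model):
            # One-time capture: park every parameter in pinned host memory
            # so restore is a round of async memcpys, not a disk load.
            blobs = {name: p.detach().to("cpu").pin_memory()
                     for name, p in model.named_parameters()}
            return cls(blobs)

        def restore(self, model):
            # Restore weights onto the GPU with non-blocking copies.
            with torch.no_grad():
                for name, p in model.named_parameters():
                    p.copy_(self.blobs[name].to(p.device, non_blocking=True))
            torch.cuda.synchronize()

    model = torch.nn.Linear(4096, 4096).cuda()
    snap = GpuSnapshot.capture(model)

    t0 = time.perf_counter()
    snap.restore(model)  # fast path: PCIe copy instead of container + disk load
    print(f"restore took {time.perf_counter() - t0:.3f}s")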

The result: up to 10x faster cold starts and 90%+ GPU utilization. We can even hot-swap models mid-flight, scheduling them on a GPU the way an OS schedules threads.
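
And a sketch of the "models as threads" framing, under a strong simplifying assumption (every model shares one architecture, so a single resident GPU buffer can be reused): a dispatcher restores whichever snapshot the next request needs and pays the copy cost only on a context switch. ModelPool and infer are hypothetical names; it reuses the GpuSnapshot sketch above.

    class ModelPool:
        """Hypothetical dispatcher: one GPU-resident buffer, N host snapshots."""
        def __init__(self, resident_model, snapshots):
            self.model = resident_model   # GPU memory shared by every model
            self.snapshots = snapshots    # name -> GpuSnapshot
            self.active = None

        def infer(self, name, x):
            if self.active != name:       # pay the copy only on a switch
                self.snapshots[name].restore(self.model)
                self.active = name
            with torch.no_grad():
                return self.model(x)

    # e.g. pool = ModelPool(model, {"a": snap_a, "b": snap_b})
    #      y = pool.infer("a", torch.randn(1, 4096, device="cuda"))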

Is anyone still doing inference the old way? Seriously, why?

Tech deep dive & benchmarks: https://github.com/inferx-net/inferx

(We're also open-sourcing parts of this soon.)
