Show HN: InferX – an AI-native OS for running 50 LLMs per GPU with hot swapping
3 points by pveldandi · 2 comments · 4/17/2025, 2:51:53 PM
Hey folks, we’ve been building InferX, an AI-native runtime that snapshots the full GPU execution state of an LLM (weights, KV cache, CUDA context) and restores it in under 2 seconds. This lets us hot-swap models like threads: no reloading, no cold starts.
We treat each model as a lightweight, resumable process, like an OS for LLM inference.
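To make the process analogy concrete, here's a rough sketch in Python. GPU state is simulated with plain objects, and the names (Snapshot, ModelProcess, Scheduler) are made up for illustration, not our actual API. The point is that a swap restores a saved snapshot instead of reloading weights and rebuilding the KV cache:

    # Toy sketch only: GPU state is simulated in memory, all names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Snapshot:
        weights: bytes       # serialized weight pages
        kv_cache: bytes      # live KV cache, so context survives a swap
        cuda_context: bytes  # captured CUDA/allocator state

    class ModelProcess:
        """A loaded model treated like a suspendable OS process."""
        def __init__(self, model_id: str) -> None:
            self.model_id = model_id
            self.on_gpu = False
            self.snapshot = Snapshot(b"", b"", b"")

        def suspend(self) -> Snapshot:
            # Capture full execution state and release GPU memory.
            self.on_gpu = False
            return self.snapshot

        def resume(self, snapshot: Snapshot) -> None:
            # Restore state directly onto the GPU: no weight load, no cold start.
            self.snapshot = snapshot
            self.on_gpu = True

    class Scheduler:
        """Keeps a bounded set of models resident and hot-swaps the rest."""
        def __init__(self, max_resident: int) -> None:
            self.max_resident = max_resident
            self.procs: dict[str, ModelProcess] = {}

        def run(self, model_id: str, prompt: str) -> str:
            proc = self.procs.setdefault(model_id, ModelProcess(model_id))
            if not proc.on_gpu:
                self._evict_if_full()
                proc.resume(proc.snapshot)   # the "under 2s" restore, in spirit
            return f"[{model_id}] response to: {prompt}"

        def _evict_if_full(self) -> None:
            resident = [p for p in self.procs.values() if p.on_gpu]
            if len(resident) >= self.max_resident:
                resident[0].suspend()        # simplified eviction policy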
Why it matters:
• Run 50+ LLMs per GPU (7B–13B range)
• 90% GPU utilization (vs ~30–40% with conventional setups)
• Avoids cold starts by snapshotting and restoring directly on GPU
• Designed for agentic workflows, toolchains, and multi-tenant use cases
• Helpful for Codex CLI-style orchestration or bursty multi-model apps (see the usage sketch after this list)
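For the orchestration case, usage of the toy Scheduler sketched above (again, purely illustrative model names) would look like an agent loop fanning out across specialist models while the runtime swaps snapshots underneath:

    # Hypothetical usage: several specialists share one GPU, swapped on demand.
    sched = Scheduler(max_resident=4)
    plan = sched.run("planner-7b", "break the task into steps")
    patch = sched.run("coder-13b", "draft a patch for step 1")
    note = sched.run("summarizer-7b", "summarize the resulting diff")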
Still early, but we’re seeing strong interest from builders and infra folks. Would love thoughts, feedback, or edge cases you’d want to see tested.
Demo: https://inferx.net
X: @InferXai
Comments (2)
sauravt · 12d ago
Very interesting. How would memory (or previous chat context awareness) work in the case of hot swapping, when multiple users are hot-swapping models like threads?
precompute · 12d ago
Wow, that's really cool!