Show HN: 50+ LLMs on 2 GPUs with 2-Second Swapping? We Built an AI-Native Runtime
We've built InferX, a specialized runtime that changes how LLMs are served. The core problem we address is the latency bottleneck in AI inference, especially with large models: current serving systems either waste GPU memory by keeping every model resident, or suffer painfully slow cold starts when loading models on demand.
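To make the cold-start cost concrete, here is a rough back-of-the-envelope estimate (assumed hardware numbers for illustration, not benchmarks): a 13B-parameter model in fp16 is about 26 GB of weights, and just moving those bytes from disk and across PCIe takes several seconds before any deserialization or warm-up begins.

```python
# Rough, illustrative cold-start estimate. Hardware numbers below are assumptions,
# not measurements from InferX or any specific system.
PARAMS = 13e9            # 13B-parameter model
BYTES_PER_PARAM = 2      # fp16 weights
NVME_GBPS = 3.0          # assumed NVMe sequential read bandwidth
PCIE_GBPS = 25.0         # assumed effective host-to-GPU bandwidth (PCIe 4.0 x16)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
disk_s = weights_gb / NVME_GBPS
pcie_s = weights_gb / PCIE_GBPS

print(f"weights: {weights_gb:.0f} GB")
print(f"disk read alone: ~{disk_s:.1f} s, host-to-GPU copy alone: ~{pcie_s:.1f} s")
# ~26 GB of weights -> ~8-9 s of disk read plus ~1 s of PCIe copy,
# before deserialization, allocator setup, and kernel warm-up are even counted.
```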
InferX's AI-native architecture, with its "snapshot" technology, enables:
* *Sub-2s cold starts:* Spin up models almost instantly.
* *High density:* Serve more LLMs on the same GPUs.
* *Optimal efficiency:* Maximize GPU utilization.
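The post doesn't describe InferX's internals, so as a rough mental model only, here is a minimal sketch of what an on-demand, snapshot-restoring runtime implies: keep a bounded number of models resident on the GPUs, and when a request arrives for one that isn't, evict the least-recently-used model and restore the requested one from a pre-captured snapshot. The `restore_fn`/`evict_fn` hooks are hypothetical placeholders, not InferX's actual API.

```python
from collections import OrderedDict

class SnapshotSwapRuntime:
    """Toy LRU model-swapping scheduler; snapshot hooks are hypothetical placeholders."""

    def __init__(self, max_resident: int, restore_fn, evict_fn):
        self.max_resident = max_resident        # how many models fit on the GPUs at once
        self.restore_fn = restore_fn            # assumed fast path: restore pre-captured GPU state
        self.evict_fn = evict_fn                # assumed: free GPU memory held by an idle model
        self.resident = OrderedDict()           # model_id -> handle, ordered by recency

    def acquire(self, model_id: str):
        """Return a handle for model_id, restoring it from a snapshot if not resident."""
        if model_id in self.resident:
            self.resident.move_to_end(model_id)  # mark as most recently used
            return self.resident[model_id]
        if len(self.resident) >= self.max_resident:
            victim, handle = self.resident.popitem(last=False)  # evict least recently used
            self.evict_fn(victim, handle)
        handle = self.restore_fn(model_id)       # the sub-2s restore path is what InferX claims to optimize
        self.resident[model_id] = handle
        return handle


# Usage with dummy stand-ins for the real restore/evict hooks:
rt = SnapshotSwapRuntime(max_resident=2,
                         restore_fn=lambda m: f"handle:{m}",
                         evict_fn=lambda m, h: None)
rt.acquire("llama-3-8b")
rt.acquire("mistral-7b")
rt.acquire("qwen-7b")   # exceeds capacity, so llama-3-8b is evicted first
```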
This isn't just another API; it's a new execution layer designed from the ground up for the unique demands of LLM inference. We're seeing strong interest from infrastructure teams and AI platform builders.
Would love your thoughts and feedback! What are the biggest challenges you're facing with LLM deployment?
Demo: https://inferx.net/