Show HN: InferX - AI Lambda-Like Inference Function as a Service

2 points by pveldandi on 5/15/2025, 2:15:59 PM | 0 comments
Cold starts are a 10x latency tax on every LLM query. Spinning up a container and loading a 100GB+ model from disk on every request, in 2025, is a UX crime.

InferX is a ground-up rewrite of the inference serving stack. We snapshot the entire GPU state (weights, KV cache, CUDA context) and restore it on demand in under 2 seconds. This isn't an incremental optimization; it's a different execution model.
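
For intuition, here is a minimal PyTorch sketch of the snapshot half of the idea: copy GPU-resident weights into pinned host memory once, then restore them with async host-to-device copies instead of reloading from disk. The names (GpuSnapshot, capture, restore) are illustrative, not InferX's actual API, and a real snapshot also has to cover the KV cache and CUDA context, which this toy skips.

    import time
    import torch

    class GpuSnapshot:
        """Hypothetical snapshot object; holds pinned host copies of weights."""
        def __init__(self, blobs):
            self.blobs = blobs  # name -> pinned CPU tensor

        @classmethod
        def capture(cls, model):
            # One-time capture: park every parameter in pinned host memory
            # so restore is a round of async memcpys, not a disk load.
            blobs = {name: p.detach().to("cpu").pin_memory()
                     for name, p in model.named_parameters()}
            return cls(blobs)

        def restore(self, model):
            # Restore weights onto the GPU with non-blocking copies.
            with torch.no_grad():
                for name, p in model.named_parameters():
                    p.copy_(self.blobs[name].to(p.device, non_blocking=True))
            torch.cuda.synchronize()

    model = torch.nn.Linear(4096, 4096).cuda()
    snap = GpuSnapshot.capture(model)

    t0 = time.perf_counter()
    snap.restore(model)  # fast path: PCIe copy instead of container + disk load
    print(f"restore took {time.perf_counter() - t0:.3f}s")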

The result: up to 10x faster cold starts and 90%+ GPU utilization. We can even hot-swap models mid-flight, scheduling them on a GPU the way an OS schedules threads.
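
And a sketch of the "models as threads" framing, under a strong simplifying assumption (every model shares one architecture, so a single resident GPU buffer can be reused): a dispatcher restores whichever snapshot the next request needs and pays the copy cost only on a context switch. ModelPool and infer are hypothetical names; it reuses the GpuSnapshot sketch above.

    class ModelPool:
        """Hypothetical dispatcher: one GPU-resident buffer, N host snapshots."""
        def __init__(self, resident_model, snapshots):
            self.model = resident_model   # GPU memory shared by every model
            self.snapshots = snapshots    # name -> GpuSnapshot
            self.active = None

        def infer(self, name, x):
            if self.active != name:       # pay the copy only on a switch
                self.snapshots[name].restore(self.model)
                self.active = name
            with torch.no_grad():
                return self.model(x)

    # e.g. pool = ModelPool(model, {"a": snap_a, "b": snap_b})
    #      y = pool.infer("a", torch.randn(1, 4096, device="cuda"))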

Is anyone still doing inference the old way? Seriously, why?

Tech deep dive & benchmarks: https://github.com/inferx-net/inferx

(We're also open-sourcing parts of this soon.)
