Show HN: Serverless platform for running voice AI agents

Over the past year, we've collaborated with hundreds of developers building voice AI agents using our Agents SDK. It quickly became apparent that hosting these persistent, session-based workloads is a world apart from hosting traditional HTTP servers. We fielded the same questions repeatedly:

- How do I size CPU/memory for unpredictable session lengths?

- What's the best way to autoscale without overprovisioning?

- How can I monitor and optimize performance across concurrent sessions?

A few months ago, we set out to build a serverless platform tailored for this: think Vercel for stateful AI agents, with fast cold starts and seamless scaling. Along the way, we tackled some fun engineering challenges:

- Container isolation to sandbox workloads and prevent noisy-neighbor issues

- Minimizing container startup times, so autoscaling can add capacity quickly enough to keep up with new sessions

- Custom load balancing to distribute sessions based on real-time resource utilization, not just round-robin—since session durations vary wildly

- Graceful draining during updates or scaling events
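
As a rough illustration of the last two items (a sketch under stated assumptions, not the platform's actual implementation), the Python below picks a worker by current resource utilization instead of round-robin, and skips workers that are draining. The `Worker` class, the `max_load` threshold of 0.8, and the choice of `max(cpu, memory)` as the load score are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical view of one agent container as seen by the balancer."""
    name: str
    cpu: float          # fraction of CPU in use, 0.0 to 1.0
    memory: float       # fraction of memory in use, 0.0 to 1.0
    sessions: int = 0   # sessions currently assigned
    draining: bool = False

    def load(self) -> float:
        # Assumption: the scarcer resource decides how loaded a worker is.
        return max(self.cpu, self.memory)


def pick_worker(workers, max_load=0.8):
    """Return the least-loaded worker that accepts new sessions, or None."""
    candidates = [w for w in workers if not w.draining and w.load() < max_load]
    if not candidates:
        return None  # a real system would scale up or queue the session here
    return min(candidates, key=Worker.load)


def start_drain(worker):
    """Stop routing new sessions to a worker; existing sessions keep running."""
    worker.draining = True


def can_terminate(worker):
    """A draining worker is safe to stop once its last session has ended."""
    return worker.draining and worker.sessions == 0


workers = [
    Worker("a", cpu=0.70, memory=0.40, sessions=3),
    Worker("b", cpu=0.20, memory=0.55, sessions=5),
    Worker("c", cpu=0.10, memory=0.10, sessions=1),
]

start_drain(workers[2])        # "c" is being replaced by an update
chosen = pick_worker(workers)  # "c" is skipped even though it is nearly idle
print(chosen.name)             # prints "b"
```

Note that "b" wins despite holding the most sessions: session count says little about load when session durations and costs vary, which is why the sketch scores workers on measured utilization. Worker "c" only becomes eligible for termination after its one remaining session finishes.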

We've been dogfooding this internally and with early users, and it's handling thousands of concurrent voice sessions with minimal latency spikes. If you're building AI agents (voice or otherwise) and wrestling with infra, we'd love your feedback—does this solve pain points you've hit? What's missing?
