How We Took Vapi from 99.9% to 99.99% Reliability

Comments (1)

jordandearsley · 18h ago

A while ago, Vapi.ai was at ~99.9% uptime, which works out to almost 9 hours of downtime a year.

We set a goal of 99.99% (<1 hour/year), and quickly learned that getting there meant 100 small changes, not one big one.
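
For anyone who wants to check the downtime numbers, here is a quick sketch of the arithmetic. It assumes a 365-day year and treats availability as a simple fraction of wall-clock time; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


three_nines = downtime_hours_per_year(99.9)
four_nines = downtime_hours_per_year(99.99)

print(f"99.9%  -> {three_nines:.2f} hours/year")
print(f"99.99% -> {four_nines * 60:.1f} minutes/year")
```

So each extra nine cuts the budget by 10x: about 8.76 hours a year at three nines, about 52.6 minutes at four.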

Some highlights from what we did to achieve this goal:

- When our primary database on Neon falters, traffic now shifts to Aurora in under five seconds, keeping calls alive.

- Every external dependency has a backup. LLM calls roll from OpenAI → Azure → Bedrock.

- Deployments are rolled out gradually across clusters by an automated canary manager, which starts at 5% of traffic and rolls back instantly if error rates rise.

- When traffic spikes, Lambda burst workers come online in milliseconds and tunnel into our Kubernetes cluster over QUIC, handling overflow without dropping calls.
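
To make the provider fallback idea concrete, here is a minimal sketch of an ordered failover chain. This is not Vapi's code: the provider functions are hypothetical stubs standing in for real client calls, and a real version would likely add timeouts, retries, and health tracking.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has errored."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves us to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stubs: the first provider is down, the second one answers.
def openai_stub(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def azure_stub(prompt: str) -> str:
    return f"azure reply to: {prompt}"


def bedrock_stub(prompt: str) -> str:
    return f"bedrock reply to: {prompt}"


chain = [("openai", openai_stub), ("azure", azure_stub), ("bedrock", bedrock_stub)]
used, reply = call_with_fallback(chain, "hello")
print(used, "->", reply)
```

The caller only sees a reply and the name of whichever provider produced it, which is what makes a single provider outage invisible from the outside.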

In total, these changes cut dropped calls by 97% and made provider outages invisible to users.
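
The canary rollout above can be sketched as a small decision loop. Only the 5% starting share and the roll-back-on-rising-errors behavior come from the post; the doubling schedule, the baseline error rate, and the tolerance are assumptions made up for this example.

```python
def next_traffic_share(
    current: float,
    error_rate: float,
    baseline: float,
    tolerance: float = 0.01,
) -> float:
    """Return the next canary traffic share: 0.0 means roll back, otherwise double up to 1.0."""
    if error_rate > baseline + tolerance:
        return 0.0
    return min(1.0, current * 2)


def run_rollout(observed_error_rates, baseline: float = 0.01, start: float = 0.05):
    """Walk a rollout through a sequence of observed error rates and return the share history."""
    share = start
    history = [share]
    for rate in observed_error_rates:
        share = next_traffic_share(share, rate, baseline)
        history.append(share)
        if share in (0.0, 1.0):  # rolled back or fully deployed
            break
    return history


healthy = run_rollout([0.01] * 6)
regressed = run_rollout([0.01, 0.012, 0.05])
print("healthy:  ", healthy)
print("regressed:", regressed)
```

In the regressed run the third observation exceeds the assumed threshold, so the share drops straight to zero instead of continuing to ramp.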

Full deep dive with architecture diagrams, failure scenarios, and code-level decisions here: https://vapi.ai/blog/how-we-achieved-99-99-reliability-at-va...
