We reached some early product-market fit, and the “starter” stack that had served us well (Vercel functions, Supabase Postgres, and a mix of queue services) started hitting limits. Function runtimes were capped at 15 minutes, cron jobs had to be public, queue services kept getting sunset, and the database had no automatic failover. We were spending more hours nursing infra than building features.
Over the next six months we rebuilt on plain AWS: EKS for the Next.js app, Aurora Postgres in-VPC, SQS for jobs, all managed with Terraform. A Cloudflare Worker mirrored an increasing slice of live traffic to Kubernetes until everything looked solid, then a DNS flip made it the source of truth.
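To make the mirroring step concrete, here is a minimal sketch of how a Worker could copy a fixed percentage of requests to a second origin while still answering from the original one. It is not our production code: `MIRROR_ORIGIN`, `MIRROR_PERCENT`, the helper names, the FNV-1a bucketing, and the choice to mirror only GET requests (so writes are never duplicated) are all assumptions made for illustration.

```typescript
// Hypothetical settings: neither value comes from the post.
const MIRROR_ORIGIN = "https://k8s.example.com";
const MIRROR_PERCENT = 10; // raise gradually as confidence grows

// FNV-1a hash, so the same key always lands in the same bucket (0-99).
function bucketFor(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

// A key is mirrored when its bucket falls below the current percentage,
// so raising the percentage only ever adds keys to the mirrored set.
function shouldMirror(key: string, percent: number): boolean {
  return bucketFor(key) < percent;
}

// Keep path and query string, swap the origin.
function mirrorUrl(original: string, mirrorOrigin: string): string {
  const src = new URL(original);
  const dst = new URL(mirrorOrigin);
  dst.pathname = src.pathname;
  dst.search = src.search;
  return dst.toString();
}

// Minimal stand-in for the Workers execution context.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// In a real Worker this object would be the module's default export.
const worker = {
  async fetch(request: Request, _env: unknown, ctx: WaitUntilContext): Promise<Response> {
    const key = request.headers.get("cf-connecting-ip") ?? request.url;
    if (request.method === "GET" && shouldMirror(key, MIRROR_PERCENT)) {
      const copy = new Request(mirrorUrl(request.url, MIRROR_ORIGIN), {
        method: "GET",
        headers: request.headers,
      });
      // Fire and forget: a failing mirror must never affect the user.
      ctx.waitUntil(fetch(copy).catch(() => undefined));
    }
    // The existing origin still serves the real response.
    return fetch(request);
  },
};
```

Bucketing on a stable key rather than `Math.random()` keeps a given client consistently inside or outside the mirrored slice, which makes it easier to compare behaviour between the two stacks.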
Supabase data flowed into Aurora via logical replication; once lag was near zero we swapped the connection string and retired the old stack without users noticing. Costs are predictable now and we control most knobs.
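As a rough illustration of what "lag near zero" can mean in practice, the sketch below compares two Postgres log sequence numbers (LSNs) and reports the gap in bytes. It assumes you have already read the publisher's current WAL position and the position the subscriber has confirmed (for example from `pg_current_wal_lsn()` and `pg_replication_slots.confirmed_flush_lsn`). The function names and the `maxLagBytes` threshold are hypothetical, not values from our migration.

```typescript
// A Postgres LSN prints as two hex numbers, "high/low", e.g. "16/B374D848".
interface Lsn {
  high: number; // upper 32 bits
  low: number;  // lower 32 bits
}

function parseLsn(text: string): Lsn {
  const match = /^([0-9A-Fa-f]{1,8})\/([0-9A-Fa-f]{1,8})$/.exec(text);
  if (!match) {
    throw new Error(`invalid LSN: ${text}`);
  }
  return { high: parseInt(match[1], 16), low: parseInt(match[2], 16) };
}

// Bytes of WAL the replica still has to apply. Subtracting the halves
// separately keeps the result exact as long as the gap itself is small.
function lagBytes(sourceLsn: string, replicaLsn: string): number {
  const source = parseLsn(sourceLsn);
  const replica = parseLsn(replicaLsn);
  return (source.high - replica.high) * 4294967296 + (source.low - replica.low);
}

// Hypothetical cutover gate: the replica must not be ahead of the source
// and must be within maxLagBytes of it.
function readyForCutover(sourceLsn: string, replicaLsn: string, maxLagBytes = 0): boolean {
  const lag = lagBytes(sourceLsn, replicaLsn);
  return lag >= 0 && lag <= maxLagBytes;
}
```

A gate like this would typically run after writes to the old database are paused, so the gap can actually reach the threshold before the connection string is swapped.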
We put this post together as a bit of a retro and to help if you're evaluating a similar path for your company.