Use One Big Server (2022)

106 points by antov825 | 92 comments | 8/31/2025, 5:29:07 PM | specbranch.com ↗

Comments (92)

runako · 4h ago
One of the more detrimental aspects of the Cloud Tax is that it constrains the types of solutions engineers even consider.

Picking an arbitrary price point of $200/mo, you can get 4(!) vCPUs and 16GB of RAM at AWS. Architectures are different etc., but this is roughly a mid-spec dev laptop of 5 or so years ago.

At Hetzner, you can rent a machine with 48 cores and 128GB of RAM for the same money. It's hard to overstate how far apart these machines are in raw computational capacity.

There are approaches to problems that make sense with 10x the capacity that don't make sense on the much smaller node. Critically, those approaches can sometimes save engineering time that would otherwise go into building a more complex system to manage around artificial constraints.

Yes, there are other factors like durability etc. that need to be designed for. But going the other way, dedicated boxes can deliver more consistent performance without worries of noisy neighbors.

shrubble · 2h ago
It's more than that - it's all the latency that you can remove from the equation with your bare-metal server.

No network latency between nodes, less memory-bandwidth latency/contention than you get in VMs, and no caching-tier latency when you can just tell e.g. Postgres to use gigs of RAM and let Linux's disk caching take care of the rest (no separate caching architecture needed).
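That Postgres-plus-page-cache setup amounts to a couple of lines of configuration. A minimal sketch (the values are illustrative assumptions for a box with plenty of RAM, not tuning advice):

```ini
# postgresql.conf -- let Postgres and the kernel do the caching
shared_buffers = 16GB          # Postgres's own buffer pool
effective_cache_size = 96GB    # planner hint: what the Linux page cache will hold
work_mem = 64MB                # per-sort / per-hash-table memory
```

With this, hot data lives in RAM between the buffer pool and the OS page cache, and no separate caching tier is needed.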

matt-p · 2h ago
The difference between a fairly expensive ($300) RDS instance + EC2 in the same region vs a $90 dedicated server with an NVMe drive and Postgres in a container is absolutely insane.
bspammer · 1h ago
A fair comparison would include the cost of the DBA who will be responsible for backups, updates, monitoring, security and access control. That’s what RDS is actually competing with.
yjftsjthsd-h · 12m ago
As long as you also include the Cloud Certified DevOps Engineer™[0] to set up that RDS instance.

[0] A normal sysadmin remains vaguely bemused at their job title and the way it changes every couple years.

matt-p · 1h ago
Totally. My frustration isn't even the price, though: RDS is literally just dog slow.
shrubble · 57m ago
Paying someone $2000 to set that up once should result in the costs being recovered in what, 18 months?

If you’re running Postgres locally you can turn off the TCP/IP part; nothing more to audit there.

SSH based copying of backups to a remote server is simple.

If not accessible via network, you can stay on whatever version of Postgres you want.

I’ve heard these arguments since AWS launched, and all that time I’ve been running Postgres (since 2004 actually) and have never encountered all these phantom issues that are claimed as being expensive or extremely difficult.
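The backup flow described above is a few lines of scripting. A hedged sketch in Python (the database name, remote host, and paths are invented for illustration):

```python
import datetime
import shlex

def backup_commands(db="app_db", remote="backup@backup-host:/srv/pg-backups"):
    """Build the shell commands for a local dump plus SSH copy.

    Assumes pg_dump runs over the local Unix socket, so TCP/IP
    can stay disabled on the Postgres side.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    dump = f"/tmp/{db}-{stamp}.sql.gz"
    return [
        f"pg_dump {shlex.quote(db)} | gzip > {shlex.quote(dump)}",  # local dump
        f"scp {shlex.quote(dump)} {shlex.quote(remote)}",           # off-box copy
    ]

for cmd in backup_commands():
    print(cmd)
```

Run from cron, that is essentially the whole pipeline; restoring is a matter of piping the dump back into `psql`.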

Demiurge · 3h ago
I think it’s the other way around. I’m a huge fan of Hetzner for small sites with a few users. However, for bigger projects, the cloud seems to offer a complete lack of constraints. For projects that can pay for my time, $200/m or $2000/m in hosting costs is a negligible difference. What’s the development cost difference between AWS CDK / Terraform + GitHub Actions vs. Docker / K8s / Ansible + any CI pipeline? I don’t know; in my experience, I don’t see how “bare metal” saves much engineering time. I also don’t see anything complicated about using an IaC Fargate + RDS template.

Now, if you actually need to decouple your file storage and make it durable and scalable, or need to dynamically create subdomains, or any number of other things… The effort of learning and integrating different dedicated services at the infrastructure level to run all this seems much more constraining.

I’ve been doing this since before the “Cloud,” and in my view, if you have a project that makes money, cloud costs are a worthwhile investment that will be the last thing that constrains your project. If cloud costs feel too constraining for your project, then perhaps it’s more of a hobby than a business—at least in my experience.

Just thinking about maintaining multiple cluster filesystems and disk arrays—it’s just not what I would want to be doing with most companies’ resources or my time. Maybe it’s like the difference between folks who prefer Arch and setting up Emacs just right, versus those happy with a MacBook. If I felt like changing my kernel scheduler was a constraint, I might recommend Arch; but otherwise, I recommend a MacBook. :)

On the flip side, I’ve also tried to turn a startup idea into a profitable project with no budget, where raw throughput was integral to the idea. In that situation, a dedicated server was absolutely the right choice, saving us thousands of dollars. But the idea did not pan out. If we had gotten more traction, I suspect we would have just vertically scaled for a while. But it’s unusual.

runako · 2h ago
> I really don't see how "bare metal" saves any engineering time

This is because you are looking only at provisioning/deployment. And you are right -- node size does not impact DevOps all that much.

I am looking at the solution space available to the engineers who write the software that ultimately gets deployed on the nodes. And that solution space is different when the nodes have 10x the capability. Yes, cloud providers have tons of aggregate capability. But designing software to run on a fleet of small machines is very different from accomplishing the same tasks on a single large machine.

It would not be controversial to suggest that targeting code at an Apple Watch or Raspberry Pi imposes constraints on developers that do not exist when targeting desktops. I am saying the same dynamic now applies to targeting cloud providers.

This isn't to say there's a single best solution for everything. But there are tradeoffs that are not always apparent. The art is knowing when it makes sense to pay the Cloud Tax, and whether to go 100% Cloud vs some proportion of dedicated.

Demiurge · 2h ago
Overall, I agree that most people underestimate the runway that the modern dedicated server can give you.
andersmurphy · 3h ago
100% this. Add an embedded database like SQLite and batch your writes, and you can go really, really far with Hetzner. It's also why I find the "what about overprovisioning" argument silly (once you look outside of AWS you can get an insane cost/perf ratio).

Also in my experience more complex systems tend to have much less reliability/resilience than simple single node systems. Things rarely fail in isolation.
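The batched-write pattern mentioned above is easy to sketch with Python's built-in sqlite3 (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in practice; WAL mode helps there
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

def write_batch(rows):
    # One transaction per batch means one fsync per batch,
    # not one per INSERT -- this is where the throughput comes from.
    with conn:
        conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

write_batch([(f"event-{i}",) for i in range(10_000)])
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 10000
```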

themafia · 1h ago
On AWS if you want raw computational capacity you use Lambda and not EC2. EC2 is for legacy type workloads and doesn't have nearly the same scaling power and speed that Lambda does.

I have several workloads that just invoke Lambda in parallel. Now I effectively have a 1000 core machine and can blast through large workloads without even thinking about it. I have no VM to maintain or OS image to consider or worry about.
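That fan-out amounts to chunking the workload and invoking a function once per chunk. A hypothetical sketch (the function name "worker", the chunk count, and the payload shape are assumptions; the boto3 call needs AWS credentials and a deployed Lambda):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def chunk(items, n):
    """Split a workload into n roughly equal payloads."""
    return [items[i::n] for i in range(n)]

def fan_out(items, n=1000, function_name="worker"):
    import boto3  # only needed when actually invoking
    client = boto3.client("lambda")

    def invoke(payload):
        return client.invoke(
            FunctionName=function_name,
            Payload=json.dumps({"items": payload}),
        )

    # ~n concurrent invocations: effectively an n-core machine for the
    # duration of the job, with no VM or OS image to maintain.
    with ThreadPoolExecutor(max_workers=64) as pool:
        return list(pool.map(invoke, chunk(items, n)))
```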

Which highlights the other difference that you failed to mention: Hetzner charges a "one time setup" fee to create that VM. That puts a lot of back pressure on infrastructure decisions and removes any scalability you could otherwise enjoy in the cloud.

If you want to just rent a server then Hetzner is great. If you actually want to run "in the cloud" then Hetzner is a non-starter.

solid_fuel · 51m ago
Strong disagree here. Lambda is significantly more expensive per vCPU hour and introduces tight restrictions on your workflow and architecture, one of the most significant being maximum runtime duration.

Lambda is a decent choice when you need fast, spiky scaling for a lot of simple, self-contained tasks. It is a bad choice for heavy work like transcoding long videos, training a model, data analysis, and other compute-heavy tasks.

matt-p · 57m ago
Very few providers charge a setup fee; some will provision a server within 90 seconds of an API call.
dang · 3h ago
HN uses two—one live and one backup, so we can fail over if there's a hardware issue or we need to upgrade something.

It's a nice pattern. Just don't make them clones of each other, or they might go BLAM at the same time!

https://news.ycombinator.com/item?id=32049205

https://news.ycombinator.com/item?id=32032235

https://news.ycombinator.com/item?id=32028511 (<-- this is where it got figured out)

---

Edit: both these points are mentioned in the OP.

bpye · 1h ago
Whilst not as fatal as a failing SSD, AMD also had a fun erratum where a CPU core would hang in CC6 after ~1044 days.

https://www.servethehome.com/amd-epyc-7002-rome-cpus-hang-af...

lewisjoe · 4h ago
I helped bootstrap a company that made an enterprise automation engine. The team wanted to make the service available as SaaS for boosting sales.

They could have got the job done by hosting the service on a VPS with a multi-tenant database schema. Instead, they went about learning Kubernetes and drilling deep into the "cloud-native" stack, spending a year trying to set up the perfect DevOps pipeline.

Not surprisingly the company went out of business within the next few years.

joshmn · 3h ago
This is my experience too—there’s too much time wasted trying to solve a problem that might exist 5 years down the road. So many projects and early-stage companies would be just fine either with a PaaS or nginx in front of a docker container. You’ll know when you hit your pain point.
cpursley · 3h ago
Yep, this is why I'm a proponent of paas until the bill actually hurts. Just pay the heroku/render/fly tax and focus on product market fit. Or, play with servers and K8s, burning your investors money, then move on to the next gig and repeat...
fragmede · 3h ago
Yeah, same. Vercel + Neon and then if you actually have customers and actually end up paying them enough money that it becomes significant, then you can refactor and move platforms, but until you do, there are bigger fish to fry.
matt-p · 2h ago
100%. Making it a docker container and deploying it is literally a few hours at most.
DaSHacka · 3h ago
> Or, play with servers and K8s, burning your investors money, then move on to the next gig and repeat...

I mean, of the two, the PaaS route certainly burns more money, the exception being the rare shop that is so incompetent they can't even get their own infrastructure configured correctly, like in GP's situation.

There are guaranteed to be more shops that would be better off self-hosting and saving on their current massive cloud bills than rare one-offs for which cloud services save so much time that they go from bankruptcy to being functional.

fragmede · 3h ago
> the PaaS route certainly burns more money,

Does it? Vercel is $20/month and Neon starts at $5/month. That obviously goes up as you scale up, but $25/month seems like a fairly cheap place to start to me.

(I don't work for Vercel or Neon, just a happy customer)

cpursley · 2h ago
Yeah, also a happy neon customer - but they can get pricy. Still prefer them over AWS. For compute, Fly is pretty competitive.
theaniketmaurya · 50m ago
I’m using Neon too and upgraded to the scale-up version today. Curious, what do you mean that they can get pricey?
matt-p · 4h ago
A thoroughly good article. It's probably worth also considering adding a CDN if you take this approach at scale. You get to use their WAF and DNS failover.

A big pain point that I personally don't love is that this non-cloud approach normally means running my own database. It's worth considering a provider who also provides cloud databases.

If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.

In terms of pricing, the deals available these days on servers are amazing: you can get a 4GB-RAM VPS with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core CPU for ~$90. It's worth using sites like serversearcher.com to compare.

railorsi · 2h ago
What’s the issue with running Postgres inside a Docker container + regular backups? Never had a problem, and it's relatively easy to manage.
matt-p · 2h ago
No PITR, but mostly it's just hassle. For the application server I literally don't need backups, just automated provisioning, a Docker container, etc. Adding Postgres then means I need full backups, including PITR (point-in-time recovery), because I don't want to lose even an hour's data.
andersmurphy · 3h ago
If you're running on a single machine then you'll get way more performance with something like sqlite (instead of postgres/MySQL) which also makes managing the database quite trivial.
bob1029 · 4h ago
This isn't even the end game for "one big server". AMD will give the most bang per rack, but there are other factors.

An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.

If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.

PaulKeeble · 2h ago
I often wonder if my home NAS/server would be better off on a rented box or a cloud server somewhere, especially since I now have 1gbit/s internet. Even now, 20TB of drive space and 6 cores with 32GB RAM on a Hetzner dedicated server is about twice the price of buying the hardware over a 5-year period. I suspect the hardware will actually last longer than that, and a rented dedicated server has the same level of redundancy (RAID), so the backup cost is the same between the two.

Using cloud and box storage on Hetzner is more expensive than the dedicated server: 4x the cost of owning the hardware and paying the power bill. AWS and Azure are just nuts, >100x the price, because they charge so much for storage even with hard drives. Neither Contabo nor Netcup can do this; it's too much storage for them.

Every time I look at this I come to the same basic conclusion: the overhead of renting someone else's machine is quite high compared to the hardware and power cost, and it would be a worse solution than having that performance on the local network, for bandwidth and latency reasons. The problem isn't so much the compute performance, which is relatively fairly priced; it's the storage costs and data transfer that bite.

Not really what the article was about, but cloud is sort of meant to be good for low-end hardware, and it's actually kind of not: the storage costs are just too high, even with a Hetzner storage box.

Nextgrid · 1h ago
It really depends on your power costs. In certain parts of Europe, power is so expensive that Hetzner actually works out cheaper (despite them providing you the entire machine and datacenter-grade internet connection).
bpye · 1h ago
I think I’ve settled on both being the answer - Hetzner is affordable enough that I can have a full backup of my NAS (using ZFS snapshots and incremental backups), and as a bonus can host some services there instead of at home. My home network still has much lower latency and so is preferable for ie. my Lightroom library.
dang · 2h ago
Related ongoing thread:

How many HTTP requests/second can a single machine handle? (2024) - https://news.ycombinator.com/item?id=45085446 - Aug 2025 (32 comments)

alkonaut · 3h ago
Microservices vs not is (almost) orthogonal to N servers vs one. You can make 10 microservices and rent a huge server and run all 10 services. It's more an organizational thing than a deployment thing. You can't do the opposite though, make a monolith and spread it out on 10 servers.
marcosdumay · 2h ago
> You can't do the opposite though, make a monolith and spread it out on 10 servers.

You absolutely can, and it has been the most common practice for scaling them for decades.

const_cast · 1h ago
> You can't do the opposite though, make a monolith and spread it out on 10 servers.

Yes you can. It's called having multiple application servers. They all run the same application, just more of them. Maybe they connect to the same DB, maybe not; maybe you shard the DB.

Havoc · 1h ago
>Unfortunately, since all of your services run on servers (whether you like it or not), someone in that supply chain is charging you based on their peak load.

This seems fundamentally incorrect to me? If I need 100 units of peak compute during 8 hours of work hours, I get that from Big Cloud, and they have two other clients needing same in offset timezones then in theory the aggregate cost of that is 1/3rd of everyone buying their own peak needs.

Whether big cloud passes on that saving is another matter, but it's there.

I.e., big cloud throws enough small customers together that they don't have a "peak" per se, just a pretty noisy average load that is, in aggregate, mostly stable.
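A toy load model makes the aggregation concrete (the numbers are invented): three tenants whose 100-unit peaks are offset by 8 hours each.

```python
# Hourly demand over a day for three tenants with offset peaks.
peaks = {
    "tenant_a": [100] * 8 + [10] * 16,
    "tenant_b": [10] * 8 + [100] * 8 + [10] * 8,
    "tenant_c": [10] * 16 + [100] * 8,
}
aggregate = [sum(t[h] for t in peaks.values()) for h in range(24)]

# Provisioning for the sum of individual peaks would need 300 units;
# the aggregated load never exceeds 120.
print(max(aggregate), sum(max(t) for t in peaks.values()))  # -> 120 300
```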

KronisLV · 3h ago
Just today I wasted some time due to an unexpected Tailscale key expiry and some other issues related to running a container cluster: https://blog.kronis.dev/blog/the-great-container-crashout

Right now, my plan is to move from a bunch of separate VPSes, to one dedicated server from Hetzner and run a few VMs inside of it with separate public IPs assigned to them alongside some resource limits. You can get them for pretty affordable prices, if you don't need the latest hardware: https://www.hetzner.com/sb/

That way I can limit the blast range if I mess things up inside of a VM, but at the same time benefit from an otherwise pretty simple setup for hosting personal stuff, a CPU with 8 threads and 64 GB of RAM ought to be enough for most stuff I might want to do.

decasia · 4h ago
Regardless of the cost and capacity analysis, it's just hard to fight the industry trends. The benefits of "just don't think about hardware" are real. I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front). And above all, if an AWS region goes down, it doesn't seem like your org's fault, but if your bespoke private hosting arrangement goes down, then that kinda does seem like your org's fault.
logifail · 4h ago
> and server hardware is expensive up front

You don't need to buy server hardware(!), the article specifically mentions renting from eg Hetzner.

> The benefits of "just don't think about hardware" are real

Can you expand on this claim, beyond what the article mentioned?

bearjaws · 4h ago
> Can you expand on this claim, beyond what the article mentioned?

I run a Lambda behind a load balancer: hardware dies, it's redundant, it gets replaced. If a database server fails, it re-provisions without saturating read IO on the SAN or causing noisy-neighbor issues.

I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.

Nextgrid · 1h ago
> I don't deal with depreciation, I don't deal with data center maintenance.

You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.

marcosdumay · 3h ago
> I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front).

Yes, there is.

Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.

That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that often goes around them is, and that's why way more people rent the room than the servers in it.

sam_lowry_ · 3h ago
Connectivity is a problem, not the room.

I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.

marcosdumay · 2h ago
It's connectivity, reliable power, reliable cooling, and security.

The actual space isn't a big deal, but the entire environment has large fixed costs.

matt-p · 3h ago
If you rent dedicated servers, then you're not worrying about any of the capex or maintenance stuff.
wongarsu · 4h ago
For anything up to about 128GB RAM you can still easily avoid capex by just renting servers. Above that it gets a bit trickier
matt-p · 4h ago
Renting (hosted) servers above 128GB RAM is still pretty easy, but I agree pricing levels out. 128GB RAM server ~$200/Month, 384 GB ~$580, 1024 GB ~$940/Month
IshKebab · 3h ago
It's not like it's a huge capex for that level of server anyway. Probably less than the cost of one employee's laptop.
qaq · 3h ago
The benefits of "don't write a distributed system unless you really have to" are also very real.
decasia · 4h ago
To be clear - this isn't an endorsement on my part, just observations of why cloud-only deployment seems common. I guess we shouldn't neglect the pressure towards resume-oriented development either, as it undoubtedly plays a part in infra folks' careers. It probably makes you sound obsolete to be someone who works in a physical data center.

I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.

turtlebits · 4h ago
The problem is sizing and consistency. When you're small, it's not cost effective to overprovision 2-3 big servers (for HA).

And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to settle for a differently specced server).

IME, having to go multi cloud/provider is a way worse problem to have.

andersmurphy · 3h ago
Most industries are not bursty. Overprovisioning is not expensive for most businesses. You can handle 30,000+ updates a second on a $15 VPS.

A multi-node system tends to be less reliable and have more failure points than a single-box system. Failures rarely happen in isolation.

You can do zero downtime deployment with a single machine if you need to.

matt-p · 4h ago
There are a number of providers who provision dedicated servers via API in minutes these days. Given that a dedicated server starts at around $90/month, it probably does make sense for a lot of people.
ChrisArchitect · 4h ago
dang · 3h ago
Thanks! Macroexpanded:

Use one big server - https://news.ycombinator.com/item?id=32319147 - Aug 2022 (585 comments)

joshmn · 4h ago
I did this (well, a large-r VPS for $120/month) for my Rails-based sports streaming website. I had a significant amount of throughput too, especially at peak (6-10pm ET).

My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and test their restoration. (When I shut down the site it was 1.2TB.) Having a script to quickly provision a new node—in case I ever needed it—would have something up within 8 minutes from hitting enter.
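That tiered schedule fits in a couple of crontab lines. A sketch, assuming Postgres (the database and table names here are invented; the original site's schema isn't known):

```
# Core tables every 10 minutes, full dump hourly
*/10 * * * *  pg_dump --table=users --table=subscriptions appdb | gzip > /backups/core-$(date +\%M).sql.gz
0 * * * *     pg_dump appdb | gzip > /backups/full-$(date +\%H).sql.gz
```

The other half of the takeaway, testing restores, still has to be done by hand or by a scheduled job that loads a dump into a scratch database.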

When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.

I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.

vevoe · 3h ago
Is there somewhere I can read more about your setup/experience with your streaming site? I currently run a (legal :) streaming site but have it hosted on AWS and have been exploring moving everything over to a big server. At this point it just seems like more work to move it than to just pay the cloud tax.
joshmn · 3h ago
Do a search for HeheStreams on your favorite search engine.

The technical bits aren’t all there, though, and there’s a plethora of noise and misinformation. Happy to talk via email though.

simonw · 4h ago
This was written in 2022, but it looks like it's mostly still relevant today. Would be interesting to see updated numbers on the expected costs of various hosting providers.
randomtoast · 3h ago
Those servers are mainly designed for enterprise use cases. For hobby projects, I can understand why someone would choose Hetzner over AWS.

For enterprise environments, however, there is much more to consider. One of the biggest costs you face is your operations team. If you go with Hetzner, you essentially have to rebuild a wide range of infrastructure components yourself (WAF, globally distributed CDN, EFS, RDS, EKS, Transit Gateways, Direct Connect and more).

Of course, you can create your own solutions for all of these. At my company, a mid-size enterprise, we once tried to do exactly that.

WAF: https://github.com/TecharoHQ/anubis

CDN: Hetzner nodes with caches in Finland, the USA and Germany

RDS: Self-hosted MySQL from Bitnami

EFS: https://github.com/rook/rook

EKS: https://github.com/vitobotta/hetzner-k3s

and 20+ more moving targets of infra software stack and support systems

The result was hiring more than 10 freelancers, in addition to 5 of our DevOps engineers, to build it all, handle the complexity of such a setup, and keep everything up to date, spending hundreds of thousands of dollars. Meanwhile, our AWS team, consisting of only three people working with Terraform, proved far more cost-effective: not in terms of dollars per CPU core, but in terms of average per-project spend once staff costs and everything else were included.

I think many of the HN posts that say things like "I saved 90% of my infra bill by moving from AWS to a single Hetzner server" are a bit misleading.

andersmurphy · 3h ago
Most of those things you listed are workarounds for having a slow server/system.

For example, if you serve your assets from the server you can skip a CORS round trip. If you use an embedded database like SQLite you can shave off 50ms, and a dedicated CPU another 50ms; now you don't need to serve anything from the edge, because your global latency is already much better.

Managing a single VPS is trivial compared to AWS.

talles · 4h ago
Don't forget the cost of managing your one big server and the risk of having such a single point of failure.
Puts · 4h ago
My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures. One server is the simplest and most reliable setup, and if you have backup and automated provisioning you can just re-deploy your entire environment in less than the time it takes to debug a complex multi-server setup.

I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.

motorest · 4h ago
> My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures.

I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your node, whether it's a hardware failure, a power outage, someone stumbling over your power/network cable, or even a single service crashing, you have a major outage on your hands.

These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.

fogx · 3h ago
Don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like Hetzner? And even if they did, you could just run a provisioned secondary server that jumps in if the first becomes unavailable, and still be much cheaper.
toast0 · 3h ago
I don't know about Hetzner, but the failure case isn't usually tripping over power plugs. It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server.

Either way, stuff happens, figuring out what your actual requirements around uptime, time to response, and time to resolution is important before you build a nine nines solution when eight eights is sufficient. :p

icedchai · 3h ago
It's unlikely, but it happens. In the mid 2000's I had some servers at a colo. They were doing electrical work and took out power to a bunch of racks, including ours. Those environments are not static.
talles · 4h ago
I've also seen the opposite somewhat frequently: some team screws up the server, and unrelated stable services that have been running since forever (on the same server) are now affected by the messed-up environment.
ocdtrekkie · 4h ago
My single on-premise Exchange server is drastically more reliable than Microsoft's massive globally resilient whatever Exchange Online, and it costs me a couple hours of work on occasion. I probably have half their downtime, and most of mine is scheduled when nobody needs the server anyhow.

I'm not a better engineer, I just have drastically fewer failure modes.

talles · 4h ago
Do you develop and manage the server alone? It's a quite a different reality when you have a big team.
ocdtrekkie · 3h ago
Mostly myself but I am able to grab a few additional resources when needed. (Server migration is still, in fact, not fun!)
jeffrallen · 3h ago
Not to mention the other leading cause of outages: UPSes.

Sigh.

icedchai · 1h ago
UPSes always seem to have strange failure modes. I've had a couple fail after a power failure. The batteries died and they wouldn't come back up automatically when the power came back. They didn't warn me about the dead battery until after...
joek1301 · 4h ago
wmf · 4h ago
Don't forget to read the article.
chrisweekly · 4h ago
I'll take a (lone) single point of failure over (multiple) single points of failure.
justmarc · 2h ago
AWS has also been a single point of failure multiple times in history, and there's no reason to believe this will never happen again.
api · 2h ago
I’ve found that it’s hard to even hire engineers who aren’t all in on cloud and who even know how to build without it.

Even the ones who do know have been conditioned to tremble with fear at the thought of administrating things like a database or storage. These are people who can code cryptography kernels and network protocols and kernel modules, but the thought of running a K8S cluster or Postgres fills them with terror.

“But what if we have downtime!” That would be a good argument if the cloud didn’t have downtime, but it does. Most of our downtime in previous years has been the cloud, not us.

“What if we have to scale!” If we are big enough to outgrow a 256 core database with terabytes of SSD, we can afford to hire a full time DBA or two and have them babysit a cluster. It’ll still be cheaper.

“What if we lose data?” Ever heard of backups? Streaming backups? Hot spares? Multiple concurrent backup systems? None of this is complex.

“But admin is hard!” So is administrating cloud. I’ve seen the horror of Terraform and Helm and all that shit. Cloud doesn’t make admin easy, just different. It promised simplicity and did not deliver.

… and so on.

So we pay about 1000X what we should pay for hosting.

Every time I look at the numbers I curse myself for letting the camel get its nose under the tent.

If I had it to do over again I’d forbid use of big cloud from day one, no exceptions, no argument, use it and you’re fired. Put it in the articles of incorporation and bylaws.

matt-p · 1h ago
I have also found this happening. It's actually really funny, because I think even I'm less inclined to run Postgres myself these days, when I used to run literally hundreds of instances with not much more than pg_dump, cron, and two read-only replicas.

These days probably the best way of getting these 'cloudy' engineers on board is just to tell them it's Kubernetes and run all of your servers on K3s.

api · 1h ago
I’m convinced that cloud companies have been intentionally shaping dev culture. Microservices in particular seem like a pattern designed to push managed cloud lock in. It’s not that you have to have cloud to use them, but it creates a lot of opportunities to reach for managed services like event queues to replace what used to be a simple function call or queue.

Dev culture is totally fad driven and devs are sheep, so this works.

matt-p · 42m ago
Yeah I think that's fair. I'm very pro containers though, that's a genuine step forward from deploy scripts or VM images.
johnklos · 2h ago
These days we have more meta-software than software. Instead of Apache with virtualhosts, we have a VM running Docker containers, each with an nginx of its own, all fronted by yet another nginx container acting as a proxy.

How much waste is there from all this meta-software?

In reality, I host more on Raspberry Pis with USB SSDs than some people host on hundred-plus watt Dells.

At the same time, people constantly compare colo and hardware costs with the cost per month of cloud and say cloud is "cheaper". I don't even bother to point out the broken thinking that leads to that. In reality, we can ignore gatekeepers and run things out of our homes, using VPSes for public IPs when our home ISPs won't allow certain services, and we can still have excellent uptimes, often better than cloud uptimes.

Yes, we can consolidate many, many services in to one machine because most services aren't resource heavy constantly.

Two machines on two different home ISP networks backing each other up can offer greater aggregate uptime than a single "enterprise" (a misnomer, if you ask me, for most x86 vendors) server in colo. A single five-minute reboot of a Dell per year drops uptime from 100% to 99.999%.
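The arithmetic behind those figures, assuming the two home machines fail independently (the 0.1% per-machine downtime is a stand-in figure, not a measured one):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# One five-minute reboot per year on the single colo box:
single = 1 - 5 / MINUTES_PER_YEAR
print(f"{single:.5%}")  # ~99.99905%, i.e. just under five nines

# Two home machines, each independently down 0.1% of the time
# (~8.8 hours/year); both must be down at once for an outage:
p_down = 0.001
pair = 1 - p_down ** 2
print(f"{pair:.5%}")  # 99.99990%, six nines
```

The independence assumption is doing real work here; two boxes behind the same ISP or power feed won't get the full benefit.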

Cloud is such bullshit that it's exhausting even just engaging with people who "but what if" everything, showing they've never even thought about it for more than a minute themselves.

qaq · 3h ago
And now consider that 6th-gen EPYC will have 256 cores. You can also have 32 hot-swap SSDs with 10M+ random write IOPS and 60M+ random read IOPS in a single 2U box.
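A rough sanity check of that single-box IOPS claim; the per-drive figures are assumptions for current enterprise NVMe at 4K random, not specs for any particular drive:

```python
drives = 32
write_iops_per_drive = 350_000    # 4K random write (assumption)
read_iops_per_drive = 2_000_000   # 4K random read (assumption)

print(f"{drives * write_iops_per_drive:,} write IOPS")  # 11,200,000
print(f"{drives * read_iops_per_drive:,} read IOPS")    # 64,000,000
```

So the 10M/60M numbers are plausible on paper, before any controller, filesystem, or database overhead eats into them.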
jeffrallen · 3h ago
I work for a cloud provider and I'll tell you, one of the reasons for the cloud premium is that it is a total pain in the ass to run hardware. Last week I installed two servers and between them had four mysterious problems that had to be solved by reseating cards, messing with BIOS settings, etc. Last year we had to deal with a 7-site, 5-country RMA for 150 100Gb copper cables with incorrect coding in their EEPROMs.

I tell my colleagues: it's a good thing that hardware sucks: the harder it is to run bare metal, the happier our customers are that they choose the cloud. :)

(But also: this is an excellent article, full of excellent facts. Luckily, my customers choose differently.)

Nextgrid · 1h ago
Fortunately, companies like Hetzner/OVH/etc will handle all this bullshit for you for a flat monthly fee.
garganzol · 3h ago
And then boom, all your services are gone due to a pesky capacitor on the motherboard. Also good luck trying to change even one software component of that monolith without disrupting and jeopardizing the whole operation.

While this is useful advice for some people in certain conditions, it should be taken with a grain of salt.

fragmede · 3h ago
That capacitor thing hasn't been true since the 90's.
icedchai · 3h ago
Capacitor problem or not, hardware does fail. Power supplies crap out. SSDs die in strange ways. A failure of a supposedly "redundant" SSD might cause your system to freeze up.
garganzol · 3h ago
Hardware still fails. It isn't a question of "if", it's a question of "when". Nothing lasts forever, the naivety lasts only so long too.
cortesoft · 4h ago
> Part of the "cloud premium" for load balancers, serverless computing, and small VMs is based on how much extra capacity your cloud provider needs to build in order to handle their peak load. You're paying for someone's peak load anyway!

Eh, sort of. The difference is that the cloud can go find other workloads to fill the trough from off peak load. They won’t pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can’t find paying workloads for your troughs.

I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.

It has been great.