One of the more detrimental aspects of the Cloud Tax is that it constrains the types of solutions engineers even consider.
Picking an arbitrary price point of $200/mo, you can get 4(!) vCPUs and 16GB of RAM at AWS. Architectures are different etc., but this is roughly a mid-spec dev laptop of 5 or so years ago.
At Hetzner, you can rent a machine with 48 cores and 128GB of RAM for the same money. It's hard to overstate how far apart these machines are in raw computational capacity.
There are approaches to problems that make sense with 10x the capacity that don't make sense on the much smaller node. Critically, those approaches can sometimes save engineering time that would otherwise go into building a more complex system to manage around artificial constraints.
Yes, there are other factors like durability etc. that need to be designed for. But going the other way, dedicated boxes can deliver more consistent performance without worries of noisy neighbors.
benreesman · 15h ago
In 2025 if you need convenience and no red tape you've got fly.io in the general case and maybe Vercel or something on a particular framework (there are some good ones for a particular stack).
If your needs go beyond that? Then you need real computers with real configuration and you have OVH/Hetzner/Latitude who will rent you MONSTER machines for the cost of some cheap-ass surplus 2017 Intel on The Cloud.
And if you just want a blog or whatever? Zillion VPS options.
The traditional cloud is for regulatory/process/corruption-capture extraction in 2025: its machine-economics and developer-productivity case is fucking zero in anything I've seen. Maybe there's some edge case where a completely unencumbered team is better off with DMV-trip permissions theatre, remnant Intel racked with noisy neighbors at massive markup, and no support recourse.
nine_k · 4h ago
(1) How does fly.io reliability compare to AWS, GCP, or maybe Linode or DO?
(2) What do you do if your large Hetzner server starts to show signs of malfunction? How soon would you be able to replace it, and how easily?
(2a) What do you do when your large Hetzner server just dies? I see that this happens rarely, but what's your contingency plan, if any?
(3) What do you do when your load is highly spiky? Do you reserve bare metal capacity for the biggest peak you expect to serve, because it's so much cheaper than running an elastic serverless architecture of the same capacity anyway?
(4) Considering that your stack still includes many components, how do you manage them, and how expensive is the management overhead? Do you need an extra SRE?
These are not rhetorical questions; I'd love to hear from real practitioners! (E.g. Stack Overflow used to do deep dives into their few-big-servers architecture.)
runako · 1h ago
These are great questions.
A key factor underlining all of this is understanding, from a business/organizational perspective, your actual uptime requirements. Google may aim at 5 nines with the budget to achieve it, but many banks have routine planned downtime. If you don't know your objectives, you will have trouble making the tradeoffs necessary to get there. As a hypothetical, would your business choose 99.999% uptime (26 seconds down on average per month) vs 99.99% (4.3 min) if that caused infra costs to rise by 50% or more? If you said we can cut our infra costs by 50% by planning a short weekly maintenance window, how would that resonate?
Speaking to a few, in my experience:
2) (not at Hetzner specifically, but at a dedicated host). You have backups & recovery plans, and redundancy where it makes sense. You might run your database with a replica. If you are serving Web traffic, maybe you keep a hot spare. Also, you are still allowed to use cloud services where it makes sense, so you can back up to S3 and use things like SQS or KMS if you don't want to run them yourself. It's worth noting that you may not get advance notice; I recall our service being impacted by a fire at a datacenter that IIRC was caused by a traffic accident on a nearby highway. The point is you have to design resilience into the system. Fortunately, this is well-trod ground.
It would not be a terrible failover option to have something like an autoscaling group at AWS ready to step in if the dedicated cluster goes offline. Keep that group scaled to zero until it's needed. Put the cloud behind your cheap dedicated capacity. (A rough sketch of this follows below, after point 4.)
3) See above. In my case, we over-provisioned because it's cheap to do so. I did not do this at the time, but I would probably look at running a replicated database with a hot standby on another server.
4) It has not been my experience that "modern" cloud deployments require fewer SRE resources. Like water running downhill, cloud projects seek complexity.
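As a rough illustration of the scale-to-zero failover idea in (2): a minimal sketch using boto3 and requests, assuming a health endpoint on the dedicated cluster and an Auto Scaling group named "standby-web" that normally sits at a desired capacity of 0. All names here are invented; this is a sketch of the pattern, not a recipe.

    # Hypothetical failover trigger: raise a dormant AWS Auto Scaling group
    # from 0 when the dedicated cluster stops answering health checks.
    import boto3
    import requests

    PRIMARY_HEALTH_URL = "https://primary.example.com/healthz"  # assumed endpoint
    STANDBY_ASG = "standby-web"                                 # assumed ASG, kept at 0

    def primary_is_healthy() -> bool:
        try:
            return requests.get(PRIMARY_HEALTH_URL, timeout=5).status_code == 200
        except requests.RequestException:
            return False

    def activate_standby(capacity: int = 3) -> None:
        asg = boto3.client("autoscaling")
        asg.set_desired_capacity(
            AutoScalingGroupName=STANDBY_ASG,
            DesiredCapacity=capacity,
            HonorCooldown=False,
        )

    if __name__ == "__main__":
        if not primary_is_healthy():
            activate_standby()

In practice you would run something like this from a scheduler outside both environments and pair it with DNS or load-balancer switching, but the core of the trick really is just raising the desired capacity when needed.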
shrubble · 1d ago
It's more than that - it's all the latency that you can remove from the equation with your bare-metal server.
No network latency between nodes, less memory bandwidth latency/contention as there is in VMs, no caching architecture latency needed when you can just tell e.g. Postgres to use gigs of RAM and then let Linux's disk caching take care of the rest (and not need a separate caching architecture).
matt-p · 1d ago
The difference between a fairly expensive ($300) RDS instance + EC2 in the same region vs a $90 dedicated server with an NVMe drive and Postgres in a container is absolutely insane.
bspammer · 1d ago
A fair comparison would include the cost of the DBA who will be responsible for backups, updates, monitoring, security and access control. That’s what RDS is actually competing with.
shrubble · 1d ago
Paying someone $2000 to set that up once should result in the costs being recovered in what, 18 months?
If you’re running Postgres locally you can turn off the TCP/IP part; nothing more to audit there.
SSH based copying of backups to a remote server is simple.
If not accessible via network, you can stay on whatever version of Postgres you want.
I’ve heard these arguments since AWS launched, and all that time I’ve been running Postgres (since 2004 actually) and have never encountered all these phantom issues that are claimed as being expensive or extremely difficult.
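For what it's worth, the dump-and-ship routine described above can be a few lines in a cron-driven script. Here is a minimal sketch assuming a local Postgres reachable over the Unix socket (no TCP/IP listener needed) and key-based SSH to a backup host; the database name, paths, and destination are all illustrative.

    # Hypothetical nightly backup: dump the local database and ship it via scp.
    import subprocess
    from datetime import date

    DB_NAME = "appdb"                      # assumed database name
    REMOTE = "backup@backup-host:/srv/pg"  # assumed SSH destination

    def dump_and_ship() -> None:
        dump_file = f"/var/backups/{DB_NAME}-{date.today()}.dump"
        # Custom-format dump; restorable selectively with pg_restore.
        subprocess.run(["pg_dump", "-Fc", "-f", dump_file, DB_NAME], check=True)
        subprocess.run(["scp", dump_file, REMOTE], check=True)

    if __name__ == "__main__":
        dump_and_ship()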
sahilagarwal · 14h ago
I guess my non-management / non-business side is showing here, but how can it be that much? I still remember designing a fairly simple cron job that took database backups when I was a junior developer.
It gets even easier now that you have cheap S3 - just upload the dump to S3 every day and set the S3 lifecycle (deletion) policy to whatever retention is feasible for you.
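The deletion-policy part is a one-time bucket configuration. A hedged sketch with boto3, assuming a bucket named "my-db-backups" and dumps uploaded under a "dumps/" prefix (both made up):

    # Hypothetical one-time setup: expire old backup objects automatically so the
    # daily cron upload never needs its own cleanup step.
    import boto3

    BUCKET = "my-db-backups"  # assumed bucket name

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-dumps",
                    "Filter": {"Prefix": "dumps/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},  # keep 30 days of backups
                }
            ]
        },
    )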
alemanek · 12h ago
I am not an expert here but I am currently researching for a planned project.
For backups, including Postgres, I was planning on paying Veeam ~$500 a year for a software license to back up the active node and Postgres database to S3/R2. The standby node would be getting streaming updates via logical replication.
There are free options as well but I didn’t want to cheap out on the backups.
It looks pretty turnkey. I am a software engineer, not a sysadmin, though. Still just theory as well, as I haven't built it out yet.
nine_k · 4h ago
Taking database backups is relatively simple. What differentiates a good solution is the ease of restoring from a backup. This includes the certainty that the restored state would be a correct point-in-time state from the past, not an amalgamation of several such states.
fragmede · 13h ago
How much were you paid as a jr developer, and how long did it take you to set up? Then round up to mid-level developer, and add in hardware and software costs.
dijit · 12h ago
That's a deflection. The question isn't about a developer's salary; it's about the fundamental difference between a one-time investment and a permanent cost.
Either way: 1 day of a mid-level developer in the majority of the world (basically: anywhere except Zurich, NYC or SF) is between €208 and €291. (Yearly salary of €50-€70k)
A junior developer's time for setup and the cost of hardware is practically a one-off expense. It's a few days of work at most.
The alternative you're advocating for (a recurring SaaS fee) is a permanent rent trap. That money is gone forever, with no asset or investment to show for it. Over a few years, you'll have spent tens of thousands of dollars for nothing. The real cost is not what you pay a developer; it's what you lose by never owning your tools.
fragmede · 2h ago
> The alternative you're advocating for
Not sure where I advocated for that. Could you point it out please?
applied_heat · 1d ago
$2k? That’s a $100k project for a medium size Corp
christophilus · 16h ago
$2,000 does seem too low. $100k seems waaay too high. That sounds like an AWS talking point.
sysguest · 21h ago
hmm where did you get the numbers?
(what's "medium-size corp" and how did you come up with $100k ?)
Aeolun · 18h ago
I’m assuming he’s talking about the corporate team of DBAs that will spend weeks discussing the best way to copy a bunch of SQL files to S3
vidarh · 1d ago
I do consulting in this space, and we consistently make more money from people who insist on using cloud services, because their setups tend to need far more work.
benterix · 16h ago
Similar here - but in my case the reason is vendor lock-in - they spent years getting into AWS and any thought of getting out seems dreadful.
kiney · 17h ago
same for me
benterix · 16h ago
You are aware that RDS needs backups, setting up monitoring properly, defining access, providing secrets management etc., and updates between major versions are not automatic?
RDS has a value. But for many teams the price paid for this value is ridiculously high when compared to other options.
pdhborges · 10h ago
AWS can make major version upgrades automatically now with less downtime. I think they do the logical replication dance internally.
yjftsjthsd-h · 1d ago
As long as you also include the Cloud Certified DevOps Engineer™[0] to set up that RDS instance.
[0] A normal sysadmin remains vaguely bemused at their job title and the way it changes every couple years.
mrweasel · 19h ago
It's also interesting that the cloud engineer can apparently be a DBA, network, storage, and backup engineer all at once, but if you move the same services on-prem, you apparently need specialists for each task.
Sometimes even the certified cloud engineers can't tell you why an RDS behaves the way it does, nor can they really fix it. Sometimes you really do need a DBA, but that applies equally to on-prem and cloud.
I'm a sysadmin, but have been labelled and sold as: Consultant (sounds expensive), DevOps engineer, Cloud Engineer, Operations Expert and right now a Site Reliability Engineer.... I'm a systems administrator.
icedchai · 15h ago
I haven't seen a company that hired DBAs in over 15 years.
I think the "DevOps" movement sent them packing, along with SysAdmins.
dijit · 6h ago
Sysadmins never left, they just got rebranded.
Aeolun · 18h ago
If you’ve started working in the industry more than about 15 years ago all the titles sound quaint.
data_marsupial · 15h ago
Need to get Platform Engineer for a full house
Cthulhu_ · 19h ago
While that's fair, most organizations I've worked at in the past decade have had a dedicated team for managing their cloud setup, which is also responsible for backups, updates, monitoring, security and access control. I don't think they're competing.
sgarland · 1d ago
You don’t need a DBA for any of those, you need someone who can read some docs. It’s not witchcraft.
Aeolun · 18h ago
I’d argue that AWS is witchcraft a lot of the time. They’ll have all these services they claim will work for everything, but you’ll always find that one of the things you’d expect to be there is unavailable.
lelanthran · 21h ago
The RDS solution doesn't need a technical person to set it up?
It doesn't need someone who knows how to use the labyrinthine AWS services and console?
whstl · 13h ago
Agree.
These comments sound super absurd to me, because RDS is difficult as hell to set up unless you do it very frequently or already have it in IaC format, since one needs to set up a VPC, subnets, security groups, an internet gateway, etc.
It's not like creating a DynamoDB, Lambda or S3 where a non-technical person can learn it in a few hours.
Sure, one might find some random Terraform file online to do this or vibe-code some CloudFormation, but that's not really a fair comparison.
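For a sense of scale, even a stripped-down "private RDS instance" touches quite a few objects. A hedged, abbreviated boto3 sketch of the moving parts mentioned above (all identifiers are invented, waiting and error handling are omitted):

    # Hypothetical, abbreviated sketch of the pieces behind "just use RDS":
    # a VPC, subnets in two AZs, a security group, a subnet group, the instance.
    import boto3

    ec2 = boto3.client("ec2")
    rds = boto3.client("rds")

    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
    subnet_a = ec2.create_subnet(VpcId=vpc, CidrBlock="10.0.1.0/24",
                                 AvailabilityZone="us-east-1a")["Subnet"]["SubnetId"]
    subnet_b = ec2.create_subnet(VpcId=vpc, CidrBlock="10.0.2.0/24",
                                 AvailabilityZone="us-east-1b")["Subnet"]["SubnetId"]

    sg = ec2.create_security_group(GroupName="db-access", Description="app to db",
                                   VpcId=vpc)["GroupId"]
    ec2.authorize_security_group_ingress(GroupId=sg, IpProtocol="tcp",
                                         FromPort=5432, ToPort=5432,
                                         CidrIp="10.0.0.0/16")

    rds.create_db_subnet_group(DBSubnetGroupName="app-db-subnets",
                               DBSubnetGroupDescription="two AZs for RDS",
                               SubnetIds=[subnet_a, subnet_b])

    rds.create_db_instance(DBInstanceIdentifier="app-db",
                           Engine="postgres",
                           DBInstanceClass="db.t4g.medium",
                           AllocatedStorage=50,
                           MasterUsername="app",
                           MasterUserPassword="change-me",  # use a secrets store in practice
                           DBSubnetGroupName="app-db-subnets",
                           VpcSecurityGroupIds=[sg])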
matt-p · 1d ago
Totally. My frustration isn't even price, though; RDS is literally just dog slow.
steveBK123 · 15h ago
My firm paid DBAs for RDS as well so..
zenmac · 1d ago
Yeah, but AWS SREs are the ones making the big bucks! Soooo what can you do? It is nice to see many people here on HN supporting open networks and platforms and making very drastic comments, such as encouraging Google engineers to quit their jobs.
I totally also understand why some people with a family to support and a mortgage to pay can't just walk away from a job at a FAANG or MAMAA type place.
Looking at your comparison, at this point it just seems like a scam.
jpgvm · 1d ago
Right now the big bucks are in managing massive bare metal GPU clusters.
benterix · 16h ago
Yeah, let's use the opportunity while it lasts.
reactordev · 1d ago
This. Clustering and managing Nvidia at scale is the new hotness demanding half-million dollar salaries.
t_mahmood · 17h ago
I don't get why people are so hell-bent on going to AWS, for the most minor applications, without looking at simpler options!
I am nowhere near the level of what you are doing, but my client was paying $100/mo for an AWS server, SQS, and an S3 bucket, for a small PHP-based web application that uses the Amazon Seller API and the Keepa API for the products he ships. It used MySQL for data storage.
I implemented the whole thing in Python, Django, and PostgreSQL (initially SQLite) and put it on a $25/mo unmanaged VPS.
I have not got any complaints about performance, and it's running continuously updating product prices, details, processing PDF invoices using OCR, finding missing products in shipments, while also serving the website, and a 4 core server with 6GB RAM is handling it just fine.
The load is not going to be so high to require AWS and friends, for now. It's a small internal app, probably won't even get over 100 users, and if it ever does, it's extremely simple to migrate, because the app is so compact, even though not exactly monolithic.
And still, it probably won't need a $100 AWS server, unless we are scaling up much larger.
jeroenhd · 16h ago
AWS is useful for big business. Automatic multi-region failover and hosted databases may be expensive, but they're a massive pain to manually configure and an easy footgun if you're not used to doing that sort of thing. Plus, with Amazon you already have public toolkits to use those features with all of your services, so you don't need to figure out how to integrate, or which open source system to use to accomplish all of that. Plus, if you go for your own physical server, you need to arrange parts and maintenance windows for any hardware that will eventually fail.
If all you need is "good enough" reliability and basic compute power (which I think is good enough for many businesses, considering AWS isn't exactly outage free either), you're probably better off getting a server or renting one from a cheap cloud host. If you're promising five nines of uptime for some reason, you may want to reconsider.
t_mahmood · 15h ago
> If all you need is "good enough" reliability and basic compute power (which I think is good enough for many businesses, considering AWS isn't exactly outage free either), you're probably better off getting a server or renting one from a cheap cloud host.
This is exactly my point. Sorry if I was not clear on my OP.
We are using the Seller API to get various product information. While their API provides the base work for communicating with their endpoint, you have to implement your own system on top of it, handle the absurd unreliability of their rate limiter, and navigate the spider web of API callbacks to get the information you require.
Esophagus4 · 13h ago
Without understanding the architecture and use case better, at first read, my gut says that isn’t an AWS problem - it sounds like a solutions architecture problem.
There are cheaper ways of building that use case on AWS.
Most AWS sticker shock I’ve seen results from someone who doesn’t really understand cloud trying to build on the cloud. Cost has to be designed in from the start (in addition to security, operational overhead, etc).
In general, I’ve found two types of engineering teams who don’t use the cloud: the mugs and the superstars. And since superstars are few and far between, that means…
dijit · 12h ago
Sounds like we need a specialist.
I guess those promises about needing fewer expensive people never materialised.
tbh, aside from the really anaemic use-cases where everything actually manages to scale to zero and has very low load: I have genuinely never seen an AWS project (outside of free credits of course) that works out cheaper than what came before.
That's TCO from P&Ls, not a "gut feeling". We have a decade of evidence now.
t_mahmood · 11h ago
... you failed at reading comprehension?
My comment was not that using AWS is bad; it has its uses. My comment was about how, in this instance, it was simply not needed. And I even speculated about when it might be needed.
Picking the correct tool for the job is what it means to be an engineer, or simply a person with common sense. With experience, we can get past childish absolutism about a tool or service and look at the broader picture, unless, of course, we are expecting some kind of monetary gain.
choeger · 17h ago
How much did that reimplementing cost and when will the savings exceed that cost?
t_mahmood · 16h ago
This cost around $10k, which also includes work that is outside the reimplementation.
I do not know the actual cost of the original application.
The app that I was developing was for another purpose, and the reimplementation was added later.
The app replaces an existing commercial app that was being used, which is $200+/mo. So, maybe 4-5 years for the savings to exceed the cost. They have been using the app for 3 years, I think.
And, maybe I am beating my own drum a little, but I believe my implementation works, and looks, much better than the commercial one or the first implementation.
So, I am really looking forward to this succeeding.
3shv · 17h ago
What are some cheaper and better hosting providers that you can recommend?
benterix · 16h ago
Hetzner.
For most public cloud providers you have to give them your credit card number so they can charge an arbitrary amount.
For Hetzner, instead of CC#, you give a scan of your ID (of course you can attach your CC too or Paypal). Personally I do my payments via a bank transfer. I recently paid for the whole 2025 and 2026 for all my k8s clusters. It gives unimaginable peace of mind when compared to AWS/GCP/Azure.
Plus, their cloud instances often spin up much faster than EC2.
drewnick · 11h ago
For bare metal I’ve been using tier.net to get 192 GB RAM, 4TB NVME and 32 cores for $219/mo.
Data centers all over the country and I get to locate under 10ms from my regional audience.
Just a data point if you want some bigger iron than a VM.
t_mahmood · 16h ago
I have used Knownhost previously, it served me really well.
Before that, I used to go for Linode, but I think they've become more pricey?
LamaOfRuin · 11h ago
Linode was bought by Akamai. They immediately raised prices, and they have been, if anything, less reliable.
t_mahmood · 11h ago
Ahh, yes, I remember now! I think it's almost 8 years now? Stopped using them after the buy out.
Too bad, actually, their service was pretty good.
ferngodfather · 17h ago
Hetzner! They do ask for ID though.
mr_toad · 13h ago
Saving $75 a month at what cost in labour?
andersmurphy · 11h ago
You actually save on labour. A VPS is a lot less work than anything involving AWS console.
andersmurphy · 1d ago
100% this. Add an embedded database like SQLite, optimise writes to batch them, and you can go really, really far with Hetzner. It's also why I find the "what about overprovisioning" argument silly (once you look outside of AWS you can get an insane cost/perf ratio).
Also in my experience more complex systems tend to have much less reliability/resilience than simple single node systems. Things rarely fail in isolation.
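On the "optimise writes to batch" point: the single-writer pattern mostly comes down to grouping many small writes into one transaction. A minimal sketch with an invented events table:

    # Minimal write-batching sketch: one SQLite transaction (and one fsync)
    # for the whole batch instead of one per row.
    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, payload TEXT)")

    def write_batch(events: list[tuple[float, str]]) -> None:
        with conn:  # commits once at the end of the block
            conn.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", events)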
Demiurge · 1d ago
I think it’s the other way around. I’m a huge fan of Hetzner for small sites with a few users. However, for bigger projects, the cloud seems to offer a complete lack of constraints. For projects that can pay for my time, $200/m or $2000/m in hosting costs is a negligible difference.
What’s the development cost difference between AWS CDK / Terraform + GitHub Actions vs. Docker / K8s / Ansible + any CI pipeline? I don’t know; in my experience, I don’t see how “bare metal” saves much engineering time. I also don’t see anything complicated about using an IaC Fargate + RDS template.
Now, if you actually need to decouple your file storage and make it durable and scalable, or need to dynamically create subdomains, or any number of other things… The effort of learning and integrating different dedicated services at the infrastructure level to run all this seems much more constraining.
I’ve been doing this since before the “Cloud,” and in my view, if you have a project that makes money, cloud costs are a worthwhile investment that will be the last thing that constrains your project. If cloud costs feel too constraining for your project, then perhaps it’s more of a hobby than a business—at least in my experience.
Just thinking about maintaining multiple cluster filesystems and disk arrays—it’s just not what I would want to be doing with most companies’ resources or my time. Maybe it’s like the difference between folks who prefer Arch and setting up Emacs just right, versus those happy with a MacBook. If I felt like changing my kernel scheduler was a constraint, I might recommend Arch; but otherwise, I recommend a MacBook. :)
On the flip side, I’ve also tried to turn a startup idea into a profitable project with no budget, where raw throughput was integral to the idea. In that situation, a dedicated server was absolutely the right choice, saving us thousands of dollars. But the idea did not pan out. If we had gotten more traction, I suspect we would have just vertically scaled for a while. But it’s unusual.
runako · 1d ago
> I really don't see how "bare metal" saves any engineering time
This is because you are looking only at provisioning/deployment. And you are right -- node size does not impact DevOps all that much.
I am looking at the solution space available to the engineers who write the software that ultimately gets deployed on the nodes. And that solution space is different when the nodes have 10x the capability. Yes, cloud providers have tons of aggregate capability. But designing software to run on a fleet of small machines is very different from accomplishing the same tasks on a single large machine.
It would not be controversial to suggest that targeting code at an Apple Watch or Raspberry Pi imposes constraints on developers that do not exist when targeting desktops. I am saying the same dynamic now applies to targeting cloud providers.
This isn't to say there's a single best solution for everything. But there are tradeoffs that are not always apparent. The art is knowing when it makes sense to pay the Cloud Tax, and whether to go 100% Cloud vs some proportion of dedicated.
sevensor · 16h ago
I’ve seen multiple projects founder on the complexity of writing software for the cloud. Moving data from here to there ends up being way harder than anybody expected. Maybe teams with more experience build this into their planning, but from what I’ve seen, if you’re using the cloud, your solution ends up being 95% about getting data where it’s supposed to be and 5% application logic.
Esophagus4 · 13h ago
This sounds like a people problem, not a technology problem.
I’ve never had an issue with moving data.
Demiurge · 1d ago
Overall, I agree that most people underestimate the runway that the modern dedicated server can give you.
benterix · 16h ago
> I’m a huge fan of Hetzner ... I don’t see how “bare metal” saves much engineering time.
I think you confuse Hetzner with bare metal. Hetzner has Hetzner Cloud, which is like AWS EC2 but much cheaper. (They also have bare metal servers, which are even cheaper.) With Hetzner Cloud, you can use Terraform, GitHub Actions and whatever else you mentioned.
Demiurge · 11h ago
Yeah, I do confuse it, because I've been using Hetzner long before they had "cloud".
cnst · 9h ago
> types of solutions engineers even consider
I think the issue is actually the opposite.
With the cloud, the engineers fail to see the actual cost of their inefficient scaled-out code, because someone else (the CFO) pays the bill; and the answer to any issue is simply adding more "workers" and more "cloud", since they're basically "free" from the perspective of the employee. (And the more "cloud" something is, like serverless, the more "free" it feels, completely inverting the economics of making a profit on the service: when the CFO tells you that your AWS bill is too high, you move everything from EC2 to AWS Lambda, since the salesperson from AWS tells you that serverless is far cheaper, only for the bill to get even higher, for reasons unknown, of course.)
Whom the cloud tax actually constrains are the entrepreneurs and solo-preneurs. If you have to pay $5000/mo to AWS just for the infra, you can only go so long without lots of revenue, and you'd need to have a whopping 5k/mo+ worth of revenue before breaking even. Yet with a $200/mo like at OVH or Hetzner, you can afford to let it grow at negligible cost to yourself, and it can basically start being profitable with the first few users.
Don't believe this? Look at the blog entries by the guy who bought Yahoo!'s Delicious, written before they went bankrupt and were up for sale. He was basically pointing out that the services have roughly the same number of users, and require the same engineering resources, yet one is being operated at a loss, whereas the other one makes a profit (guess which one, and guess why).
So, literally, the difference between the cloud and renting One Big Server can be the difference between making a loss and going out of business, and remaining in business and purchasing your underwater competitor for pennies on the dollar.
ldoughty · 1d ago
I agree that AWS EC2 is probably too expensive on the whole. It also doesn't really provide any of the greater benefits of the cloud that come from "someone else's server".
However, to the point of microservices as the article mentions, you probably should look at lambda (or fargate, or a mix) unless you can really saturate the capacity of multiple servers.
When we swapped our microservices from ECS+EC2 over to Lambda, our costs dropped sharply. Even serving millions of requests a day, we spend a lot of time idle in between, especially spread across the services.
Additionally, we have 0 outages now from hardware in the last 5 years. As an engineer, this has made my QoL significantly better.
jgalt212 · 15h ago
> I agree that AWS EC2 is probably too expensive on the whole.
Probably? It's about 5-10X more expensive than equivalent services from Hetzner.
cedws · 18h ago
I don’t disagree but “cores” is not a good measure of computational power.
christophilus · 16h ago
True, but the cores on a dedicated Hetzner box obliterate the cores on an EC2 machine every time I’ve tested them. So, if anything, it understates the massive performance gap.
andersmurphy · 11h ago
Hetzner also tends to have more modern SSDs with the latest nvme. Which can make a massive difference for your DB.
Nextgrid · 8h ago
It's less about the modernity of SSDs and more about a fundamental difference: all persistent storage on AWS is actually networked - it's exposed to you as NVME but it's actually on a SAN and all IO requests go over the network.
You can get actual direct-attached SSDs on EC2 (and I'd expect performance to be on par with Hetzner), but those are ephemeral instance stores and you lose the data when the instance stops or is terminated.
Spooky23 · 1d ago
It really depends on what you are doing. But when you factor in the network features, the ability to scale the solution, etc. you get a lot of stuff inside that $200/mo EC2 device. The product is more than the VM.
That said, with a defined workload without a ton of variation or segmentation needs there are lots of ways to deliver a cheaper solution.
troupo · 23h ago
> you get a lot of stuff inside that $200/mo EC2 device. The product is more than the VM.
What are you getting, and do you need it?
throwaway7783 · 22h ago
Probably not for $200/mo EC2, but AWS/GCP in general
* Centralized logging, log search, log based alerting
* Secrets manager
* Managed kubernetes
* Object store
* Managed load balancers
* Database HA
* Cache solutions
...
Can I run all these by myself? Sure. But I'm not in this business. I just want to write software and run that.
And yes, I have needed most of this from day 1 for my startup.
For a personal toy project, or when you reach a certain scale, it may make sense to go the other way.
eska · 21h ago
Now imagine your solution is not on a distributed system and go through that list. Centralized logging? There is nothing to centralize. Secrets management? There are no secrets to be constantly distributed to various machines on a network. Load balancing? In practice most people, for most work, don't use it because they actually outgrew the hardware, but because they have to provision onto shared hardware without exclusivity. Caching? Distributed systems create latency that doesn't need to exist at all, reliability issues that have to be avoided, thundering-herd issues that you would otherwise not have, etc.
So while there are areas where you need to introduce distributed systems, this repeated disparaging comment of “toy hobby projects” makes me distrust your judgement heavily. I have replaced many such installations by actually delivering (grand distributed designs often don’t fully deliver), reducing costs, dramatically improving performance, and most importantly reducing complexity by magnitudes.
bbarnett · 19h ago
Not to mention scaling. Most clients I know never, ever have scaled once. Ever. Or if they do, it's to save money.
One server means you can handle the equiv of 100+ AWS instances. And if you're into that turf, then having a rack of servers saves even more.
Big corp is pulling back from the cloud for a reason.
viraptor · 14h ago
> Centralized logging? There is nothing to centralized.
It's still useful to have the various services, background jobs, system events, etc. in one indexed place which can also manage retention and alerting. And ideally in a place reachable even if the main service goes down. I've got centralised logging on a small homelab server with a few services on it and it's worth the effort.
> Load balancing? In practice most people for most work don’t use it because of actually outgrowing hardware, but because they have to provision to shared hardware without exclusivity.
Depending on how much you lose in case of downtime, you may want at least 2x of hardware for redundancy and that means some kind of fancy routing (whether it's LB, shared IP, or something else)
> Secrets management? There are no secrets to be constantly distributed to various machines on a network.
Typically businesses grow to more than one service. For example I've got a slack webhook in 3 services in a small company and I want to update it in one place. (+ many other credentials)
> Caching? Distributed systems create latency that doesn’t need to exist at all
This doesn't solve the need for caching results of larger operations. It doesn't matter how much latency you have or not, you still don't want that rarely-changing 1sec long query to run on every request. Caching is rarely only about network latency.
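To illustrate that last point: the cache can be as small as an in-process TTL wrapper around the expensive query; nothing here assumes a network. The function names and values are placeholders.

    # Minimal in-process TTL cache: the point is reusing an expensive result,
    # not shaving network latency.
    import time
    from typing import Any, Callable

    def ttl_cache(seconds: float) -> Callable:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            cache: dict = {}
            def wrapper(*args):
                now = time.monotonic()
                hit = cache.get(args)
                if hit is not None and now - hit[0] < seconds:
                    return hit[1]          # fresh enough: reuse the result
                value = fn(*args)
                cache[args] = (now, value)
                return value
            return wrapper
        return decorator

    @ttl_cache(seconds=60)
    def dashboard_stats() -> dict:
        # Stand-in for the rarely-changing, ~1-second-long query mentioned above.
        time.sleep(1)
        return {"orders_today": 42}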
Spooky23 · 13h ago
It sounds like you make a living doing stuff that has an incredibly small, ninja-like team, has a very low change rate, or is something that nobody really cares about. Things like RPO/RTO, multi-tenancy, logging, etc don't matter.
That's amazing. I wish I could do the same.
Unfortunately, I cannot run my business on a single server in a cage somewhere for a multitude of reasons. So I use AWS, a couple of colos and SaaS providers to deliver reliable services to my customers. Note I'm not a dogmatic AWS advocate, I seek out the best value -- I can't do what I do in AWS without a lot of capital spend on firewalls and storage appliances, as well as the network infrastructure and people required to make those work.
doganugurlu · 21h ago
You need database HA and load balancers on day 1?
You must be anticipating truly a lot of growth prior to building. Or perhaps insisting on tiny VMs for your loads?
swiftcoder · 20h ago
> Or perhaps insisting on tiny VMs for your loads?
This happens way too often. Early-stage startups build everything on the AWS free tier (t2.micro only!), and then when the time comes, they scale everything horizontally.
runako · 11h ago
> Centralized logging, log search, log based alerting
Do people really use the bare CloudWatch logs as an answer for log search? I find it terrible and pretty much always recommend something like DataDog or Splunk or New Relic.
troupo · 20h ago
> For a personal toy project,
which in reality is any project under a few hundred thousand users
benjiro · 16h ago
> At Hetzner, you can rent a machine with 48 cores and 128GB of RAM for the same money.
The problem that Hetzner and a lot of hardware-providing hosts have is the lack of affordable flexibility.
Hetzner's design is based upon a base range of standardized products. These can only be upgraded within a pre-approved range of upgrade options (limited to storage/memory).
Upgrades are often a mixed bag of carefully designed "upgrade paths". As you can expect, upgrades are not cheap. Doubling the storage on a base server often increases the price of your server by 50 to 75%. The typical customizing will cost you dearly.
This is where AWS wins a lot more. Yes, they are expensive as hell, but you often are not stuck with a base config and a limited upgrade path. The ability to scale beyond what Hetzner can offer is there, and you're not forced to overbuy from the start. Transferring between servers is a few buttons and done. With Hetzner, if you did not overspec from the start, you're going to be doing those fun server migrations.
The ironic part is that buying your own hardware and running it yourself often ends up paying for itself within an 8-12 month period (not counting electricity / internet). And you maintain a lot more flexibility.
* You want to use bifurcation, go for it.
* You want to use consumer 4TB NVMe drives for second-layer read storage (which Hetzner refuses to offer, as they limit those to 2TB and only on a few servers), go for it.
* You want a 10Gbit interlink between your servers, go for it. No need to pay a monthly fee! No need to reserve "future space".
* Oh, you want 25Gbit? Go for it (Hetzner: not possible).
* You want 50Gbit ...
* You want to chuck in a few LLM capable GPUs without breaking the bank...
It's ironic that it's 2025 and Hetzner is still limited to a 1Gbit connection on its hardware, when just about any consumer-level hardware has had 2.5Gbit by default for years.
Your own hardware gives you the flexibility of AWS and cost savings beyond Hetzner. Maybe it's just my environment, but I see more and more small to medium companies going back to their own locally run servers. Not even colocation.
The increase in consumer-level fiber, which used to be expensive or unavailable, has opened the doors for businesses. Most companies do not need insane backbones.
The fact that you can get business fiber at 10Gbit for around 100 Euro in some EU countries (of course never the north) is insane. I have even seen some folks combining fiber with Starlink & 5G as backup in case their fiber fails / is out.
As long as you fit within a specific usage case that is offered by Hetzner, they are cheap. But the moment you step outside that comfort zone... This is one of Hetzner's weaknesses and where AWS or self-hosting comes back into play.
bluedino · 12h ago
Almost reminds me of Rackspace back in... 2011
We had a leased server from them, running VMware, and we had Linux virtual machines for our application.
We ran out of RAM. We only had 16 or 32GB at the time. Hey, can we double this? Sure, but our payment would nearly double. How does that make any sense?
If this were a co-located box we owned, I could buy a pair of $125 chips from Crucial (or $250 Dell chips from CDW) and there we go. But we're expected to pay this much more per month?
Their answer was "you can do more with the server so that's what you're paying for"
Storage was a similar situation, we were still on RAID with spinning drives and we wanted to go SSD, not even NVME. Wasn't going to happen. And if we went to a new server we'd have to get all new IP's and stuff. Ugh.
And 10Gb...that was a pipe dream. Costs were insane.
We ended up having to decide between two things:
1. Move to a co-lo and buy a couple servers, ala StackExchange. This is what I wanted to do.
2. Tweak the current application stack, and re-write the next version to run on AWS.
What did we end up doing? Some half ass solution using the existing server for DB and NGINX proxy, while running the sites on (very slow) Slicehost instances (which Rackspace had recently acquired and roughly integrated into their network). So we still had downtime issues, slow databases, etc.
radiator · 11h ago
> Doubling the storage on a base server, often increases the price of your server by 50 to 75%
For storage, Hetzner does offer Volumes, which you can attach to your VM and you can choose exactly how large you want them to be and are charged separately. But your argument about doubling resources and doubling prices still holds for RAM.
Nextgrid · 7h ago
FYI he's talking about dedicated servers (or "root servers" as they call them).
benjiro · 5h ago
> For storage, Hetzner does offer Volumes, which you can attach to your VM
The argument was about dedicated hardware. But it still holds for VPS.
Have you seen the price of Cloud Storage? An ARM VPS with 40GB is 4.51 (inc. tax); for another 40GB of storage, you're paying 2.10 Euro. So my argument still holds, as you're paying almost 50% more just to go from 40GB to 80GB. And that ratio gets worse if you're renting a higher-end VPS and double the storage on it.
Let's be honest, 53.62 Euro for 1TB of SSD storage in 2025 is ridiculous.
Netcup is at 12 Euro/TB for SSD storage (same speed as the VMs, as it's just local storage on the server, not network storage). FYI: an ARM 6-core with 256GB is 6.26 Euro at Netcup.
Hetzner used to be the market leader and pushed others, but you barely see any new products or upgrades from them anymore. I said it before: if Netcup actually invested in a more modern/scalable VPS solution (instead of their 2010 VPS panels), they would eat a lot of Hetzner's clients.
themafia · 1d ago
On AWS if you want raw computational capacity you use Lambda and not EC2. EC2 is for legacy type workloads and doesn't have nearly the same scaling power and speed that Lambda does.
I have several workloads that just invoke Lambda in parallel. Now I effectively have a 1000 core machine and can blast through large workloads without even thinking about it. I have no VM to maintain or OS image to consider or worry about.
Which highlights the other difference that you failed to mention. Hetzner charges a "one time setup" fee to create that VM. That puts a lot of back pressure on infrastructure decisions and removes any scalability you could otherwise enjoy in the cloud.
If you want to just rent a server then Hetzner is great. If you actually want to run "in the cloud" then Hetzner is a non-starter.
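For context, the Lambda fan-out described above is roughly a thread pool of synchronous invokes. A hedged sketch with boto3, assuming an already-deployed function called "chunk-worker" (an invented name) that takes and returns JSON:

    # Hypothetical fan-out: one Lambda execution per work chunk, results collected.
    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    FUNCTION = "chunk-worker"  # assumed: an already-deployed Lambda function
    lam = boto3.client("lambda")

    def run_chunk(chunk: dict) -> dict:
        resp = lam.invoke(
            FunctionName=FUNCTION,
            InvocationType="RequestResponse",  # synchronous invoke
            Payload=json.dumps(chunk).encode(),
        )
        return json.load(resp["Payload"])

    def fan_out(chunks: list[dict]) -> list[dict]:
        # max_workers caps how many executions are in flight at once.
        with ThreadPoolExecutor(max_workers=100) as pool:
            return list(pool.map(run_chunk, chunks))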
solid_fuel · 1d ago
Strong disagree here. Lambda is significantly more expensive per vCPU hour and introduces tight restrictions on your workflow and architecture, one of the most significant being maximum runtime duration.
Lambda is a decent choice when you need fast, spiky scaling for a lot of simple self-contained tasks. It is a bad choice for heavy tasks like transcoding long videos, training a model, data analysis, and other compute-heavy work.
themafia · 1d ago
> significantly more expensive per vCPU hour
It's almost exactly the same price as EC2. What you don't get to control is the mix of vCPU and RAM. Lambda ties those two together. For equivalent EC2 instances the cost difference is vanishingly small, on the order of pennies per month.
> like transcoding long videos, [...] data analysis, and other compute-heavy tasks
If you aren't breaking these up into multiple smaller independent segments then I would suggest that you're doing this wrong in the first place.
> training a model
You're going to want more than what a basic EC2 instance affords you in this case. The scaling factors and velocity are far less of a factor.
runako · 11h ago
This is a great example of what I meant when I said that a part of the Cloud Tax is it constrains the solution space available to developers. In an era where one can purchase, off-the-shelf, a 256-core machine with terabytes of RAM, developers are still counting megabytes(!) of file sizes due to the constraints of AWS.
It should be obvious that this is not the best answer for all projects.
eska · 21h ago
> If you aren't breaking these up into multiple smaller independent segments then I would suggest that you're doing this wrong in the first place.
Care to elaborate?
icedchai · 14h ago
You are expected to work around Lambda limitations because it's the "right way", not because the limitations make things overly complex. /s
jalk · 20h ago
This article (from Nov. 2022) shows that "utilizing Lambda is preferable until Lambda is utilized about 40 to 50 % of the time"
> [Hetzner] charges a "one time setup" fee to create that VM. That puts a lot of back pressure on infrastructure decisions and removes any scalability you could otherwise enjoy in the cloud.
Hetzner Cloud, then! In the US, $0.53/hr / $333.59/mo for 48 vCPU/192GB RAM/960GB NVMe. Includes 8 TB/mo traffic, when 8 TB egress would cost $720 on EC2; more traffic is $1.20/TB when the first tier of AWS egress is $90/TB. No setup fee. Not that it's EC2 but there's clearly flexibility there.
More generally, if you want AWS, you want AWS; if you want servers you have options.
icedchai · 1d ago
That's fine, except for all of Lambda's weird limitations: request and response sizes, deployment .zip sizes, max execution time, etc. For anything complicated you'll eventually run into all this stuff. Plus you'll be locked into AWS.
themafia · 1d ago
> request and response sizes
If either of these exceed the limitations of the call, which is 6MB or 256kB depending on call type, then you can just use S3. For large distributed task coordination you're going to be doing this anyways.
> deployment .zip sizes
Overlays exist and are powerful.
> max execution time
If your workload depends on long uninterrupted runs of time on single CPUs then you have other problems.
> Plus you'll be locked into AWS.
In the world of serverless your interface to the endpoints and semantics of Lambda are minimal and easily changed.
icedchai · 15h ago
Of course, we can generally work around all these things. The point is it is annoying to do so. It adds friction and further couples you to a proprietary platform.
You're better off using ECS / Fargate for application logic.
matt-p · 1d ago
Very few providers charge setup fees; some will provision a server within 90s of an API call.
themafia · 1d ago
Hetzner does on the server the OP was referencing:
If you are scared off by the €80 setup on a server that costs €200 a month, it seems like the setup fee did its intended job, no?
ferngodfather · 17h ago
Most providers do for dedicated servers, or make you agree to a fixed term. I don't believe they do the same for VPS / Cloud servers.
benjiro · 15h ago
> I don't believe they do the same for VPS / Cloud servers.
Because it's baked into the price. If you run a VPS for a full month, you get the listed monthly price. But if you run a VPS for a shorter time, the hourly billing rate is a lot more expensive.
The ironic part is that you're better off keeping a VPS active until the end of your monthly period (if you've already crossed 2/3 of it) than cancelling early.
I've noticed that few people realize that the hourly price != the monthly price.
matt-p · 1d ago
I don't think that negates the point I was making. Most don't, for example none of the providers on https://www.serversearcher.com/ seem to charge setup.
lachiflippi · 19h ago
Hetzner does not charge any provisioning fees for VMs and never has.
dang · 1d ago
HN uses two—one live and one backup, so we can fail over if there's a hardware issue or we need to upgrade something.
It's a nice pattern. Just don't make them clones of each other, or they might go BLAM at the same time!
Any stats on HN downtime over the years? I remember one or two outages in the last decade or so, but I would guess the uptime is about 99.99%.
dang · 11h ago
We don't specifically track that, no. The worst one was when we went down for (IIRC) a couple days because of a disk failure, I think in Jan 2014. It was after that that we added a failover box.
HN goes down when we restart the server process, usually as part of updating the code - but only for a few seconds. The message "Restarting the server. Shouldn't take long." displays when that is happening.
There are also, to my exasperation, still moments of brownout during certain traffic spikes or moments of obscure resource contention. But these are at least rarer than they used to be.
tgtweak · 15h ago
I've been doing hybrid colo+public cloud for over a decade and it's always been the most cost effective route at a certain scale. That specific break even point is lowering over time with the density and cost effectiveness of hardware.
Sure you need net/infra admins but the software and hardware these days are pretty management friendly and you'll find you still need (often more expensive "cloud") admins so you're not offsetting much management cost there.
Colocation is plentiful and providers often aggregate and resell bandwidth from their preferred carriers.
At one point we were up to 8 Dell VRTX clusters and a few SANs, with 500+ VMs, from huge MSSQL servers to kube clusters; the public cloud bill would have been well into six figures even with preferred pricing and reserved instances. Our colocation bill was $2400/mo and that was mostly for power. The one thing that always surprised me was how much faster everything was - every time we had to scale over into the cloud, the public cloud node was noticeably slower even for identical CPU generations and vCPUs.
You need to be very keen about server deals, updates, support contracts and licenses - but it's really manageable and interconnecting with the cloud is trivial at this point - you can get a "cloud connect" fiber drop to your preferred cloud provider and connect your colo infra to your vpc.
brazzy · 14h ago
Colocation to me means you buy your own hardware and rent only the rack space (and power and connectivity) from the datacenter. Is that really what you're talking about? If so, why do you choose this over renting bare metal servers?
tgtweak · 11h ago
Not always - you can lease your servers from the vendor as well, in which case you're renting the rack space, power and cooling from the datacenter and you're renting the servers from the vendor - most of the leases are designed so you can refresh your hardware every 4-5 years and it's usually still cheaper than renting from a dedicated hosting company.
Once you have an established baseline for your server needs - it's almost always more capital friendly to buy the servers and keep them running for the ~5 reliable years you'll get out of them - usually break even here is 2-3 years vs renting from a provider. If you're running your servers until they fail you'll get 7-10 years out of them, provided the power cost is still worth running them (usually that is also around the 8-10 year mark depending on your power cost).
So there are many reasons you'd buy vs rent - including capital deductions and access to cheap interest rates. You can also get some pretty crazy deals (like 33% of new price) by buying 2-3 year old equipment, then continue to run them for another 4-5 years, which is the lowest cost scenario if you don't need bleeding edge.
brazzy · 6h ago
What about the cost of having people actually go to the datacenter to install hardware, and go again whenever there is a hardware problem, possibly resulting in much longer downtimes than with a rented server?
Especially for the "one (or a few) big server" scenario in the article, that would seem to me a pretty big factor.
tgtweak · 2h ago
At 1 rack scale you're saving ~20-30k/mo in cloud fees - you can hire an excellent sysadmin in the 12-15k/mo range and they can do a lot more than just go to the datacenter as needed.
fragmede · 14h ago
Because it's your hardware in the colo, so if money becomes dire, you can extend the servers' lifetime beyond the standard depreciation schedule. Your rented bare-metal server might be slightly cheaper than the respective EC2 instance, but if you stop paying that bill, it's gonna go poof, same as the EC2 instance.
lewisjoe · 1d ago
I helped bootstrap a company that made an enterprise automation engine. The team wanted to make the service available as SaaS for boosting sales.
They could have got the job done by hosting the service on a VPS with a multi-tenant database schema. Instead, they went about learning Kubernetes and drilling deep into the "cloud-native" stack. They spent a year trying to set up the perfect DevOps pipeline.
Not surprisingly the company went out of business within the next few years.
rixed · 1d ago
> Not surprisingly the company went out of business within the next few years.
But the engineers could find new jobs thanks to their acquired k8s experience.
doganugurlu · 21h ago
Get paid to learn and build your career instead, baby!
joshmn · 1d ago
This is my experience too—there’s too much time wasted trying to solve a problem that might exist 5 years down the road. So many projects and early-stage companies would be just fine either with a PaaS or nginx in front of a docker container. You’ll know when you hit your pain point.
cpursley · 1d ago
Yep, this is why I'm a proponent of paas until the bill actually hurts. Just pay the heroku/render/fly tax and focus on product market fit. Or, play with servers and K8s, burning your investors money, then move on to the next gig and repeat...
Aeolun · 17h ago
The moment I sign up for a PaaS the bill hurts. I can never get over the fact I can get 1000x more compute for the same price, never mind that I never use it and have to set everything up myself. I’ll just never pay to lock myself in to something so restricted. My dedicated server allows me to do anything I want or need.
cpursley · 16h ago
If you enjoy playing with servers instead of shipping features, enjoy!
DaSHacka · 1d ago
> Or, play with servers and K8s, burning your investors money, then move on to the next gig and repeat...
I mean, of the two, the PaaS route certainly burns more money, the exception being the rare shop that is so incompetent they can't even get their own infrastructure configured correctly, like in GP's situation.
There are guaranteed more shops that would be better off self-hosting and saving on their current massive cloud bills than the rare one-offs that actually save so much time using cloud services, it takes them from bankruptcy to being functional.
fragmede · 1d ago
> the PaaS route certainly burns more money,
Does it? Vercel is $20/month and Neon starts at $5/month. That obviously goes up as you scale up, but $25/month seems like a fairly cheap place to start to me.
(I don't work for Vercel or Neon, just a happy customer)
cpursley · 1d ago
Yeah, also a happy neon customer - but they can get pricy. Still prefer them over AWS. For compute, Fly is pretty competitive.
theaniketmaurya · 1d ago
I’m using Neon too and upgraded to the scale-up version today. Curious, what do you mean that they can get pricey?
Aeolun · 17h ago
Like, you keep your server running for a month and you need to pay $255 pricey? I can get about 64 cores of dedicated compute for the price of a single neon compute (4c/16gb) unit.
And that’s before you factor in 500gb of storage.
cpursley · 16h ago
And how much time are you spending babysitting all of this? What’s your upgrade, deploy and rollback story? Because I don’t have to even think about these things.
fragmede · 1d ago
Yeah, same. Vercel + Neon and then if you actually have customers and actually end up paying them enough money that it becomes significant, then you can refactor and move platforms, but until you do, there are bigger fish to fry.
matt-p · 1d ago
100%. Making it a docker container and deploying it is literally a few hours at most.
AuthAuth · 23h ago
A lot of the time businesses just aren't that important. The number of places I've seen that stress over uptime when nothing they run is at all critical. Hell, you could drop the production environment in the middle of the day and yes, it would suck and you'd get a few phone calls, but life would go on.
These companies all ended up massively increasing their budgets switching to cloud workloads when a simple server in the office was easily enough for their 250 users. Cloud is amazing for some uses and pure marketing BS for others but it seems like a lot of engineers aim for a perfect scalable solution instead of one that is good enough.
ehnto · 18h ago
I had a team member who would reiterate that during tough times. They come from much more consequential work, so they would often remark that at least nobody dies when we fuck up.
winternewt · 18h ago
Every corporate meeting should start with reminding ourselves that we're all going to die. And it most likely won't be from anything happening at the office.
matt-p · 1d ago
A thoroughly good article. It's probably worth also considering adding a CDN if you take this approach at scale. You get to use their WAF and DNS failover.
A big pain point for me is that this non-cloud approach normally means running my own database. It's worth considering a provider who also offers cloud databases.
If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.
In terms of pricing, the deals available these days on servers are amazing: you can get a 4GB RAM VPS with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core for ~$90. It's worth using sites like serversearcher.com to compare.
railorsi · 1d ago
What’s the issue with running Postgres inside a Docker container + regular backups? Never had a problem, and it's relatively easy to manage.
Biganon · 9h ago
Why use a docker container? I run Postgres as is, what would I gain with running it in a container?
Nextgrid · 7h ago
It means the whole thing is configured in a docker-compose file (or your raw Docker CLI invocation) plus the data volume. So as long as you have those two things you can replicate it and move it to other hosts regardless of their distro.
Compare that with using your distro's packaged version where you can have version variations, variations in default config or file path locations, etc.
matt-p · 1d ago
No PITR, but mostly it's just hassle: for the application server I literally don't need backups, just automated provisioning, a Docker container, etc. Adding Postgres then means I need full backups including PITR (point-in-time recovery), because I don't want to lose even an hour's data.
doganugurlu · 21h ago
Or use SQLite and your backups are literally a copy of a file.
You can abuse git for it if you really want to cut corners.
vanviegen · 20h ago
Only if you can freeze your application for that long, in which case your statement is true for all non-broken databases.
wild_egg · 12h ago
It only freezes your application if you've misconfigured it.
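For what it's worth, SQLite also ships an online backup API (exposed in Python as Connection.backup) that copies the database page by page while it stays in use, so even the "copy a file" approach doesn't have to mean freezing the app. A minimal sketch with invented file names:

    # Minimal online backup of a live SQLite database.
    import sqlite3

    def backup_sqlite(src_path: str = "app.db", dest_path: str = "app-backup.db") -> None:
        src = sqlite3.connect(src_path)
        dest = sqlite3.connect(dest_path)
        try:
            # Page-by-page online copy; concurrent writes may restart the copy,
            # but the application itself never has to stop.
            src.backup(dest)
        finally:
            dest.close()
            src.close()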
andersmurphy · 1d ago
If you're running on a single machine then you'll get way more performance with something like sqlite (instead of postgres/MySQL) which also makes managing the database quite trivial.
immibis · 1d ago
SQLite has serious concurrency concerns which have to be evaluated. You should consider running postgres or mysql/mariadb even if it's on the same server.
SQLite uses one reader/writer lock over the whole database. When any thread is writing the database, no other thread is reading it. If one thread is waiting to write, new reads can't begin. Additionally, every read transaction starts by checking if the database has changed since last time, and then re-loading a bunch of caches.
This is suitable for SQLite's intended use case. It's most likely not suitable for a server with 256 hardware threads and a 50Gbps network card. You need proper transaction and concurrency control for heavy workloads.
Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite is lite. Use it for lite things, not heavy things.
andersmurphy · 18h ago
Not sure what you are talking about? In WAL mode (which is what you should be using) writes don't block reads and reads don't block writes. If you are using connection pooling (which you should), the cache will stay hot.
SQLite (properly configured) will often outperform "proper databases" by an order of magnitude in the context of a single box. You want a single writer for high performance, as it lets you batch.
> 256 hardware threads...
Have you tried? I have. Others have too. [1]
> Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
Sqlite has blobs so you can use your own custom encoding which is what you want in a high performance context.
Here's SQLite on a $5 shared VPS that can handle 10,000+ checks per second over a billion checkboxes [2]. You're gonna be fine.
You know, it's ok to say that you're out of your element and don't have direct experience with the thing you're commenting on.
SQLite is easily the best scaling DB tech I've used. I've moved all my postgres workloads over to it and the gains have been incredible.
It's not a panacea and not the best in all cases, but it's a very sane default that I recommend everyone start with, only complicating their stack with an external DB when they start hitting real limits (which often never happens).
immibis · 11h ago
> You know, it's ok to say that you're out of your element and don't have direct experience with the thing you're commenting on.
I moved several projects from sqlite to postgres because sqlite didn't scale enough for any of them.
andersmurphy · 11h ago
May I suggest you could have been holding it wrong?
The out of the box defaults for sqlite are terrible for web apps.
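For context, "properly configured" usually means flipping a handful of pragmas away from the defaults; something along these lines (values are illustrative, not gospel):

    import sqlite3

    # Typical non-default settings for a server-side SQLite database.
    con = sqlite3.connect("app.db", timeout=5.0)
    con.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer (and vice versa)
    con.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; WAL keeps the file consistent
    con.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on contention instead of erroring
    con.execute("PRAGMA cache_size=-64000")   # ~64 MB page cache (negative means KiB)
    con.execute("PRAGMA foreign_keys=ON")     # FK enforcement is off by default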
hruk · 1d ago
Agree on many things here, but SQLite does support WAL mode, which allows 1 writer/N readers with snapshot isolation on reads. Writes are serialized but still quite fast.
SQLite (actually SQL-ite, like a mineral) may be light, but so are many workloads these days. Even 1000 queries per second is quite doable with SQLite and modest hardware, and I've worked at billion dollar businesses handling fewer queries than that.
Rohansi · 23h ago
Is any SQL database suitable for 50Gbps of network traffic hitting it?
Most if not all of your concerns with SQLite are simply a matter of not using the default configuration. Enable WAL mode, enable strict mode, etc. and it's a lot better.
rixed · 1d ago
If you have a single request at a time and need few integrity checks.
rthnbgrredf · 11h ago
Bare-metal servers sound super cheap when you look at the price tag, and yeah, you get a lot of raw power for the money. But once you’re in an enterprise setup, the real cost isn’t the hardware at all, it’s the people needed to keep everything running.
If you go this route, you’ve got to build out your own stack for security, global delivery, databases, storage, orchestration, networking ... the whole deal. That means juggling a bunch of different tools, patching stuff, fixing breakage at 3 a.m., and scaling it all when things grow. Pretty soon you need way more engineers, and the “cheap” servers don’t feel so cheap anymore.
rollcat · 11h ago
A single, powerful box (or a couple, for redundancy) may still be the right choice, depending on your product / service. Renting is arguably the most approachable option: you're outsourcing the most tedious parts + you can upgrade to a newer generation whenever it becomes operationally viable. You can add bucket storage or CDN without dramatically altering your architecture.
Early Google rejected big iron and built fault tolerance on top of commodity hardware. WhatsApp used to run their global operation employing only 50 engineering staff. Facebook ran on Apache+PHP (they even served index.php as plain text on one occasion). You can build enormous value through simple means.
amluto · 11h ago
If you use a cloud, you still need a solution for security (ever heard of “shared responsibility”?), for global delivery (a big cloud will host you all over, and this still requires extra effort on your part, kind of like how having multiple rented or owned servers requires extra effort), for storage (okay, I admit that S3 et al are nice and that non-big-cloud solutions are a bit lacking in this department), for orchestration (the cloud handles only the lowest level — you still need to orchestrate your stuff on top of it), for fixing breakage at 3 a.m. (the cloud can swap you onto a new server, subject to availability; so can a provider like Hetzner. You still need to fail over to that server successfully), and for patching stuff (other than firmware, the cloud does not help you here).
msgodel · 11h ago
I used to say "oh yeah just run qemu-kvm" until my girlfriend moved in with me and I realized you do legitimately need some kind of infrastructure for managing your "internal cloud" if anyone involved isn't 100% on the same page and then that starts to be its own thing you really do have to manage.
Suddenly I learned why my employer was willing to spend so much on OpenStack and Active Directory.
ahdanggit · 10h ago
> until my girlfriend moved in with me
lol, why was this the defining moment? She wasn't too keen on hearing the high pitch wwwwhhhhuuuuurrrrrrr of the server fans?
msgodel · 10h ago
She was another software engineer and needed VMs too so I thought I'd just let her use some of my spare compute.
synack · 20h ago
The complexity you introduce trying to achieve 100% uptime will often undermine that goal. Most businesses can tolerate an hour or two of downtime or data loss occasionally. If you set this expectation early on, you can engineer a much simpler system. Simpler systems are more reliable.
hvb2 · 20h ago
Much less expensive too.
I think in general that expectation is NOT acceptable though, especially around data loss, because the non-engineering stakeholders don't believe it is.
Engineers don't make decisions in a vacuum; if you can manage the expectations, good for you. But in most cases that's very much an uphill battle, one which might make you look incompetent because you cannot guarantee zero data loss.
tgtweak · 11h ago
We had single-datacenter resiliency (meaning n+1 on power, cooling, network + ISP, servers) and it was fine. You still need an offsite DR strategy here - this is one of the things hybrid cloud is great for: you can replicate your critical workloads like databases and services to the cloud in no-load standby, or delta-copy your backups to a cheap cloud provider for simplified recovery in a disaster scenario (i.e. the entire datacenter gets taken out). The cost of this is relatively low, since data into the cloud is free and you're only really incurring costs in a disaster recovery scenario. Most virtualized platforms (Veeam etc.) support offsite secondary incremental backups with relative ease, and recovery is also pretty straightforward.
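As a rough sketch of the "delta-copy your backups to a cheap cloud provider" piece (bucket, endpoint and paths are invented; any S3-compatible target works):

    import boto3

    # Push the latest local backup to an S3-compatible bucket kept only for DR.
    # Ingress is typically free; you pay meaningful egress only when restoring.
    s3 = boto3.client("s3", endpoint_url="https://s3.example-provider.com")
    s3.upload_file(
        "/backups/db-incremental-latest.dump",  # produced by your backup tooling on-prem
        "offsite-dr-backups",                   # dedicated disaster-recovery bucket
        "db/db-incremental-latest.dump",
    )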
That being said, I've lost a lot of VMs on EC2 and had entire regions go down in GCP and AWS in the last 3 years alone, so going to the public cloud isn't a solve-it-all solution - knock on wood, the colo we've been using hasn't been down once in 12+ years.
decasia · 1d ago
Regardless of the cost and capacity analysis, it's just hard to fight the industry trends. The benefits of "just don't think about hardware" are real. I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front). And above all, if an AWS region goes down, it doesn't seem like your org's fault, but if your bespoke private hosting arrangement goes down, then that kinda does seem like your org's fault.
swiftcoder · 20h ago
> I think there is a school of thought that capex should be avoided at all costs
Yep, and it's mostly caused by the VC funding model - if your investors are demanding hockey-stick growth, there is no way in hell a startup can justify (or pay for) the resulting Capex.
Whereas a nice, stable business with near-linear growth can afford to price in regular small Capex investments.
logifail · 1d ago
> and server hardware is expensive up front
You don't need to buy server hardware(!), the article specifically mentions renting from eg Hetzner.
> The benefits of "just don't think about hardware" are real
Can you expand on this claim, beyond what the article mentioned?
bearjaws · 1d ago
> Can you explain on this claim, beyond what the article mentioned?
I run a lambda behind a load balancer: hardware dies, it's redundant, it gets replaced. If a database server fails, while it re-provisions it doesn't saturate read IO on the SAN and cause noisy-neighbor issues.
I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.
Nextgrid · 1d ago
> I don't deal with depreciation, I don't deal with data center maintenance.
You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.
immibis · 1d ago
They do rely on you to tell them if hardware fails, however, and they'll still unplug your server and physically fix it. And there's a risk they'll replace the wrong drive in your RAID pair and you'll lose all your data - this happens sometimes - it's not a theoretical risk.
But the cloud premium needs reiteration: twenty five times. For the price of the cloud server, you can have twenty-five-way redundancy.
1dom · 19h ago
> And there's a risk they'll replace the wrong drive in your RAID pair and you'll lose all your data - this happens sometimes - it's not a theoretical risk.
A medium to large size asteroid can cause mass extinction events - this happens sometimes - it's not a theoretical risk.
The risk of the people responsible for managing the platform messing up and losing some of your data is still a risk in the cloud. This thread has even already had the argument "if the cloud provider goes down, it's not your fault" as a cloud benefit. Either cloud is strong and stable and can't break, or cloud breaks often enough that people will just excuse you for it.
namibj · 15h ago
There's a reason semiconductor manufacturing is so highly automated, and it's not labor cost.
Humans err. Computers only err when told. But they'll repeat a task reliably without random mistakes if told what to do by a competent (manufacturing process) engineering organization. Yes it takes more than one engineer.
immibis · 11h ago
Many people have already had their data destroyed by remote hands replacing the wrong side of a RAID. Nobody's already had their server destroyed by a mass-extincting meteor.
marcosdumay · 1d ago
> I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front).
Yes, there is.
Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.
That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that often goes around them is, and that's why way more people rent the room than the servers in it.
sam_lowry_ · 1d ago
Connectivity is a problem, not the room.
I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.
marcosdumay · 1d ago
It's connectivity, reliable power, reliable cooling, and security.
The actual space isn't a big deal, but the entire environment has large fixed costs.
sam_lowry_ · 14h ago
In the abstract, yeah.
In practice, all that except connectivity is relatively easy to have on-site.
Connectivity is highly dependent on the business location, local providers, their business plans and their willingness to go out of their way to serve the clients.
And I am not talking only about bandwidth, but also reserve lines and latency.
grg0 · 9h ago
> if an AWS region goes down, it doesn't seem like your org's fault, but if your bespoke private hosting arrangement goes down, then that kinda does seem like your org's fault.
Never underestimate the price people are willing to pay to evade responsibility. I estimate this is a multi-billion dollar market.
matt-p · 1d ago
If you rent dedicated servers, then you're not worrying about any of the capex or maintenance stuff.
qaq · 1d ago
The benefits of "don't write a distributed system unless you really have to" are also very real.
ehnto · 18h ago
Exactly, same for microservices I feel. Why have enterprise org problems if you don't have an enterprise org.
ehnto · 18h ago
I think you hit the nail on the head. What enterprises are paying for is abstraction of responsibility. Suits would never criticise going with Microsoft or Amazon.
wongarsu · 1d ago
For anything up to about 128GB RAM you can still easily avoid capex by just renting servers. Above that it gets a bit trickier
IshKebab · 1d ago
It's not like it's a huge capex for that level of server anyway. Probably less than the cost of one employee's laptop.
matt-p · 1d ago
Renting (hosted) servers above 128GB RAM is still pretty easy, but I agree pricing levels out. 128GB RAM server ~$200/month, 384GB ~$580/month, 1024GB ~$940/month.
decasia · 1d ago
To be clear - this isn't an endorsement on my part, just observations of why cloud-only deployment seems common. I guess we shouldn't neglect the pressure towards resume-oriented development either, as it undoubtedly plays a part in infra folks' careers. It probably makes you sound obsolete to be someone who works in a physical data center.
I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.
gethly · 21h ago
I run on VPSs as well. I ditched cloud a long time ago. Once my project starts making money, I will definitely buy my own hardware and colocate. Cloud is like dating apps. We had fun for a decade but it's time to get serious and get some things actually done and be productive again.
benjiro · 15h ago
> I will definitely buy my own hardware and colocate.
Even colocation is often fraught with issues. I shall not mention the plethora of dead hardware from datacenter electricity failures. Ironically, my home has more stable electricity than some datacenters lol.
Unless you're running a business where a few minutes of downtime will cost you millions, most companies can literally run their own servers from their basements. I often see how much people overestimate their need for 99.999% uptime, or their bandwidth requirements.
It's not like colocation is that much cheaper. The electricity prices you're paying are often more expensive than even business/home electricity. That leaves only internet/fiber, and there's a plethora of commercial fiber these days.
We used to get a minimum quoted price of 2k for 1Gbit business fiber years ago (not including install costs). Now, in some countries, you get 5 or 10Gbit business fiber for 100 Euro.
bob1029 · 1d ago
This isn't even the end game for "one big server". AMD will give the most bang per rack, but there are other factors.
An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.
If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.
dardeaup · 1d ago
At what cost politically? I would expect political battles to be far more intense than any of the technical ones.
sgarland · 1d ago
That’s because 75% (citation: wild-ass estimate) of tech workers are incapable of critical thinking, and blindly parrot whatever they’ve heard / read. The number of times I’ve seen something on HN, thought “that doesn’t sound right,” and then spent a day disproving it locally is too damn high. Of course, by then no one gives a shit, and they’ve all moved on patting each other on the back about how New Shiny is better.
grg0 · 9h ago
I do wish this field were more scientific and factual. Rather, it more closely resembles cults.
dardeaup · 5h ago
I agree. I always cringe when I see a job posting where they're wanting to hire a "passionate" xxx engineer. I always think to myself, "no, you really don't. you want to hire a dispassionate engineer who is objective". It's very difficult to be objective when you're passionate about something (especially a technology). And then what do you do with that passionate person when the organization gets rid of the technology that they're passionate about?
ETA - fixed spelling error
fock · 22h ago
No. In the short time I worked at a z/OS shop, they had to IPL twice. And the IPL takes ages...
Now, if you can live with the weird environment and your people know how to program what is essentially a distributed system described in terms no one else uses: I guess it's still OK, given the competition is all executing IBM's playbook too.
p_l · 13h ago
Entire mainframe IPL, or just LPAR?
My understanding is that usually you subdivide into a few LPARs and then reboot the production ones on a schedule, to prevent drift and ensure that yes, unplanned IPLs will work.
uhura · 12h ago
I’ve been having those discussions with friends for the last 3 or 4 years. The downside of having local infra is pretty much needing someone who has the experience to do it right. While this article covered the higher end, the math on the lower end tends to work out at around 1 year of ownership, depending on what you already have, because you will probably already have a small rack and some networking gear.
My main conclusion is that at the current cloud premium rates, I will be better off even if I need to hire someone specifically to manage the local infra.
I often wonder if my home NAS/server would be better off put onto a rented box or a cloud server somewhere, especially since I now have 1Gbit/s internet. Even now, 20TB of drive space and 6 cores with 32GB on a Hetzner dedicated server is about twice the price of buying the hardware over a 5-year period. I suspect the hardware will actually last longer than that, and it's the same level of redundancy (RAID) on a rented dedicated server, so the backup is the same cost between the two.
Using cloud plus box storage on Hetzner is more expensive than the dedicated server, 4x the cost of owning the hardware and paying the power bill. AWS and Azure are just nuts, >100x the price, because they charge so much for storage even on hard drives. Neither Contabo nor Netcup can do this; it's too much storage for them.
Every time I look at this I come to the same basic conclusion: the overhead of renting someone else's machine is quite high compared to the hardware and power cost, and it would be a worse solution than having that performance on the local network for bandwidth and latency. The problem isn't so much the compute performance, which is relatively fairly priced; it's the storage costs and data transfer that bite.
Not really what the article was about, but cloud is sort of meant to be good for low-end needs, and it's actually kind of not; the storage costs are just too high, even on a Hetzner Storage Box.
Nextgrid · 1d ago
It really depends on your power costs. In certain parts of Europe, power is so expensive that Hetzner actually works out cheaper (despite them providing you the entire machine and datacenter-grade internet connection).
benjiro · 15h ago
Trust me, even with 35 cent/kWh (Germany), it's easy to make it work. Just do not buy enterprise hardware. People are obsessed with running racks full of often-obsolete hardware that is not designed around energy efficiency.
Dude is running 12x AMD 6600HS with a power draw between 300 and 400W. The compute alone is easily 3x that of an equivalent Hetzner 48c server. We shall not mention that this includes 768GB of memory (people underestimate how much power high-capacity RDIMMs draw).
The main issue with Hetzner: as long as you only use their base configuration servers, they are very competitive. But if you start to step a little bit out of line, the prices simply skyrocket. Try adding more storage or memory to some servers, or needing a higher interconnect between your servers (limited to 10Gbit).
Even basic consumer hardware comes with 2.5Gbit, yet Hetzner is in the stone ages with 1Gbit. I remember when Hetzner introduced 1Gbit; Hetzner was innovation and progression. But that has been slowly vanishing, and Hetzner has been getting more and more lazy. You see the issue also with their cloud offering's storage. Look at Netcup, even Strato, etc... They barely introduce anything new anymore, and when something comes it's often less competitive or broken. Their S3 offering costs Backblaze-level prices and has non-stop issues.
You can tell they were the only company that ever pushed consumer-hardware hosting on a mass scale, which made them a small monopoly in the market. And it shows if you're an old customer and know their history. Hey, do people remember the price increases for the auction hardware because of the Ukraine invasion? "Do not worry folks, when the electricity prices go down, we will adjust them down." Oh, we've been at pre-war prices for almost 2 years. Where are those promised price drops? Hehehe ...
Nextgrid · 8h ago
> Just do not buy enterprise hardware
But then you need to buy newer, more expensive hardware, which pushes your initial price up (divide by the amount of time you'll need to host the server to get the monthly equivalent, then add power/connectivity/maintenance and compare to Hetzner).
Btw generally the reason homelabbers flock to legacy enterprise hardware is that it generally gives you a good amount of compute for a cheap price, if you don't mind the increased power cost. This is actually fine as a lot of homelab usage is bursty and the machine can be powered off outside of those times.
benjiro · 5h ago
Why do you need to buy more expensive hardware? The thing is, run an old Xeon server and compare its performance vs modern consumer-level hardware.
Unless you bought a 50~64 core server (and the power bill to match), you're often way cheaper with consumer-level hardware. Older server hardware's advantage is more in the amount of total memory you can install or the number of PCIe lanes.
The cheapest enterprise CPUs (AMD, for example) are currently Zen2; the moment you want Zen3s, the prices go up a lot for anything 32-core or higher.
I have seen so many homelabs that ran ancient hardware, only for them to realize that they are able to do the same or more on modern mini-PCs or similar hardware, often at a fraction of the power draw.
The reason why so many people loved to run enterprise hardware was because in the US you had electricity prices in the low single digits or barely in the teens (cents/kWh). When you get 35 cent/kWh prices, people tend to do a bit of research and find out it's not the 2010s anymore.
I ran multiple enterprise servers with 384GB memory; they idled at 120W+ (and that was with temp-controlled fans, because those drain a ton of power). Second PSU? There goes your idle to 150W+.
Ironically, just get a few mini-PCs with the same memory capacity spread across them and you're doing 50W or less. That's the advantage of using laptop CPUs. I have had 8-core Zen3 systems doing 4.5W at idle.
And yes, you can turn off enterprise hardware, but you can also sleep mini-PCs. And they do not tend to sound like jet engines when waking up ;)
I have a Minisforum ITX board next to me with a 16c Zen4, cost 380 Euro. Idles at 17W. Beats any 32C Zen2 enterprise server. Even something like an AMD EPYC 7C13 (64C) will only be ~40% faster and still costs 600 Euro from China. It will do better on actual multithreaded workloads where you really have tons of processes, but that's 400 bucks vs 600 plus another 400 for a motherboard.
Just saying, enterprise has its uses, like in enterprise environments, but for most people, especially homelabbers, it's often overkill.
bpye · 1d ago
I think I’ve settled on both being the answer - Hetzner is affordable enough that I can have a full backup of my NAS (using ZFS snapshots and incremental backups), and as a bonus can host some services there instead of at home. My home network still has much lower latency and so is preferable for e.g. my Lightroom library.
alkonaut · 1d ago
Microservices vs not is (almost) orthogonal to N servers vs one. You can make 10 microservices and rent a huge server and run all 10 services. It's more an organizational thing than a deployment thing. You can't do the opposite though, make a monolith and spread it out on 10 servers.
marcosdumay · 1d ago
> You can't do the opposite though, make a monolith and spread it out on 10 servers.
You absolutely can, and it has been the most common practice for scaling them for decades.
alkonaut · 22h ago
That’s just _duplicating_ the nodes horizontally, which wasn't what I meant.
That’s obviously possible and common.
What I meant was actually butchering the monolith into separate pieces and deploying it, which is - by the definition of monolith - impossible.
doganugurlu · 21h ago
What would be the point of actually butchering the monolith?
There is no limit or cost to deploying 10000 lines over 1000 lines.
alkonaut · 17h ago
I meant in the sense of ”machine A only manages authentication” and ”machine B only manages orders”.
If that’s possible (regardless of what was deployed to the two machines) then the app just isn’t a true monolith.
const_cast · 1d ago
> You can't do the opposite though, make a monolith and spread it out on 10 servers.
Yes you can. It's called having multiple application servers. They all run the same application, just more of them. Maybe they connect to the same DB, maybe not, maybe you shard the DB.
alkonaut · 22h ago
That’s obviously not what I meant. I meant running different aspects of the monolith on different servers.
lelanthran · 21h ago
> I meant running different aspects of the monolith on different servers.
Of course you can. I've done it.
Identical binary on multiple servers with the load balancer/reverse proxy routing specific requests to specific instances.
The practical result is indeed "running different aspects of the monolith on different servers".
sfn42 · 20h ago
That's not a problem in a well-designed ASP.NET project. Just create a new web API project, move a controller into it, copy/paste the necessary boilerplate into Program.cs, set up config etc. for it, and configure CI/CD to deploy it separately; there you go. Less than a day's work.
You can also publish libraries as NuGet packages (privately if necessary) to share code across repos if you want the new app in its own repo.
I've worked on projects with multiple frontends, multiple backends, lots of separately deployed Azure Functions etc. It's no problem at all to make significant structural changes as long as the code isn't a big ball of mud.
I always start with a monolith, we can easily make these changes when necessary. No point complicating things until you actually have a reason to.
johnklos · 1d ago
These days we have more meta-software than software. Instead of Apache with virtualhosts, we have a VM running Docker instances, each with an nginx of its own, all connected by a separate Dockerized nginx acting as a proxy.
How much waste is there from all this meta-software?
In reality, I host more on Raspberry Pis with USB SSDs than some people host on hundred-plus watt Dells.
At the same time, people constantly compare colo and hardware costs with the cost per month of cloud and say cloud is "cheaper". I don't even bother to point out the broken thinking that leads to that. In reality, we can ignore gatekeepers and run things out of our homes, using VPSes for public IPs when our home ISPs won't allow certain services, and we can still have excellent uptimes, often better than cloud uptimes.
Yes, we can consolidate many, many services in to one machine because most services aren't resource heavy constantly.
Two machines on two different home ISP networks backing each other up can offer greater aggregate uptime than a single "enterprise" (a misnomer, if you ask me, if you're talking about most x86 vendors) server in colo. A single five minute reboot of a Dell a year drops uptime from 100% to 99.999%.
Cloud is such bullshit that it's exhausting even just engaging with people who "but what if" everything, showing they've never even thought about it for more than a minute themselves.
Right now, my plan is to move from a bunch of separate VPSes to one dedicated server from Hetzner and run a few VMs inside of it, with separate public IPs assigned to them alongside some resource limits. You can get them for pretty affordable prices if you don't need the latest hardware: https://www.hetzner.com/sb/
That way I can limit the blast radius if I mess things up inside of a VM, but at the same time benefit from an otherwise pretty simple setup for hosting personal stuff; a CPU with 8 threads and 64 GB of RAM ought to be enough for most things I might want to do.
ehnto · 18h ago
That's the worst part of stringing a bunch of cloud services together. Auth, keys, config, credentials expiring, logging back into everything all day. It smooths out the brain.
Give me a box, trust me with ssh keys and things are so much easier. Simple is good for the soul and the wallet.
ahdanggit · 10h ago
I used a colo once a few years ago at a small datacenter in the Midwest. I was shocked at how unprofessional everything was: machines lying in the hallway, a guy sleeping in one of the offices. They let me set up my server and I was left unattended several times; I could have just poked the power button on a nearby server or moved a cable or whatever. It was a 1.5-hour drive away and I wasn't running anything serious, so I just went with it, but pulled my stuff out after my 1-year subscription was up.
joshmn · 1d ago
I did this (well, a large-r VPS for $120/month) for my Rails-based sports streaming website. I had a significant amount of throughput too, especially at peak (6-10pm ET).
My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and to test their restoration. (When I shut down the site it was 1.2TB.) Having a script to quickly provision a new node—in case I ever needed it—meant I could have something up within 8 minutes of hitting enter.
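The frequent core-table dump plus a cheap restore sanity check can be as small as this (table names and paths are invented; a real test restores into a scratch database):

    import subprocess

    # Dump only the hot tables in custom format, then confirm the archive is readable.
    dump_path = "/backups/core-latest.dump"
    subprocess.run(
        ["pg_dump", "-Fc", "-t", "users", "-t", "subscriptions", "-f", dump_path, "appdb"],
        check=True,
    )
    subprocess.run(["pg_restore", "--list", dump_path], check=True)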
When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.
I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.
vevoe · 1d ago
Is there somewhere I can read more about your setup/experience with your streaming site? I currently run a (legal :) streaming site but have it hosted on AWS and have been exploring moving everything over to a big server. At this point it just seems like more work to move it than to just pay the cloud tax.
joshmn · 1d ago
Do a search for HeheStreams on your favorite search engine.
The technical bits aren’t all there, though, and there’s a plethora of noise and misinformation. Happy to talk via email though.
vevoe · 1d ago
Will do, thank you!
simonw · 1d ago
This was written in 2022, but it looks like it's mostly still relevant today. It would be interesting to see updated numbers on the expected costs of various hosting providers.
I'm a big-server proponent myself. But usually, for one reason or another, there is a need to introduce some socket-style communication to the frontend, and that becomes impossible on a single machine after a certain threshold.
Is there something obvious that I'm missing?
winrid · 12h ago
I've had 100k+ users connected to mid range linode boxes. Do you have that many?
Even still at that point you just round robin to a set of big machines. Easy
MitPitt · 17h ago
Load Balancing, Redundancy and Fail-Over
api · 1d ago
I’ve found that it’s hard to even hire engineers who aren’t all in on cloud and who even know how to build without it.
Even the ones who do know have been conditioned to tremble with fear at the thought of administrating things like a database or storage. These are people who can code cryptography kernels and network protocols and kernel modules, but the thought of running a K8S cluster or Postgres fills them with terror.
“But what if we have downtime!”
That would be a good argument if the cloud didn’t have downtime, but it does. Most of our downtime in previous years has been the cloud, not us.
“What if we have to scale!” If we are big enough to outgrow a 256 core database with terabytes of SSD, we can afford to hire a full time DBA or two and have them babysit a cluster. It’ll still be cheaper.
“What if we lose data?” Ever heard of backups? Streaming backups? Hot spares? Multiple concurrent backup systems? None of this is complex.
“But admin is hard!” So is administrating cloud. I’ve seen the horror of Terraform and Helm and all that shit. Cloud doesn’t make admin easy, just different. It promised simplicity and did not deliver.
… and so on.
So we pay about 1000X what we should pay for hosting.
Every time I look at the numbers I curse myself for letting the camel get its nose under the tent.
If I had it to do over again I’d forbid use of big cloud from day one, no exceptions, no argument, use it and you’re fired. Put it in the articles of incorporation and bylaws.
matt-p · 1d ago
I have also found this happening. It's actually really funny, because I think even I'm less inclined to run Postgres myself these days, when I used to run literally hundreds of instances with not much more than pg_dump, cron and two read-only replicas.
These days probably the best way of getting these 'cloudy' engineers on board is just to tell them its Kubernetes and run all of your servers as K3s.
api · 1d ago
I’m convinced that cloud companies have been intentionally shaping dev culture. Microservices in particular seem like a pattern designed to push managed cloud lock in. It’s not that you have to have cloud to use them, but it creates a lot of opportunities to reach for managed services like event queues to replace what used to be a simple function call or queue.
Dev culture is totally fad driven and devs are sheep, so this works.
matt-p · 1d ago
Yeah I think that's fair. I'm very pro-containers though; that's a genuine step forward from deploy scripts or VM images.
turtlebits · 1d ago
The problem is sizing and consistency. When you're small, it's not cost effective to overprovision 2-3 big servers (for HA).
And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to pick a differently specced server).
IME, having to go multi cloud/provider is a way worse problem to have.
andersmurphy · 1d ago
Most industries are not bursty. Overprovisioning is not expensive for most businesses. You can handle 30,000+ updates a second on a $15 VPS.
A multi-node system tends to be less reliable and have more failure points than a single-box system. Failures rarely happen in isolation.
You can do zero downtime deployment with a single machine if you need to.
Aeolun · 17h ago
> A multi node system tends to be less reliable and more failure points than a single box system. Failures rarely happen in isolation.
Just like a lot of problems exists between keyboard and chair, a lot of problems exist between service A and service B.
The zero downtime deployment for my PHP site consisted of symlinking from one directory to another.
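For anyone curious, the core of that trick is an atomic rename over the old symlink; a minimal sketch (paths are hypothetical):

    import os

    # releases/<version>/ holds each deploy; the web server serves whatever
    # "current" points at. Renaming a fresh symlink over it is atomic, so no
    # request ever sees a half-switched release.
    new_release = "releases/2024-06-01"
    os.symlink(new_release, "current.tmp")
    os.replace("current.tmp", "current")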
andersmurphy · 14h ago
Nice!
Honestly, we need to stop promoting prematurely making everything a network request as a good idea.
Nextgrid · 7h ago
> we need to stop promoting prematurely making everything a network request as a good idea
But how are all these "distributed systems engineers" going to get their resume points and jobs?
matt-p · 1d ago
There are a number of providers who provision dedicated servers via API in minutes these days. Given a dedicated server starts at around $90/month, it probably does make sense for a lot of people.
winrid · 12h ago
A $20 dedicated server from OVH can outperform $144 VPSs from Linode in my testing, on passmark.
talles · 1d ago
Don't forget the cost of managing your one big server and the risk of having such single point of failure.
Puts · 1d ago
My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication or split-brain errors than due to actual hardware failures. One server is the simplest and most reliable setup, and if you have backups and automated provisioning you can re-deploy your entire environment in less time than it takes to debug a complex multi-server setup.
I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.
sgarland · 1d ago
Yep. I know people will say, “it’s just a homelab,” but hear me out: I’ve run positively ancient Dell R620s in a Proxmox cluster for years. At least five. Other than moving them from TX to NC, the cluster has had 100% uptime. When I’ve needed to do maintenance, I drop one at a time, and it maintains quorum, as expected. I’ll reiterate that this is on circa-2012 hardware.
In all those years, I’ve had precisely one actual hardware failure: a PSU went out. They’re redundant, so nothing happened, and I replaced it.
Servers are remarkably resilient.
EDIT: 100% uptime modulo power failure. I have a rack UPS and a generator, but I once discovered the hard way that the UPS batteries couldn’t hold a charge long enough to keep the rack up while I brought the generator online.
whartung · 1d ago
Being as I love minor disaster anecdotes where doing all the "right things" seems to not make any difference :).
We had a rack in data center, and we wanted to put local UPS on critical machines in the rack.
But the data center went on and on about their awesome power grid (shared with a fire station, so no administrative power loss), on site generators, etc., and wouldn't let us.
Sure enough, one day the entire rack went dark.
It was the power strip on the data center's rack that failed. All the backup grids in the world can't get through a dead power strip.
(FYI, family member lost their home due to a power strip, so, again, anecdotally, if you have any older power strips (5-7+ years) sitting under your desk at home, you may want to consider swapping it out for a new one.)
sgarland · 14h ago
For sure, things can and will go wrong. For critical services, I’d want to split them up into separate racks for precisely that reason.
Re: power strips, thanks for the reminder. I’m usually diligent about that, but forgot about one my wife uses. Replacement coming today.
motorest · 1d ago
> My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures.
I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your nodes whether it's hardware failure or power outage or someone stumbling on your power/network cable, or even having a single service crashing, this means you have a major outage on your hands.
These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.
fogx · 1d ago
Don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like Hetzner?
And even if they did, you could just run a provisioned secondary server that jumps in if the first becomes unavailable, and still be much cheaper.
motorest · 23h ago
> don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner?
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware.
juped · 20h ago
Do you actually think dead simple failover is comparable to elastic kubernetes whatever?
motorest · 18h ago
> Do you actually think dead simple failover is comparable to elastic kubernetes whatever?
References to "elastic Kubernetes whatever" is a red herring. You can have a dead simple load balancer spreading traffic across multiple bare metal nodes.
juped · 16h ago
Thanks for switching sides to oppose yourself, I guess?
toast0 · 1d ago
I don't know about Hetzner, but the failure case isn't usually tripping over power plugs. It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server.
Either way, stuff happens, figuring out what your actual requirements around uptime, time to response, and time to resolution is important before you build a nine nines solution when eight eights is sufficient. :p
kapone · 19h ago
> It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server
Are you serious? Have you ever built/operated/wired rack scale equipment? You think the power cables for your "short" server (vs the longer one being put in) are just hanging out in the back of the rack?
Rack wiring has been done and done correctly for ages. Power cables on one side (if possible), data and other cables on the other side. These are all routed vertically and horizontally, so they land only on YOUR server.
You could put a Mercedes Maybach above/below your server and nothing would happen.
toast0 · 11h ago
Yes I'm serious. My managed host took several of our machines offline when racking machines under/over ours. And they said it was because the new machines were longer and knocked out the power cables on ours.
We were their largest customer and they seemed honest even when they made mistakes that seemed silly, so we rolled our eyes and moved on with life.
Managed hosting means accepting that you can't inspect the racks and chide people for not cabling to your satisfaction. And mistakes by the managed host will impact your availability.
icedchai · 1d ago
It's unlikely, but it happens. In the mid 2000's I had some servers at a colo. They were doing electrical work and took out power to a bunch of racks, including ours. Those environments are not static.
Aeolun · 17h ago
In my experience, my personal services have gone down exactly zero times. Actually not entirely true, but every time they stopped working the servers had simply run out of disk space.
The number of production incidents on our corporate mishmash of lambda, ecs, rds, fargate, ec2, eks etc? It’s a good week when something doesn’t go wrong. Somehow the logging setup is better on the personal stuff too.
ocdtrekkie · 1d ago
My single on-premise Exchange server is drastically more reliable than Microsoft's massive globally resilient whatever Exchange Online, and it costs me a couple hours of work on occasion. I probably have half their downtime, and most of mine is scheduled when nobody needs the server anyhow.
I'm not a better engineer, I just have drastically fewer failure modes.
talles · 1d ago
Do you develop and manage the server alone? It's quite a different reality when you have a big team.
ocdtrekkie · 1d ago
Mostly myself but I am able to grab a few additional resources when needed. (Server migration is still, in fact, not fun!)
talles · 1d ago
I have also seen the opposite somewhat frequently: some team screws up the server, and unrelated stable services that have been running since forever (on the same server) are now affected due to the messed-up environment.
api · 1d ago
A lot of this attitude comes from the bad old days of 90s and early 2000s spinning disk. Those things failed a lot. It made everyone think you are going to have constant outages if you don’t cluster everything.
Today’s systems don’t fail nearly as often if you use high quality stuff and don’t beat the absolute hell out of SSD. Another trick is to overprovision SSD to allow wear leveling to work better and reduce overall write load.
Do that and a typical box will run years and years with no issues.
jeffrallen · 1d ago
Not to mention the other leading cause of outages: UPS's.
Sigh.
icedchai · 1d ago
UPSes always seem to have strange failure modes. I've had a couple fail after a power failure. The batteries died and they wouldn't come back up automatically when the power came back. They didn't warn me about the dead battery until after...
sgarland · 1d ago
That’s why they have self-tests. Learned that one the hard way myself.
icedchai · 15h ago
My UPS was supposedly "self testing" itself periodically and it still happened!
sgarland · 14h ago
Oof, sorry.
ies7 · 1d ago
The last 4-5 years taught me that my most frequent single point of failure where I can't do a thing is Cloudflare, not my on-premise servers.
> Don't forget the cost of managing your one big server
Is that more, less than or about the same as having an AWS/Azure/GCP consultant?
What's the difference in labour per hour?
> the risk of having such single point of failure.
At the prices they charge I can have two hot failovers in two other datacenter and still come out ahead.
wmf · 1d ago
Don't forget to read the article.
chrisweekly · 1d ago
I'll take a (lone) single point of failure over (multiple) single points of failure.
justmarc · 1d ago
AWS has also been a single point of failure multiple times in history, and there's no reason to believe this will never happen again.
juped · 20h ago
The predictable cost, you mean, making business planning way easier? And you usually have two, because sometimes kernels do panic or whatever.
jiggawatts · 1d ago
I'm in the process of breaking up a legacy deployment on "one big server" into something cloud native like Kubernetes.
The problem with one big server is that few customers have ONE (1) app that needs that much capacity. They have many small apps that add up to that much capacity, but that's a very different scenario with different problems and solutions.
For example, one of the big servers I'm in the process of teasing apart has about 100 distinct code bases deployed to it, written by dozens of developers over decades.
If any one of those apps gets hacked and this is escalated to a server takeover, the other 99 apps get hacked too. Some of those apps deal with PII or transfer money!
Because a single big server uses a single shared IP address for outbound comms[1] this means that the firewall rules for 100 apps end up looking like "ALLOW: ANY -> ANY" for two dozen protocols.
Because upgrading anything system-wide on the One Big Server is a massive Big Bang Change, nobody has had the bravery to put their hand up and volunteer for this task. Hence it has been kept alive running 13 year old platform components because 2 or 3 of the 100 apps might need some of those components... but nobody knows which two or three apps those are, because testing this is also big-bang and would need all 100 apps tested all at once.
It actually turned out that even Two Big (old) Servers in a HA pair aren't quite enough to run all of the apps so they're being migrated to newer and better Azure VMs.
During the interim migration phase, instead of Two Big Servers there are Four Big Servers... in PRD. And then four more in TST, etc... Each time a SysOps person deploys a new server somewhere, they have to go tell each of the dozens of developers where they need to deploy their apps today.
Don't think DevOps automation will rescue you from this problem! For example in Azure DevOps those 100 apps have 100 projects. Each project has 3 environments (=300 total) and each of those would need a DevOps Agent VM link to the 2x VMs = 600 VM registrations to keep up to date. These also expire every 6 months!
Kubernetes, Azure App Service, AWS App Runner, and GCP App Engine serve a purpose: They solve these problems.
They provide developers with a single stable "place" to dump their code even if the underlying compute is scaled, rebuilt, or upgraded.
They isolate tiny little apps but also allow the compute to be shared for efficient hosting.
They provide per-app networking and firewall rules.
Etc...
[1] It's easy to bind distinct ingress IP addresses on even a single NIC (or multiple), but it's weirdly difficult to split the outbound path. Maybe this is easier on Linux, but on Windows and IIS it is essentially impossible.
mystifyingpoi · 21h ago
Finally, someone said it.
> 100 distinct code bases deployed to it
I've worked at a company where the owner would spend money on anything except hosting. The admin guy would end up deploying a new app on whatever VPS had the most RAM free at the time.
Ironically, consolidating this mess onto "one big server", which was my thankless job for many months, fixed many issues. Though it was done by slicing the host into tiny KVM virtual machines.
jiggawatts · 19h ago
> slicing the host into tiny KVM virtual machines.
That's my other option: a bunch of Azure VM Scale Sets using the tiniest size that will run Windows Server, such as B2as_v2. A handful of closely related apps on each so that firewall rules can be restricted to something sane. Shared Azure Files for the actual app deployments so that devs never need to know the VM names. However, this feels an awful lot like reinventing Kubernetes... but worse.
My job would be sooo much simpler if Microsoft just got off their high horse and supported their own Active Directory in App Service instead of pretending it no longer exists.
SatvikBeri · 1d ago
A lot of these articles look at on-demand pricing for AWS. But you're rarely paying on-demand prices 24/7. If you have a stable workload, you probably buy reserved instances or a compute savings plan. At larger scales, you use third party services to get better deals with more flexibility.
A while back I looked into renting hardware, and found that we would save about 20% compared to what we actually paid AWS – partly because location and RAM requirements made the rental more expensive than anticipated, and partly because we were paying a lot less than on-demand price for AWS.
20% is still significant, but it's a lot less than the ~80% that this and other articles suggest.
vidarh · 1d ago
This is usually only true if you lift and shift your AWS setup exactly as-is, instead of looking at what hardware will run your setup most efficiently.
The biggest cost with AWS also isn't compute, but egress - for bandwidth heavy setups you can sometimes finance the entirety of the servers from a fraction of the savings in egress.
I cost optimize setups with guaranteed caps at a proportion of savings a lot of the time, and I've yet to see a setup where we couldn't cut the cost far more than that.
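To make the egress point concrete, a back-of-envelope comparison (list prices and traffic volume are assumptions, and AWS tiering means the real number lands a bit lower):

    # Rough monthly arithmetic for a bandwidth-heavy workload.
    egress_gb = 100 * 1000          # assume ~100 TB/month leaving your servers
    aws_per_gb = 0.09               # approximate AWS internet egress list price
    aws_egress = egress_gb * aws_per_gb          # roughly $9,000/month for bandwidth alone
    dedicated_server = 200          # a large dedicated box with generous included traffic
    print(aws_egress, aws_egress / dedicated_server)  # egress alone pays for ~45 such servers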
SatvikBeri · 12h ago
I'd definitely be curious to hear how you'd approach our overall situation. We don't have significant egress costs, nor has any place I've worked with before. Our AWS costs are about 80% EC2 and Fargate, with the rest scattered over various services. Roughly half our spend is on 24/7 reserved instances, while the other half is in bursty analytics workloads.
Our workloads are primarily memory-bound, and AWS offers pretty good options there, e.g. x2gd instances have 16GB RAM per vCPU, while most rental options we found were much more CPU-focused (and charged for it).
Nextgrid · 7h ago
> while most rental options we found were much more CPU focused
Out of curiosity have you benchmarked it? I find that AWS "vCPUs" are significantly slower than a core (or even hyperthread) of a real CPU, and this constrains memory bandwidth too. A single bare-metal can often replace many EC2s.
Another thing to consider is the easy access of persistent NVME drives, something not possible on AWS. Yes you still need backups, but ideally you will only need those backups once a year or less. I've dealt with extremely complex and expensive solutions on AWS that could be trivially solved by just one persistent machine with NVME drives (+ a spare for redundancy). Having the data there persistently (at a cheap price per GB) means you avoid having to shuffle data around or can precompute derived data to speed up lookups at runtime.
If you're actually serious about exploring options to move your infra to bare-metal or hybrid feel free to reach out for a no-obligations call; email in my profile. It seems like you've already optimized it quite well so I'd be curious to see if there is still room for improvement. (Or if you don’t mind, share what your stack is and let others chip in too!)
Havoc · 1d ago
>Unfortunately, since all of your services run on servers (whether you like it or not), someone in that supply chain is charging you based on their peak load.
This seems fundamentally incorrect to me. If I need 100 units of peak compute during 8 working hours, I get that from Big Cloud, and they have two other clients needing the same in offset timezones, then in theory the aggregate cost of that is 1/3rd of everyone buying their own peak needs.
Whether big cloud passes on that saving is another matter, but it's there.
i.e. big cloud throws enough small customers together that they don't have a "peak" per se, just a pretty noisy average load that is in aggregate mostly stable.
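A toy model of that aggregation effect (all numbers invented):

    # Three tenants, each with an 8-hour peak in offset timezones.
    def load(shift):
        # 100 units during "work hours", 10 otherwise, shifted by `shift` hours.
        return [100 if (h - shift) % 24 in range(8, 16) else 10 for h in range(24)]

    a, b, c = load(0), load(8), load(16)
    own_peaks = max(a) + max(b) + max(c)                      # everyone buys their own peak: 300
    shared_peak = max(x + y + z for x, y, z in zip(a, b, c))  # provider provisions for: 120
    print(own_peaks, shared_peak)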
vidarh · 1d ago
But they generally don't. Most people don't have large enough daily fluctuations for these demand curves to flatten out enough. And the providers also need enough capacity to handle unforeseen spikes. Which is also why none of them will let you scale however far you want - they still impose limits so they can plan the excess they need.
Havoc · 15h ago
> And the providers also need enough capacity to handle unforeseen spikes.
Indeed, but the headroom the cloud needs overall is less than every customer's individual worst-case scenario added up. They'd take a percentage of that total, because statistically a situation where 100% of customers are at 100% of their peak at 100% the same point in time is improbable.
Must admit I'm a little surprised this logic isn't self-evident.
namibj · 15h ago
In which cloud can I book a machine with a guaranteed (up to general uptime SLA) end/termination time that's fixed for both?
qaq · 1d ago
And now consider that 6th Gen EPYC will have 256 cores. Also, you can have 32 hot-swap SSDs with 10-million-plus random write IOPS and 60-million-plus random read IOPS in a single 2U box.
The one-big-box approach assumes that you know how to configure everything for high performance. I suspect that skill has been lost, for the most part.
You really need to tweak the TCP/IP stack, buffer sizes, and various other things to get everything to work really well under heavy load. I'm not sure if the various sites that used to talk about this have been updated in the last decade or so, because I don't do that anymore.
I mean, with default limits you'll run out of file descriptors pretty quickly if you try to handle a few hundred simultaneous connections. Doesn't matter how big your box is at that point.
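As one small example of that tuning, the per-process descriptor limit can be checked and raised from inside the process (a sketch; production setups usually do this via ulimit or the systemd unit instead):

    import resource

    # Inspect and raise the soft RLIMIT_NOFILE towards the hard limit.
    # The hard limit itself is set by the OS / service manager.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"before: soft={soft}, hard={hard}")
    if hard != resource.RLIM_INFINITY:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    print("after:", resource.getrlimit(resource.RLIMIT_NOFILE))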
jeffrallen · 1d ago
I work for a cloud provider and I'll tell you, one of the reasons for the cloud premium is that it is a total pain in the ass to run hardware. Last week I installed two servers and between them had four mysterious problems that had to be solved by reseating cards, messing with BIOS settings, etc. Last year we had to deal with a 7 site, 5 country RMA for 150 100gb copper cables with incorrect coding in their EEPROMs.
I tell my colleagues: it's a good thing that hardware sucks: the harder it is to run bare metal, the happier our customers are that they choose the cloud. :)
(But also: this is an excellent article, full of excellent facts. Luckily, my customers choose differently.)
Nextgrid · 1d ago
Fortunately, companies like Hetzner/OVH/etc will handle all this bullshit for you for a flat monthly fee.
0xbadcafebee · 11h ago
Ah, the folksy wisdom of the armchair. Sounds convincing, doesn't it? I mean, it includes math! And prices! The quoted prices are more expensive for the cloud. And he makes folksy claims that make sense, like the "fragile complexity" of having "more than one computer". It makes sense! Right??
But is he right? How do we know? Well for starters, look at his CV. He has never managed servers for a living. The closest he's come is working on FPGAs. So what's he basing all these opinions on? Musings? Thoughts? Feelings? Hope?
He makes a couple of claims whose flaws aren't obvious, so I'll address them here, in reverse order.
"microservice architectures in general add a lot of overhead to a system for dubious gain when you are running on one big server" - Microservices architectures are not about overhead or efficiency. They are an attempt to use good software design principles to address Conway's Law. If you design the microservice correctly, you can enable many different groups in an organization to develop software independently, and come up with a highly effective and flexible organization and stable products. Proof? Amazon. But the caveat is, you have to design them correctly. Almost everyone fails at this.
"It's impossible to get the benefits of a CDN, both in latency improvements and bandwidth savings, with one big server" - This is so dumb I'm not sure I have to refute it? But, uh, no, CDNs absolutely give a heap of benefits whether you have 1 server or 1,000. And CloudFlare Free Plan is Free.
"My Workload is Really Bursty - Cloud away." - Unless your workload involves massive amounts of storage or ingress/egress and your profit margin tiny, in which case you may save more by building out a small fleet of unreliable poorly-maintained colocated servers (emphasis on may).
"The "high availability" architectures you get from using cloudy constructs and microservices just about make up for the fragility they add due to complexity. ... Remember that we are trying to prevent correlated failures. Cloud datacenters have a lot of parts that can fail in correlated ways. Hosting providers have many fewer of these parts. Similarly, complex cloud services, like managed databases, have more failure modes than simple ones (VMs)." - Argument from laziness, or ignorance? He's trying to say that because something is complex it's also less reliable. Which completely ignores the reliability engineering aspect of that complexity. You mitigate higher numbers of failure modes by designing the system to fail over reliably. And you also have warm bodies running around replacing the failing parts, which fights entropy. You don't get that in a single server; once your power supply, disk, motherboard, network interface, RAM, etc fails, and assuming your server has a redundant pair, you have a ticking clock to repair it until the redundant pair fails. How lucky do you feel? (oh, and you'll need downtime to repair it.)
As usual, the cloud costs quoted are MSRP, and if you're paying retail, you're a fool. Almost all cloud costs can be brought down by 25%-75%, spot instances are a fraction of the on-demand server cost, and efficient use of cheaper cloud services reduces your need to buy compute at all.
"The big drawback of using a single big server is availability. Your server is going to need downtime, and it is going to break. Running a primary and a backup server is usually enough, keeping them in different datacenters. A 2x2 configuration should appease the truly paranoid: two servers in a primary datacenter (or cloud provider) and two servers in a backup datacenter will give you a lot of redundancy. If you want a third backup deployment, you can often make that smaller than your primary and secondary." - Wait... so One Big Server isn't enough? Huh. So this was a clickbait article? I'm shocked!
"One Server (Plus a Backup) is Usually Plenty" - Plenty for what? I mean we haven't even talked system architecture or application design. But let's assume it's a single microservice that gets 1RPS. Is your backup server a hot spare, cold spare, or live mirror? If it's live, it's experiencing the same wear, meaning it will fail at about the same time. If it's hot, there's less wear, but it's still experiencing some. If it's cold, you get less wear, but you're less sure it'll boot up again. And then there's system configuration. The author mentions the "complexity" of managing a cluster, but actually it's less complex than managing just two servers. With a fleet of servers, you know you have to use automation, so you spend the time to automate their setup and run updates frequently. With a backup, you probably won't do any maintenance on the backup, and you definitely won't perform the same operations on the backup as the server. So the system state will drift wildly, and the backup's software will be useless. It would be better to just have it as spare part.
The author never talks about the true failure modes of "one big server". When parts start to need replacing, it's never cheap. Smart hands cost, cost of the parts+shipping, cost of the downtime. And often you'll find there are delays - delays in getting smart hands to actually repair it correctly, delays in shipping, delays in part ordering/availability. Running out of power, running out of space, temperatures too high, "flaky" parts you can't diagnose, backups and restores, datacenter issues, routing issues, backbone issues. You'll tell yourself these are "probably rare" - but these are all failure modes, and as the author tells us, you should be wary of lots of failure modes. And anecdotes will tell you somebody has run a server for 10 years with no issue, while another person had a server with 3 faults in a month. To say nothing of the need to run "burn-in" on a new server to discover faults once it's racked.
Go ahead and do whatever you want. Cloud, colo, one server, multiple. There will be failures and complexity no matter what. You want to tell yourself a comforting story that there is "one piece of advice" to follow, some black and white world where only one piece of folksy wisdom applies. But here's my folksy wisdom: design your application, design your system to fit it, try not to pinch every penny, build something, and become educated enough to know what problems to expect and how to deal with them. Or if not, pay someone who can, and listen to them.
garganzol · 1d ago
And then boom, all your services are gone due to a pesky capacitor on the motherboard. Also good luck trying to change even one software component of that monolith without disrupting and jeopardizing the whole operation.
While it is useful advice for some people in certain conditions, it should be taken with a grain of salt.
fragmede · 1d ago
That capacitor thing hasn't been true since the 90's.
icedchai · 1d ago
Capacitor problem or not, hardware does fail. Power supplies crap out. SSDs die in strange ways. A failure of a supposedly "redundant" SSD might cause your system to freeze up.
mannyv · 22h ago
One thing that we ran into back in the day was ECC failure on reboot.
We had a few Dell servers that ran great for a year or two. We rebooted one for some reason or another and it refused to POST due to an ECC failure.
Hauled down to the colo at 3AM and ripped the fucking ram out of the box and hoped it would restart.
Hardware fails. The RAM was fine for years, but something happened to it. Even Dell had no idea and just shipped us another stick, which we stuck in at the next downtime window.
To top it off, we dropped the failing RAM into another box at the office and it worked fine. <shrug>.
garganzol · 1d ago
Hardware still fails. It isn't a question of "if", it's a question of "when". Nothing lasts forever, the naivety lasts only so long too.
fragmede · 21h ago
Obviously. But you get duplicate hardware, set up HA, get vendor support contracts, use multiple colos in disparate locations. Cloud providers have figured this out fairly well, as we all did in the aughts. (Well, some of us anyway.) You can definitely determine that's a bunch of really annoying work and just pay a cloud provider to deal with it, or not, and go your own way. But if you want to be credible when saying that hardware fails, maybe people shouldn't use a problem from three decades ago as their example and should use something more recent?
garganzol · 17h ago
When you get duplicate hardware, it is not "One Big Server" anymore. "Two Big Servers" at least. In September 2015, the failure rate caused by capacitors was still around 30% [1].
> Part of the "cloud premium" for load balancers, serverless computing, and small VMs is based on how much extra capacity your cloud provider needs to build in order to handle their peak load. You're paying for someone's peak load anyway!
Eh, sort of. The difference is that the cloud can go find other workloads to fill the trough from off peak load. They won’t pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can’t find paying workloads for your troughs.
I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.
It has been great.
randomtoast · 1d ago
Those servers are mainly designed for enterprise use cases. For hobby projects, I can understand why someone would choose Hetzner over AWS.
For enterprise environments, however, there is much more to consider. One of the biggest costs you face is your operations team. If you go with Hetzner, you essentially have to rebuild a wide range of infrastructure components yourself (WAF, globally distributed CDN, EFS, RDS, EKS, Transit Gateways, Direct Connect and more).
Of course, you can create your own solutions for all of these (plus 20+ more moving targets of infra software and support systems). At my company, a mid-size enterprise, we once tried to do exactly that.
The result was hiring more than 10 freelancers in addition to 5 of our DevOps engineers to build it all, handle the complexity of such a setup, and keep everything up to date, spending hundreds of thousands of dollars. Meanwhile, our AWS team, consisting of only three people working with Terraform, proved far more cost-effective. Not in terms of dollars per CPU core, but in terms of average spend per project once staff costs and everything else were included.
I think many of the HN posts that say things like "I saved 90% of my infra bill by moving from AWS to a single Hetzner server" are a bit misleading.
andersmurphy · 1d ago
Most of those things you listed are workarounds for having a slow server/system.
For example, if you serve your assets from the server you can skip a CORS round trip. If you use an embedded database like SQLite you can shave off 50ms, use a dedicated CPU (another 50ms), and now you don't need to serve anything from the edge, because your global latency is much better.
A key factor underlining all of this is understanding, from a business/organizational perspective, your actual uptime requirements. Google may aim at 5 nines with the budget to achieve it, but many banks have routine planned downtime. If you don't know your objectives, you will have trouble making the tradeoffs necessary to get there. As a hypothetical, would your business choose 99.999% uptime (26 seconds down on average per month) vs 99.99% (4.3 min) if that caused infra costs to rise by 50% or more? If you said we can cut our infra costs by 50% by planning a short weekly maintenance window, how would that resonate?
Speaking to a few, in my experience:
2) (not at Hetzner specifically, but at a dedicated host). You have backups & recovery plans, and redundancy where it makes sense. You might run your database with a replica. If you are serving Web traffic, maybe you keep a hot spare. Also, you are still allowed to use e.g. cloud services if it makes sense to do so so you can backup to S3 and use things like SQS or KMS if you don't want to run them yourself. It's worth noting that you may not get advance notice; I recall our service being impacted by a fire at a datacenter that IIRC was caused by a traffic accident on a nearby highway. The point is you have to design resilience into the system. Fortunately, this is well-trod ground.
It would not be a terrible failover option to have something like an autoscale group at AWS ready to step in if the dedicated cluster goes offline. Keep that cluster scaled to 0 until it's needed. Put the cloud behind your cheap dedicated capacity.
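A minimal sketch of that pattern, assuming an existing Auto Scaling group (the group name and region below are made up): your health-check/alerting path flips the desired capacity when the dedicated cluster stops responding, and flips it back afterwards.

    import boto3

    # Hypothetical "cloud behind the dedicated boxes" failover: the ASG normally
    # sits at 0 instances and is scaled up only while the dedicated cluster is down.
    asg = boto3.client("autoscaling", region_name="us-east-1")  # assumed region

    def activate_cloud_failover(desired: int = 4) -> None:
        asg.set_desired_capacity(
            AutoScalingGroupName="failover-web",  # assumed group name
            DesiredCapacity=desired,
            HonorCooldown=False,
        )

    def deactivate_cloud_failover() -> None:
        asg.set_desired_capacity(
            AutoScalingGroupName="failover-web",
            DesiredCapacity=0,
            HonorCooldown=False,
        )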
3) See above. In my case, we over-provisioned because it's cheap to do so. I did not do this at the time, but I would probably look at running a replicated database with a hot standby on another server.
4) It has not been my experience that "modern" cloud deployments require fewer SRE resources. Like water running downhill, cloud projects seek complexity.
No network latency between nodes, less memory bandwidth latency/contention than there is in VMs, and no caching-layer latency when you can just tell e.g. Postgres to use gigs of RAM and then let Linux's disk caching take care of the rest (no separate caching architecture needed).
If you’re running Postgres locally you can turn off the TCP/IP part; nothing more to audit there.
SSH based copying of backups to a remote server is simple.
If not accessible via network, you can stay on whatever version of Postgres you want.
I’ve heard these arguments since AWS launched, and all that time I’ve been running Postgres (since 2004 actually) and have never encountered all these phantom issues that are claimed as being expensive or extremely difficult.
It gets even easier now that you have cheap S3: just upload the dump to S3 every day and set the S3 lifecycle/deletion policy to whatever is feasible for you.
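Roughly what that looks like as a sketch (the database name, bucket, and paths are made up; a lifecycle rule on the bucket handles expiring old dumps):

    import datetime
    import subprocess

    import boto3

    # Nightly dump-and-upload: dump the database, compress it, push it to S3.
    # Retention is handled by a lifecycle/expiration rule configured on the bucket.
    bucket = "example-db-backups"  # assumed bucket name
    key = f"postgres/dump-{datetime.date.today():%Y-%m-%d}.sql.gz"

    # pipefail so a pg_dump failure is not masked by gzip succeeding
    subprocess.run(
        ["bash", "-c", "set -o pipefail; pg_dump mydb | gzip > /tmp/dump.sql.gz"],
        check=True,
    )
    boto3.client("s3").upload_file("/tmp/dump.sql.gz", bucket, key)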
For backups, including Postgres, I was planning on paying Veeam ~$500 a year for a software license to back up the active node and Postgres database to S3/R2. The standby node would be getting streaming updates via logical replication.
There are free options as well, but I didn't want to cheap out on the backups.
It looks pretty turnkey. I am a software engineer, not a sysadmin, though. It's still just theory as well, as I haven't built it out yet.
Either way: 1 day of a mid-level developer in the majority of the world (basically: anywhere except Zurich, NYC or SF) is between €208 and €291. (Yearly salary of €50-€70k)
A junior developer's time for setup and the cost of hardware is practically a one-off expense. It's a few days of work at most.
The alternative you're advocating for (a recurring SaaS fee) is a permanent rent trap. That money is gone forever, with no asset or investment to show for it. Over a few years, you'll have spent tens of thousands of dollars for nothing. The real cost is not what you pay a developer; it's what you lose by never owning your tools.
Not sure where I advocated for that. Could you point it out please?
(what's "medium-size corp" and how did you come up with $100k ?)
RDS has a value. But for many teams the price paid for this value is ridiculously high when compared to other options.
[0] A normal sysadmin remains vaguely bemused at their job title and the way it changes every couple years.
Sometimes even the certified cloud engineers can't tell you why an RDS behaves the way it does, nor can they really fix it. Sometimes you really do need a DBA, but that applies equally to on-prem and cloud.
I'm a sysadmin, but have been labelled and sold as: Consultant (sounds expensive), DevOps engineer, Cloud Engineer, Operations Expert and right now a Site Reliability Engineer.... I'm a systems administrator.
It doesn't need someone who knows how to use the labyrinthine AWS services and console?
These comments sound super absurd to me, because RDS is difficult as hell to set up unless you do it very frequently or already have it in IaC format, since one needs to set up a VPC, subnets, security groups, an internet gateway, etc.
It's not like creating a DynamoDB, Lambda or S3 where a non-technical person can learn it in a few hours.
Sure, one might find some random Terraform file online to do this or vibe-code some CloudFormation, but that's not really a fair comparison.
I also totally understand why some people with a family to support and a mortgage to pay can't just walk away from a job at a FAANG or MAMAA type place.
Looking at your comparison, at this point it just seems like a scam.
I am not even within a thousand km of the level of what you are doing, but my client was paying $100/m for an AWS server, SQS, and an S3 bucket, for a small PHP-based web application that uses the Amazon Seller API and the Keepa API for the products he ships. It used MySQL for data storage.
I implemented the whole thing in Python, Django, and PostgreSQL (initially SQLite) and put it on a $25/m unmanaged VPS.
I have not had any complaints about performance, and it runs continuously: updating product prices and details, processing PDF invoices using OCR, and finding missing products in shipments, all while serving the website. A 4-core server with 6GB RAM is handling it just fine.
The load is not going to be so high to require AWS and friends, for now. It's a small internal app, probably won't even get over 100 users, and if it ever does, it's extremely simple to migrate, because the app is so compact, even though not exactly monolithic.
And still, it probably won't need a $100 AWS server, unless we are scaling up much larger.
If all you need is "good enough" reliability and basic compute power (which I think is good enough for many businesses, considering AWS isn't exactly outage free either), you're probably better off getting a server or renting one from a cheap cloud host. If you're promising five nines of uptime for some reason, you may want to reconsider.
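As a quick back-of-the-envelope, translating "nines" into a monthly downtime budget (assuming a 30-day month) is a few lines:

    # Translate an availability target into an allowed-downtime budget per month.
    MONTH_SECONDS = 30 * 24 * 3600

    for nines in (2, 3, 4, 5):
        availability = 1 - 10 ** -nines
        budget_min = (1 - availability) * MONTH_SECONDS / 60
        print(f"{availability:.5%} -> {budget_min:.1f} minutes/month")
    # 99.00000% -> 432.0 minutes/month
    # 99.90000% ->  43.2
    # 99.99000% ->   4.3
    # 99.99900% ->   0.4  (about 26 seconds)

That last jump is where most of the architectural (and billing) complexity lives.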
This is exactly my point. Sorry if I was not clear on my OP.
We are using Seller API to get different product information, while their API provides base work for communicating with their endpoint, you'll have to implement your own system to use that, and handle the absurd unreliability of their API's rate limiter, and the spider web of API callbacks to get information that you require.
There are cheaper ways of building that use case on AWS.
Most AWS sticker shock I’ve seen results from someone who doesn’t really understand cloud trying to build on the cloud. Cost has to be designed in from the start (in addition to security, operational overhead, etc).
In general, I’ve found two types of engineering teams who don’t use the cloud: the mugs and the superstars. And since superstars are few and far between, that means…
I guess those promises about needing fewer expensive people never materialised.
tbh, aside from the really anaemic use-cases where everything actually manages to scale to zero and has very low load: I have genuinely never seen an AWS project (outside of free credits of course) that works out cheaper than what came before.
That's TCO from P&Ls, not a "gut feeling". We have a decade of evidence now.
My comment was not about using AWS is bad, it has its uses. My comment was about how in this instance it was simply not needed. And I even speculated when it might be needed.
Picking the correct tool for the job is what it means to be an engineer, or a person with common sense. With experience, we can get over childish absolutism about a tool or service and look at the broader picture, unless, of course, we are expecting some kind of monetary gain.
I do not know the actual cost of the original application.
The app I was developing was for another purpose, and the reimplementation was added later.
The app replaces an existing commercial app that is in use, which costs $200+/m. So, maybe 4-5 years for the savings to cover the cost. They have been using the app for 3 years, I think.
And, maybe I am beating my own drum a little, but I believe my implementation works and looks much better than the commercial one or the first implementation.
So, I am really looking forward to this being a success.
For most public cloud providers you have to give them your credit card number so they can charge an arbitrary amount.
For Hetzner, instead of CC#, you give a scan of your ID (of course you can attach your CC too or Paypal). Personally I do my payments via a bank transfer. I recently paid for the whole 2025 and 2026 for all my k8s clusters. It gives unimaginable peace of mind when compared to AWS/GCP/Azure.
Plus, their cloud instances often spin up much faster than EC2.
Data centers all over the country and I get to locate under 10ms from my regional audience.
Just a data point if you want some bigger iron than a VM.
Before that, I used to go for Linode, but I think they've become more pricey?
Too bad, actually, their service was pretty good.
Also in my experience more complex systems tend to have much less reliability/resilience than simple single node systems. Things rarely fail in isolation.
Now, if you actually need to decouple your file storage and make it durable and scalable, or need to dynamically create subdomains, or any number of other things… The effort of learning and integrating different dedicated services at the infrastructure level to run all this seems much more constraining.
I’ve been doing this since before the “Cloud,” and in my view, if you have a project that makes money, cloud costs are a worthwhile investment that will be the last thing that constrains your project. If cloud costs feel too constraining for your project, then perhaps it’s more of a hobby than a business—at least in my experience.
Just thinking about maintaining multiple cluster filesystems and disk arrays—it’s just not what I would want to be doing with most companies’ resources or my time. Maybe it’s like the difference between folks who prefer Arch and setting up Emacs just right, versus those happy with a MacBook. If I felt like changing my kernel scheduler was a constraint, I might recommend Arch; but otherwise, I recommend a MacBook. :)
On the flip side, I’ve also tried to turn a startup idea into a profitable project with no budget, where raw throughput was integral to the idea. In that situation, a dedicated server was absolutely the right choice, saving us thousands of dollars. But the idea did not pan out. If we had gotten more traction, I suspect we would have just vertically scaled for a while. But it’s unusual.
This is because you are looking only at provisioning/deployment. And you are right -- node size does not impact DevOps all that much.
I am looking at the solution space available to the engineers who write the software that ultimately gets deployed on the nodes. And that solution space is different when the nodes have 10x the capability. Yes, cloud providers have tons of aggregate capability. But designing software to run on a fleet of small machines is very different from accomplishing the same tasks on a single large machine.
It would not be controversial to suggest that targeting code at an Apple Watch or Raspberry Pi imposes constraints on developers that do not exist when targeting desktops. I am saying the same dynamic now applies to targeting cloud providers.
This isn't to say there's a single best solution for everything. But there are tradeoffs that are not always apparent. The art is knowing when it makes sense to pay the Cloud Tax, and whether to go 100% cloud vs. some proportion of dedicated.
I’ve never had an issue with moving data.
I think you confuse Hetzner with bare metal. Hetzner has Hetzner Cloud, which is like AWS EC2 but much cheaper. (They also have bare metal servers, which are even cheaper.) With Hetzner Cloud, you can use Terraform, GitHub Actions, and whatever else you mentioned.
I think the issue is actually the opposite.
With the cloud, the engineers fail to see the actual cost of their inefficient scaled-out code, because someone else (the CFO) pays the bill; and the answer to any issue, is simply adding more "workers" and more "cloud", since they're basically "free" from the perspective of the employee. (And the more "cloud" something is, like, the serverless, the more "free", completely inverting the economics of making a profit on the service — when the CFO tells you that your AWS bill is too high, you move everything from the EC2 to AWS Lambda, since the salesperson from AWS tells you that serverless is far cheaper, only for the bill to get even higher, for reasons unknown, of course.)
Whom the cloud tax actually constrains are the entrepreneurs and solo-preneurs. If you have to pay $5000/mo to AWS just for the infra, you can only go so long without lots of revenue, and you'd need to have a whopping 5k/mo+ worth of revenue before breaking even. Yet with a $200/mo like at OVH or Hetzner, you can afford to let it grow at negligible cost to yourself, and it can basically start being profitable with the first few users.
Don't believe this? Look at the blog entries by the guy who bought Yahoo!'s Delicious, written before they went bankrupt and were up for sale. He was basically pointing out that the services have roughly the same number of users, and require the same engineering resources, yet one is being operated at a loss, whereas the other one makes a profit (guess which one, and guess why).
* https://en.wikipedia.org/wiki/Delicious_(website)
* https://en.wikipedia.org/wiki/Pinboard_(website)
* https://news.ycombinator.com/from?site=blog.pinboard.in
So, literally, the difference between the cloud and renting One Big Server, is making a loss and going out of business, and remaining in business and purchasing your underwater competitor for pennies on the dollar.
However, to the point of microservices as the article mentions, you probably should look at Lambda (or Fargate, or a mix) unless you can really saturate the capacity of multiple servers.
When we swapped from ECS+EC2 running microservices over to Lambda, our costs dropped sharply. Even serving millions of requests a day, we spend a lot of time idle in between, especially spread across the services.
Additionally, we have had zero hardware-caused outages in the last 5 years. As an engineer, this has made my QoL significantly better.
Probably? It's about 5-10X more expensive than equivalent services from Hetzner.
You can get actual direct-attached SSDs on EC2 (and I'd expect performance to be on par with Hetzner), but those are ephemeral instance stores and you lose the data when the instance is stopped or terminated.
That said, with a defined workload without a ton of variation or segmentation needs there are lots of ways to deliver a cheaper solution.
What are you getting, and do you need it?
* Centralized logging, log search, log based alerting
* Secrets manager
* Managed kubernetes
* Object store
* Managed load balancers
* Database HA
* Cache solutions
... Can I run all these by myself? Sure. But I'm not in this business. I just want to write software and run that.
And yes, I have needed most of this from day 1 for my startup.
For a personal toy project, or when you reach a certain scale, it may make sense to go the other way.
So while there are areas where you need to introduce distributed systems, this repeated disparaging comment of “toy hobby projects” makes me distrust your judgement heavily. I have replaced many such installations by actually delivering (grand distributed designs often don’t fully deliver), reducing costs, dramatically improving performance, and most importantly reducing complexity by magnitudes.
One server means you can handle the equiv of 100+ AWS instances. And if you're into that turf, then having a rack of servers saves even more.
Big corp is pulling back from the cloud for a reason.
It's still useful to have the various services, background jobs, system events, etc. in one indexed place which can also manage retention and alerting. And ideally in a place reachable even if the main service goes down. I've got centralised logging on a small homelab server with a few services on it and it's worth the effort.
> Load balancing? In practice most people for most work don’t use it because of actually outgrowing hardware, but because they have to provision to shared hardware without exclusivity.
Depending on how much you lose in case of downtime, you may want at least 2x of hardware for redundancy and that means some kind of fancy routing (whether it's LB, shared IP, or something else)
> Secrets management? There are no secrets to be constantly distributed to various machines on a network.
Typically businesses grow to more than one service. For example I've got a slack webhook in 3 services in a small company and I want to update it in one place. (+ many other credentials)
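i.e. something like this sketch, where every service reads the webhook from one central secrets store instead of three separate configs (the secret name and JSON shape are made up; the same idea works with Vault or SSM Parameter Store):

    import json

    import boto3

    # Each service fetches the shared Slack webhook at startup (or on a refresh
    # timer) from one place, so rotating it means updating a single secret.
    secrets = boto3.client("secretsmanager")

    def slack_webhook_url() -> str:
        resp = secrets.get_secret_value(SecretId="prod/slack/webhook")  # assumed name
        return json.loads(resp["SecretString"])["url"]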
> Caching? Distributed systems create latency that doesn’t need to exist at all
This doesn't solve the need for caching results of larger operations. It doesn't matter how much latency you have or not, you still don't want that rarely-changing 1sec long query to run on every request. Caching is rarely only about network latency.
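Even on one box that usually ends up as a small in-process cache; a minimal sketch (the query function is just a stand-in):

    import time
    from functools import wraps

    # Minimal TTL memoizer: run the expensive query at most once per `seconds`,
    # serve the cached result otherwise. No Redis, no network hop.
    def ttl_cache(seconds: float):
        def decorator(fn):
            value, expires_at = None, 0.0

            @wraps(fn)
            def wrapper():
                nonlocal value, expires_at
                if time.monotonic() >= expires_at:
                    value = fn()
                    expires_at = time.monotonic() + seconds
                return value

            return wrapper
        return decorator

    def run_expensive_query():
        time.sleep(1)  # stand-in for the rarely-changing ~1s query
        return {"rows": 42}

    @ttl_cache(seconds=60)
    def dashboard_stats():
        return run_expensive_query()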
That's amazing. I wish I could do the same.
Unfortunately, I cannot run my business on a single server in a cage somewhere, for a multitude of reasons. So I use AWS, a couple of colos, and SaaS providers to deliver reliable services to my customers. Note I'm not a dogmatic AWS advocate, I seek out the best value -- I can't do what I do in AWS without a lot of capital spend on firewalls and storage appliances, as well as the network infrastructure and people required to make those work.
You must be doing truly a lot of growth prior to building. Or perhaps insisting on tiny VMs for your loads?
This happens way too often. Early-stage startups build everything on the AWS free tier (t2.micro only!), and then when the time comes they scale everything horizontally.
Do people really use the bare CloudWatch logs as an answer for log search? I find it terrible and pretty much always recommend something like DataDog or Splunk or New Relic.
which in reality is any project under a few hundred thousand users
The problem that Hetzner and a lot of hardware providing hosts have, is the lack of affordable flexibility.
Hetzner's design is based upon a base range of standardized products. These can only be upgraded within a pre-approved range of upgrade options (limited to storage/memory).
Upgrades are often a mixed bag of carefully designed "upgrade paths". As you can expect, upgrades are not cheap. Doubling the storage on a base server often increases the price of your server by 50 to 75%. Typical customization will cost you dearly.
This is where AWS wins a lot more. Yes, they are expensive as hell, but you often are not stuck with a base config and a limited upgrade path. The ability to scale beyond what Hetzner can offer is there, and you're not forced to overbuy from the start. Transferring between servers is a few buttons and done. With Hetzner, if you did not overspec from the start, you're going to do those fun server migrations.
The ironic part is that buying your own hardware and running it yourself often ends up paying for itself within an 8-12 month period (not counting electricity / internet). And you maintain a lot more flexibility.
* You want to use bifurcation, go for it.
* You want to use consumer 4TB NVMe drives for second-layer read storage (which Hetzner refuses to offer, as they limit those to 2TB and only on a few servers), go for it.
* You want a 10Gbit interlink between your servers, go for it. No need to pay a monthly fee! No need to reserve "future space".
* Oh, you want 25Gbit, go for it (Hetzner = not possible).
* You want 50Gbit ...
* You want to chuck in a few LLM capable GPUs without breaking the bank...
It's ironic that it's 2025 and Hetzner is still limited to a 1Gbit connection on its hardware, when just about any consumer-level hardware has had 2.5Gbit by default for years.
Your own hardware gives you the flexibility of AWS and cost savings beyond Hetzner. Maybe it's just my environment, but I see more and more small to medium companies going back to their own locally run servers. Not even colocation.
The increase in consumer-level fiber, which used to be expensive or unavailable, has opened the doors for businesses. Most companies do not need insane backbones.
The fact that you can get 10Gbit business fiber for around 100 Euro in some EU countries (of course never the north) is insane. I've even seen some folks combining fiber with Starlink and 5G as backup in case their fiber fails or is out.
As long as you fit within a specific usage case that is offered by Hetzner, they are cheap. But the moment you step outside that comfort zone... This is one of Hetzner's weaknesses, and where AWS or self-hosting comes back.
We had a leased server from them, running VMware, and we had Linux virtual machines for our application.
We ran out of RAM. We only had 16 or 32GB at the time. Hey, can we double this? Sure, but our payment would nearly double. How does that make any sense?
If this were a co-located box we owned, I could buy a pair of $125 chips from Crucial (or $250 Dell chips from CDW) and there we go. But we're expected to pay this much more per month?
Their answer was "you can do more with the server so that's what you're paying for"
Storage was a similar situation: we were still on RAID with spinning drives and we wanted to go SSD, not even NVMe. Wasn't going to happen. And if we went to a new server we'd have to get all new IPs and stuff. Ugh.
And 10Gb...that was a pipe dream. Costs were insane.
We ended up having to decide between two things:
1. Move to a co-lo and buy a couple servers, ala StackExchange. This is what I wanted to do.
2. Tweak the current application stack, and re-write the next version to run on AWS.
What did we end up doing? Some half ass solution using the existing server for DB and NGINX proxy, while running the sites on (very slow) Slicehost instances (which Rackspace had recently acquired and roughly integrated into their network). So we still had downtime issues, slow databases, etc.
For storage, Hetzner does offer Volumes, which you can attach to your VM and you can choose exactly how large you want them to be and are charged separately. But your argument about doubling resources and doubling prices still holds for RAM.
The argument was about dedicated hardware. But it still holds for VPS.
Have you seen the price of their Cloud Storage? An ARM VPS with 40GB is 4.51 (incl. tax); for another 40GB of storage, you're paying 2.10 Euro. So my argument still holds, as you're paying almost 50% more just to go from 40GB to 80GB. And that ratio gets worse if you're renting a higher-end VPS and double the storage on it.
Let's be honest: 53.62 Euro for 1TB of SSD storage in 2025 is ridiculous.
Netcup is at 12 Euro/TB for SSD storage (same speed as the VMs, as it's just local storage on the server, not network storage). FYI: an ARM 6-core / 256GB at Netcup is 6.26 Euro.
Hetzner used to be the market leader and pushed others, but you barely see any new products or upgrades from them anymore. I said it before: if Netcup actually invested in a more modern/scalable VPS solution (instead of their 2010 VPS panels), they would eat a lot of Hetzner's clients.
I have several workloads that just invoke Lambda in parallel. Now I effectively have a 1000 core machine and can blast through large workloads without even thinking about it. I have no VM to maintain or OS image to consider or worry about.
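For illustration, that fan-out is not much more than this sketch (the function name and payload shape are made up; account-level Lambda concurrency limits still apply):

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    # Invoke one Lambda per work item and collect the results; with enough
    # concurrency this behaves like borrowing a very wide machine for a minute.
    lam = boto3.client("lambda")

    def run_item(item: dict) -> dict:
        resp = lam.invoke(
            FunctionName="process-item",  # assumed function name
            Payload=json.dumps(item).encode(),
        )
        return json.loads(resp["Payload"].read())

    items = [{"id": i} for i in range(1000)]
    with ThreadPoolExecutor(max_workers=100) as pool:
        results = list(pool.map(run_item, items))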
Which highlights the other difference that you failed to mention. Hetzner charges a "one time setup" fee to create that VM. That puts a lot of back pressure on infrastructure decisions and removes any scalability you could otherwise enjoy in the cloud.
If you want to just rent a server, then Hetzner is great. If you actually want to run "in the cloud", then Hetzner is a non-starter.
Lambda is a decent choice when you need fast, spiky scaling for a lot of simple self-contained tasks. It is a bad choice for heavy tasks like transcoding long videos, training a model, data analysis, and other compute-heavy tasks.
It's almost exactly the same price as EC2. What you don't get to control is the mix of vCPU and RAM. Lambda ties those two together. For equivalent EC2 instances the cost difference is astronomically small, on the order of pennies per month.
> like transcoding long videos, [...] data analysis, and other compute-heavy tasks
If you aren't breaking these up into multiple smaller independent segments then I would suggest that you're doing this wrong in the first place.
> training a model
You're going to want more than what a basic EC2 instance affords you in this case. The scaling factors and velocity are far less of a factor.
It should be obvious that this is not the best answer for all projects.
Care to elaborate?
https://medium.com/life-at-apollo-division/compare-the-cost-...
Hetzner Cloud, then! In the US, $0.53/hr / $333.59/mo for 48 vCPU/192GB RAM/960GB NVMe. Includes 8 TB/mo traffic, when 8 TB egress would cost $720 on EC2; more traffic is $1.20/TB when the first tier of AWS egress is $90/TB. No setup fee. Not that it's EC2 but there's clearly flexibility there.
More generally, if you want AWS, you want AWS; if you want servers you have options.
If either of these exceed the limitations of the call, which is 6MB or 256kB depending on call type, then you can just use S3. For large distributed task coordination you're going to be doing this anyways.
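A sketch of that staging pattern (bucket and function names are made up): the caller puts the oversized payload in S3 and the function receives only the key.

    import json
    import uuid

    import boto3

    # Stage a payload that exceeds the invoke limit in S3, then pass its key.
    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    def submit_large_task(data: bytes) -> dict:
        key = f"tasks/{uuid.uuid4()}.bin"
        s3.put_object(Bucket="example-task-staging", Key=key, Body=data)  # assumed bucket
        resp = lam.invoke(
            FunctionName="process-large-task",  # assumed function name
            Payload=json.dumps({"s3_key": key}).encode(),
        )
        return json.loads(resp["Payload"].read())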
> deployment .zip sizes
Overlays exist and are powerful.
> max execution time
If your workload depends on long uninterrupted runs of time on single CPUs then you have other problems.
> Plus you'll be locked into AWS.
In the world of serverless your interface to the endpoints and semantics of Lambda are minimal and easily changed.
You're better off using ECS / Fargate for application logic.
https://www.hetzner.com/dedicated-rootserver/ax162-s/
Because it's baked into the price. If you run a VPS for a full month, you get the listed monthly price. But if you run a VPS for a shorter time, the hourly billing price is a lot more expensive.
The ironic part being that you're better off keeping a VPS active until the end of your monthly period (if you've already crossed 2/3 of it) than it is to cancel early.
I've noticed that few people realize that the hourly price != the monthly price pro-rated.
It's a nice pattern. Just don't make them clones of each other, or they might go BLAM at the same time!
https://news.ycombinator.com/item?id=32049205
https://news.ycombinator.com/item?id=32032235
https://news.ycombinator.com/item?id=32028511 (<-- this is where it got figured out)
---
Edit: both these points are mentioned in the OP.
https://www.servethehome.com/amd-epyc-7002-rome-cpus-hang-af...
HN goes down when we restart the server process, usually as part of updating the code - but only for a few seconds. The message "Restarting the server. Shouldn't take long." displays when that is happening.
There are also, to my exasperation, still moments of brownout during certain traffic spikes or moments of obscure resource contention. But these are at least rarer than they used to be.
Sure, you need net/infra admins, but the software and hardware these days are pretty management-friendly, and you'll find you still need (often more expensive) "cloud" admins, so you're not offsetting much management cost there.
Colocation is plentiful and providers often aggregate and resell bandwidth from their preferred carriers.
At one point we were up to 8 Dell VRTX clusters and a few SANs, with 500+ VMs ranging from huge MSSQL servers to kube clusters; the public cloud bill would have been well into the six figures even with preferred pricing and reserved instances. Our colocation bill was $2400/mo, and that was mostly for power. The one thing that always surprised me was how much faster everything was: every time we had to scale over into the cloud, the public cloud node was noticeably slower, even for identical CPU generations and vCPU counts.
You need to be very keen about server deals, updates, support contracts and licenses - but it's really manageable and interconnecting with the cloud is trivial at this point - you can get a "cloud connect" fiber drop to your preferred cloud provider and connect your colo infra to your vpc.
Once you have an established baseline for your server needs - it's almost always more capital friendly to buy the servers and keep them running for the ~5 reliable years you'll get out of them - usually break even here is 2-3 years vs renting from a provider. If you're running your servers until they fail you'll get 7-10 years out of them, provided the power cost is still worth running them (usually that is also around the 8-10 year mark depending on your power cost).
So there are many reasons you'd buy vs rent - including capital deductions and access to cheap interest rates. You can also get some pretty crazy deals (like 33% of new price) by buying 2-3 year old equipment, then continue to run them for another 4-5 years, which is the lowest cost scenario if you don't need bleeding edge.
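A back-of-the-envelope version of that break-even math, with every number hypothetical:

    # Buy vs rent, ignoring financing, tax treatment, and resale value.
    purchase_price = 8000.0          # one-off hardware cost (assumed)
    colo_power_per_month = 250.0     # colo space + power + bandwidth (assumed)
    rental_per_month = 600.0         # comparable rented dedicated box (assumed)

    months_to_break_even = purchase_price / (rental_per_month - colo_power_per_month)
    print(f"break-even after ~{months_to_break_even:.0f} months")  # ~23 with these inputs

Plug in your own quotes; the shape of the answer (roughly 2-3 years) tends to hold as long as the rented box really is comparable hardware.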
Especially for the "one (or a few) big server" scenario in the article, that would seem to me a pretty big factor.
They could have gotten the job done by hosting the service on a VPS with a multi-tenant database schema. Instead, they went about learning Kubernetes and drilling deep into the "cloud-native" stack. They spent a year trying to set up the perfect DevOps pipeline.
Not surprisingly the company went out of business within the next few years.
But the engineers could find new jobs thanks to their acquired k8s experience.
I mean, of the two, the PaaS route certainly burns more money, the exception being the rare shop that is so incompetent they can't even get their own infrastructure configured correctly, like in GP's situation.
There are guaranteed more shops that would be better off self-hosting and saving on their current massive cloud bills than the rare one-offs that actually save so much time using cloud services, it takes them from bankruptcy to being functional.
Does it? Vercel is $20/month and Neon starts at $5/month. That obviously goes up as you scale up, but $25/month seems like a fairly cheap place to start to me.
(I don't work for Vercel or Neon, just a happy customer)
And that's before you factor in 500GB of storage.
These companies all ended up massively increasing their budgets switching to cloud workloads when a simple server in the office was easily enough for their 250 users. Cloud is amazing for some uses and pure marketing BS for others but it seems like a lot of engineers aim for a perfect scalable solution instead of one that is good enough.
A big pain point that I personally don't love is that this non-cloud approach normally means running my own database. It's worth considering a provider who also provides cloud databases.
If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.
In terms of pricing, the deals available these days on servers are amazing: you can get 4GB RAM VPSs with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core for ~$90. It's worth using sites like serversearcher.com to compare.
Compare that with using your distro's packaged version where you can have version variations, variations in default config or file path locations, etc.
You can abuse git for it if you really want to cut corners.
SQLite uses database-level locking rather than anything finer-grained. In the default rollback-journal mode, when any thread is writing the database, no other thread is reading it, and if one thread is waiting to write, new reads can't begin (WAL mode relaxes this: readers keep reading while a single writer writes). Additionally, every read transaction starts by checking whether the database has changed since last time, and then reloading a bunch of caches.
This is suitable for SQLite's intended use case. It's most likely not suitable for a server with 256 hardware threads and a 50Gbps network card. You need proper transaction and concurrency control for heavy workloads.
Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite is lite. Use it for lite things, not heavy things.
SQLite (properly configured) will often outperform "proper databases" by an order of magnitude in the context of a single box. You want a single writer for high performance, as it lets you batch.
> 256 hardware threads...
Have you tried? I have. Others have too. [1]
> Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite has blobs, so you can use your own custom encoding, which is what you want in a high-performance context.
Here's SQLite on a $5 shared VPS handling 10,000+ checks per second over a billion checkboxes [2]. You're gonna be fine.
- [1] https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...
- [2] https://checkboxes.andersmurphy.com
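The single-writer point in practice looks roughly like this sketch: every thread enqueues writes, one thread drains the queue and commits them in batches, and readers go straight to the database in WAL mode (table name and batch size are illustrative):

    import queue
    import sqlite3
    import threading

    # One writer thread batches inserts into transactions; readers are unaffected
    # in WAL mode.
    conn = sqlite3.connect("app.db", check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")

    writes: "queue.Queue[str]" = queue.Queue()

    def writer_loop() -> None:
        while True:
            batch = [writes.get()]            # block until there is work
            while len(batch) < 1000:
                try:
                    batch.append(writes.get_nowait())
                except queue.Empty:
                    break
            with conn:                        # one transaction per batch, not per row
                conn.executemany("INSERT INTO events (body) VALUES (?)",
                                 [(b,) for b in batch])

    threading.Thread(target=writer_loop, daemon=True).start()
    writes.put("hello")                       # producers just enqueue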
SQLite is easily the best scaling DB tech I've used. I've moved all my postgres workloads over to it and the gains have been incredible.
It's not a panacea and not the best in all cases, but it's a very sane default that I recommend everyone start with, only complicating their stack with an external DB when they start hitting real limits (which often never happens).
I moved several projects from sqlite to postgres because sqlite didn't scale enough for any of them.
The out of the box defaults for sqlite are terrible for web apps.
SQLite (actually SQL-ite, like a mineral) may be light, but so are many workloads these days. Even 1000 queries per second is quite doable with SQLite and modest hardware, and I've worked at billion-dollar businesses handling fewer queries than that.
Most if not all of your concerns with SQLite are simply a matter of not using the default configuration. Enable WAL mode, enable strict mode, etc. and it's a lot better.
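Concretely, "not the default configuration" usually amounts to a few lines like these (values are just reasonable starting points):

    import sqlite3

    # Typical non-default SQLite setup for a web app: WAL for concurrent readers,
    # a busy timeout so writers wait instead of erroring out, enforced foreign keys,
    # and STRICT tables (SQLite >= 3.37) for real per-column type checking.
    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA busy_timeout=5000")   # milliseconds
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT NOT NULL) STRICT"
    )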
If you go this route, you’ve got to build out your own stack for security, global delivery, databases, storage, orchestration, networking ... the whole deal. That means juggling a bunch of different tools, patching stuff, fixing breakage at 3 a.m., and scaling it all when things grow. Pretty soon you need way more engineers, and the “cheap” servers don’t feel so cheap anymore.
Early Google rejected big iron and built fault tolerance on top of commodity hardware. WhatsApp used to run their global operation employing only 50 engineering staff. Facebook ran on Apache+PHP (they even served index.php as plain text on one occasion). You can build enormous value through simple means.
Suddenly I learned why my employer was willing to spend so much on OpenStack and Active directory.
lol, why was this the defining moment? She wasn't too keen on hearing the high pitch wwwwhhhhuuuuurrrrrrr of the server fans?
I think in general that expectation is NOT acceptable though, especially around data loss, because the non-engineering stakeholders don't believe it is.
Engineers don't make decisions in a vacuum, if you can manage the expectations, good for you. But in most cases that's very much an uphill battle which might make you look incompetent because you cannot guarantee no data loss.
That being said I've lost a lot of VMs on ec2 and had entire regions go down in gcp and aws in the last 3 years alone, so going to the public cloud isn't a solves it all solution - knock on wood the colo we've been using hasn't been down once in 12+ years.
Yep, and it's mostly caused by the VC funding model - if your investors are demanding hockey-stick growth, there is no way in hell a startup can justify (or pay for) the resulting Capex.
Whereas a nice, stable business with near-linear growth can afford to price in regular small Capex investments.
You don't need to buy server hardware(!), the article specifically mentions renting from eg Hetzner.
> The benefits of "just don't think about hardware" are real
Can you explain on this claim, beyond what the article mentioned?
I run a lambda behind a load balancer, hardware dies, its redundant, it gets replaced. I have a database server fail, while it re provisions it doesn't saturate read IO on the SAN causing noisy neighbor issues.
I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.
You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.
But the cloud premium needs reiteration: twenty five times. For the price of the cloud server, you can have twenty-five-way redundancy.
A medium to large size asteroid can cause mass extinction events - this happens sometimes - it's not a theoretical risk.
The risk of the people responsible for managing the platform messing up and losing some of your data is still a risk in the cloud. This thread has even already had the argument "if the cloud provider goes down, it's not your fault" as a cloud benefit. Either cloud is strong and stable and can't break, or cloud breaks often enough that people will just excuse you for it.
Yes, there is.
Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.
That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that often goes around them is, and that's why way more people rent the room than the servers in it.
I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.
The actual space isn't a big deal, but the entire environment has large fixed costs.
In practice, all that except connectivity is relatively easy to have on-site.
Connectivity is highly dependent on the business location, local providers, their business plans and their willingness to go out of their way to serve the clients.
And I am not talking only about bandwidth, but also reserve lines and latency.
Never underestimate the price people are willing to pay to evade responsibility. I estimate this is a multi-billion dollar market.
I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.
Even colocation is often fraught with issues. I shall not mention the plethora of dead hardware from datacenter electricity failures. Ironically, my home has more stable electricity than some datacenters, lol.
Unless you're running a business where a few minutes of downtime will cost you millions, most companies can literally run their own servers from their basements. I often see how much people overestimate their need for 99.999% uptime, or their bandwidth requirements.
It's not like colocation is that much cheaper. The electricity prices you're paying are often more expensive than even business/home electricity. That leaves only internet/fiber, and there's a plethora of commercial fiber these days.
We used to get minimum quoted prices of 2k for 1Gbit business fiber years ago (not including install costs). Now, in some countries, you get 5 or 10Gbit business fiber for 100 Euro.
An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.
If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.
ETA - fixed spelling error
Now, if you can live with the weird environment and your people know how to program what is essentially a distributed system described in terms no one else uses: I guess it's still OK, given the competition is all executing IBM's playbook too.
My understanding is that usually you subdivide into few LPARs and then reboot the production ones on schedule to prevent drift and ensure that yes, unplanned IPLs will work
My main concern is that at the current cloud premium rates, I will be better off even if I need to hire someone specifically to manage the local infra.
How many HTTP requests/second can a single machine handle? (2024) - https://news.ycombinator.com/item?id=45085446 - Aug 2025 (32 comments)
Using cloud and box storage on Hetzner is more expensive than the dedicated server, 4x the cost of owning the hardware and paying the power bill. AWS and Azure are just nuts, >100x the price, because they charge so much for storage even with hard drives. Neither Contabo nor Netcup can do this; it's too much storage for them.
Every time I look at this I come to the same basic conclusion: the overhead of renting someone else's machine is quite high compared to the hardware and power cost, and it would be a worse solution than having that performance on the local network for bandwidth and latency. The problem isn't so much the compute performance, which is relatively fairly priced; it's the storage costs and data transfer that bite.
Not really what the article was necessarily about, but cloud is sort of meant to be good for low-end hardware and it's actually kind of not: the storage costs are just too high, even with a Hetzner Storage Box.
Here is a fun one ...
https://www.reddit.com/r/selfhosted/comments/1dqq3h8/my_12x_...
The guy is running 12x AMD 6600HS with a power draw between 300 and 400W. The compute alone is easily 3x that of an equivalent Hetzner 48-core server. And that's not even mentioning that it includes 768GB of memory (people underestimate how much high-capacity RDIMMs draw in power).
The main issue with Hetzner is that as long as you only use their base configuration servers, they are very competitive. But if you step a little bit out of line, the prices simply skyrocket. Try adding more storage to some servers, or more memory, or needing a faster interconnect between your servers (limited to 10Gbit).
Even basic consumer hardware comes with 2.5Gbit, yet Hetzner is in the stone ages with 1Gbit. I remember when Hetzner introduced 1Gbit; Hetzner meant innovation and progress. But that has been slowly vanishing, and Hetzner has been getting lazier and lazier. You see the issue with their cloud offering's storage too. Look at Netcup, even Strato, etc. They barely introduce anything new anymore, and when something does come it's often less competitive or broken. The whole S3 offering coming in at Backblaze price levels, with non-stop issues.
You can tell they are the only company that ever pushed consumer hardware hosting at mass scale, which made them a small monopoly in the market. And it shows if you're an old customer and know their history. Hey, do people remember the price increases on the auction hardware because of the Ukraine invasion? Don't worry folks, when electricity prices go down, we will adjust them down. Oh, electricity has been back at pre-war prices for almost two years. Where are those promised price drops? Hehehe ...
But then you need to buy newer, more expensive hardware, which pushes your initial price up (divide by the amount of time you'll need to host the server to get the monthly equivalent, then add power/connectivity/maintenance and compare to Hetzner).
Btw, the reason homelabbers flock to legacy enterprise hardware is that it generally gives you a good amount of compute for a cheap price, if you don't mind the increased power cost. This is actually fine, as a lot of homelab usage is bursty and the machine can be powered off the rest of the time.
Unless you bought a 50-64 core server (and the power bill to match), you're often way better off with consumer-level hardware. Older server hardware's advantage is more about the total amount of memory you can install or the number of PCIe lanes.
The cheapest enterprise CPUs (AMD, for example) are currently Zen 2; the moment you want Zen 3, prices go up a lot for anything 32-core or higher.
I have seen so many homelabs running ancient hardware, only for the owners to realize they could do the same or more on modern mini PCs or similar hardware, often at a fraction of the power draw.
The reason so many people loved to run enterprise hardware was that in the US you had electricity prices in the low single digits or barely in the teens of cents per kWh. When you're paying something like 35 cents/kWh, people tend to do a bit of research and find out it's not the 2010s anymore.
I ran multiple enterprise servers with 384GB of memory; they idled at 120W+ (and that was with temperature-controlled fans, because those drain a ton of power). Second PSU? There goes your idle to 150W+.
Ironically, just get a few mini PCs and, with the same memory capacity spread across them, you're drawing 50W or less. That's the advantage of laptop CPUs. I have had 8-core Zen 3 systems idling at 4.5W.
And yes, you can turn off enterprise hardware, but you can also put mini PCs to sleep. And they don't tend to sound like jet engines when waking up ;)
I have a Minisforum ITX board next to me with a 16-core Zen 4, cost 380 Euro. Idles at 17W. Beats any 32-core Zen 2 enterprise server. Even something like an AMD EPYC 7C13 (64 cores) will be ~40% faster and still costs 600 Euro from China; it will do better on truly multithreaded workloads where you really have tons of processes, but that's 400 bucks versus 600 plus another 400 for a motherboard.
Just saying, enterprise hardware has its uses, like in enterprise environments, but for most people, especially homelabbers, it's often overkill.
You absolutely can, and it has been the most common practice for scaling them for decades.
That’s obviously possible and common.
What I meant was actually carving the monolith into separate pieces and deploying them separately, which is - by the definition of a monolith - impossible.
There is no limit on, or extra cost to, deploying 10,000 lines instead of 1,000.
If that’s possible (regardless of what was deployed to the two machines) then the app just isn’t a true monolith.
Yes you can. It's called having multiple application servers. They all run the same application, just more of them. Maybe they connect to the same DB, maybe not; maybe you shard the DB.
Of course you can. I've done it.
Identical binary on multiple servers with the load balancer/reverse proxy routing specific requests to specific instances.
The practical result is indeed "running different aspects of the monolith on different servers".
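A minimal sketch of that pattern (ports and paths are invented; in practice nginx or HAProxy does this job rather than hand-rolled code):

    # Round-robin one monolith binary across two identical instances.
    import http.client
    import itertools
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    BACKENDS = itertools.cycle([("127.0.0.1", 8001), ("127.0.0.1", 8002)])

    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            host, port = next(BACKENDS)  # same app, different instance
            upstream = http.client.HTTPConnection(host, port, timeout=5)
            upstream.request("GET", self.path, headers={"Host": self.headers.get("Host", host)})
            resp = upstream.getresponse()
            body = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            upstream.close()

    if __name__ == "__main__":
        ThreadingHTTPServer(("0.0.0.0", 8080), Proxy).serve_forever()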
You can also publish libraries as NuGet packages (privately if necessary) to share code across repos if you want the new app in its own repo.
I've worked on projects with multiple frontends, multiple backends, lots of separately deployed Azure Functions, etc.; it's no problem at all to make significant structural changes as long as the code isn't a big ball of mud.
I always start with a monolith, we can easily make these changes when necessary. No point complicating things until you actually have a reason to.
How much waste is there from all this meta-software?
In reality, I host more on Raspberry Pis with USB SSDs than some people host on hundred-plus watt Dells.
At the same time, people constantly compare colo and hardware costs with the cost per month of cloud and say cloud is "cheaper". I don't even bother to point out the broken thinking that leads to that. In reality, we can ignore gatekeepers and run things out of our homes, using VPSes for public IPs when our home ISPs won't allow certain services, and we can still have excellent uptimes, often better than cloud uptimes.
Yes, we can consolidate many, many services in to one machine because most services aren't resource heavy constantly.
Two machines on two different home ISP networks backing each other up can offer greater aggregate uptime than a single "enterprise" (a misnomer, if you ask me, if you're talking about most x86 vendors) server in colo. A single five minute reboot of a Dell a year drops uptime from 100% to 99.999%.
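The arithmetic behind that number, for anyone who wants to plug in their own maintenance window:

    minutes_per_year = 365 * 24 * 60      # 525,600
    downtime_minutes = 5                  # one reboot per year
    uptime = 1 - downtime_minutes / minutes_per_year
    print(f"{uptime:.6%}")                # 99.999049%, i.e. roughly five nines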
Cloud is such bullshit that it's exhausting even just engaging with people who "but what if" everything, showing they've never even thought about it for more than a minute themselves.
Right now, my plan is to move from a bunch of separate VPSes, to one dedicated server from Hetzner and run a few VMs inside of it with separate public IPs assigned to them alongside some resource limits. You can get them for pretty affordable prices, if you don't need the latest hardware: https://www.hetzner.com/sb/
That way I can limit the blast range if I mess things up inside of a VM, but at the same time benefit from an otherwise pretty simple setup for hosting personal stuff, a CPU with 8 threads and 64 GB of RAM ought to be enough for most stuff I might want to do.
Give me a box, trust me with ssh keys and things are so much easier. Simple is good for the soul and the wallet.
My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and test their restoration. (When I shut down the site it was 1.2TB.) Having a script to quickly provision a new node—in case I ever needed it—would have something up within 8 minutes from hitting enter.
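A hedged sketch of that tiered-backup idea, assuming Postgres and pg_dump (the database name, table names, and paths here are illustrative, not from the original setup):

    import datetime
    import subprocess

    def dump(label, tables=()):
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
        out = f"/backups/{label}-{stamp}.dump"
        cmd = ["pg_dump", "--format=custom", "--file", out, "appdb"]
        for t in tables:
            cmd += ["--table", t]
        subprocess.run(cmd, check=True)   # cron or systemd timers drive the schedule
        return out

    # every 10 minutes: dump("core", ["users", "subscriptions"])
    # every hour:       dump("full")      # no --table flag = whole database
    # and periodically pg_restore into a scratch database to prove the dumps work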
When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.
I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.
The technical bits aren’t all there, though, and there’s a plethora of noise and misinformation. Happy to talk via email though.
Use one big server - https://news.ycombinator.com/item?id=32319147 - Aug 2022 (585 comments)
Is there something obvious that I'm missing?
Even still at that point you just round robin to a set of big machines. Easy
Even the ones who do know have been conditioned to tremble with fear at the thought of administrating things like a database or storage. These are people who can code cryptography kernels and network protocols and kernel modules, but the thought of running a K8S cluster or Postgres fills them with terror.
“But what if we have downtime!” That would be a good argument if the cloud didn’t have downtime, but it does. Most of our downtime in previous years has been the cloud, not us.
“What if we have to scale!” If we are big enough to outgrow a 256 core database with terabytes of SSD, we can afford to hire a full time DBA or two and have them babysit a cluster. It’ll still be cheaper.
“What if we lose data?” Ever heard of backups? Streaming backups? Hot spares? Multiple concurrent backup systems? None of this is complex.
“But admin is hard!” So is administrating cloud. I’ve seen the horror of Terraform and Helm and all that shit. Cloud doesn’t make admin easy, just different. It promised simplicity and did not deliver.
… and so on.
So we pay about 1000X what we should pay for hosting.
Every time I look at the numbers I curse myself for letting the camel get its nose under the tent.
If I had it to do over again I’d forbid use of big cloud from day one, no exceptions, no argument, use it and you’re fired. Put it in the articles of incorporation and bylaws.
These days probably the best way of getting these 'cloudy' engineers on board is just to tell them it's Kubernetes and run all of your servers as K3s.
Dev culture is totally fad driven and devs are sheep, so this works.
And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to settle for a differently specced server).
IME, having to go multi cloud/provider is a way worse problem to have.
A multi-node system tends to be less reliable, with more failure points, than a single-box system. Failures rarely happen in isolation.
You can do zero downtime deployment with a single machine if you need to.
Just like a lot of problems exists between keyboard and chair, a lot of problems exist between service A and service B.
The zero downtime deployment for my PHP site consisted of symlinking from one directory to another.
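For reference, the whole trick fits in a few lines (paths invented; the web server just follows the `current` symlink):

    import os

    def activate(version: str) -> None:
        release = f"/srv/app/releases/{version}"
        tmp = "/srv/app/current.new"
        if os.path.lexists(tmp):
            os.remove(tmp)                      # clear any stale staging link
        os.symlink(release, tmp)                # stage the new link
        os.replace(tmp, "/srv/app/current")     # rename(2) is atomic: old or new, never missing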
Honestly, we need to stop promoting prematurely making everything a network request as a good idea.
But how are all these "distributed systems engineers" going to get their resume points and jobs?
I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.
In all those years, I’ve had precisely one actual hardware failure: a PSU went out. They’re redundant, so nothing happened, and I replaced it.
Servers are remarkably resilient.
EDIT: 100% uptime modulo power failure. I have a rack UPS, and a generator, but once I discovered the hard way that the UPS batteries couldn’t hold a charge long enough to keep the rack up while I brought the generator online.
We had a rack in data center, and we wanted to put local UPS on critical machines in the rack.
But the data center went on and on about their awesome power grid (shared with a fire station, so no administrative power loss), on site generators, etc., and wouldn't let us.
Sure enough, one day the entire rack went dark.
It was the power strip on the data center's rack that failed. All the backup grids in the world can't get through a dead power strip.
(FYI, family member lost their home due to a power strip, so, again, anecdotally, if you have any older power strips (5-7+ years) sitting under your desk at home, you may want to consider swapping it out for a new one.)
Re: power strips, thanks for the reminder. I’m usually diligent about that, but forgot about one my wife uses. Replacement coming today.
I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your nodes whether it's hardware failure or power outage or someone stumbling on your power/network cable, or even having a single service crashing, this means you have a major outage on your hands.
These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware.
References to "elastic Kubernetes whatever" is a red herring. You can have a dead simple load balancer spreading traffic across multiple bare metal nodes.
Either way, stuff happens; figuring out your actual requirements around uptime, time to response, and time to resolution is important before you build a nine-nines solution when eight eights would be sufficient. :p
Are you serious? Have you ever built/operated/wired rack scale equipment? You think the power cables for your "short" server (vs the longer one being put in) are just hanging out in the back of the rack?
Rack wiring has been done and done correctly for ages. Power cables on one side (if possible), data and other cables on the other side. These are all routed vertically and horizontally, so they land only on YOUR server.
You could put a Mercedes Maybach above/below your server and nothing would happen.
We were their largest customer and they seemed honest even when they made mistakes that seemed silly, so we rolled our eyes and moved on with life.
Managed hosting means accepting that you can't inspect the racks and chide people for not cabling to your satisfaction. And mistakes by the managed host will impact your availability.
The number of production incidents on our corporate mishmash of lambda, ecs, rds, fargate, ec2, eks etc? It’s a good week when something doesn’t go wrong. Somehow the logging setup is better on the personal stuff too.
I'm not a better engineer, I just have drastically fewer failure modes.
Today’s systems don’t fail nearly as often if you use high quality stuff and don’t beat the absolute hell out of SSD. Another trick is to overprovision SSD to allow wear leveling to work better and reduce overall write load.
Do that and a typical box will run years and years with no issues.
Sigh.
Is that more, less than or about the same as having an AWS/Azure/GCP consultant?
What's the difference in labour per hour?
> the risk of having such single point of failure.
At the prices they charge I can have two hot failovers in two other datacenters and still come out ahead.
The problem with one big server is that few customers have ONE (1) app that needs that much capacity. They have many small apps that add up to that much capacity, but that's a very different scenario with different problems and solutions.
For example, one of the big servers I'm in the process of teasing apart has about 100 distinct code bases deployed to it, written by dozens of developers over decades.
If any one of those apps gets hacked and this is escalated to a server takeover, the other 99 apps get hacked too. Some of those apps deal with PII or transfer money!
Because a single big server uses a single shared IP address for outbound comms[1] this means that the firewall rules for 100 apps end up looking like "ALLOW: ANY -> ANY" for two dozen protocols.
Because upgrading anything system-wide on the One Big Server is a massive Big Bang Change, nobody has had the bravery to put their hand up and volunteer for this task. Hence it has been kept alive running 13 year old platform components because 2 or 3 of the 100 apps might need some of those components... but nobody knows which two or three apps those are, because testing this is also big-bang and would need all 100 apps tested all at once.
It actually turned out that even Two Big (old) Servers in a HA pair aren't quite enough to run all of the apps so they're being migrated to newer and better Azure VMs.
During the interim migration phase, instead of Two Big Servers there are Four Big Servers... in PRD. And then four more in TST, etc... Each time a SysOps person deploys a new server somewhere, they have to go tell each of the dozens of developers where they need to deploy their apps today.
Don't think DevOps automation will rescue you from this problem! For example in Azure DevOps those 100 apps have 100 projects. Each project has 3 environments (=300 total) and each of those would need a DevOps Agent VM link to the 2x VMs = 600 VM registrations to keep up to date. These also expire every 6 months!
Kubernetes, Azure App Service, AWS App Runner, and GCP App Engine serve a purpose: They solve these problems.
They provide developers with a single stable "place" to dump their code even if the underlying compute is scaled, rebuilt, or upgraded.
They isolate tiny little apps but also allow the compute to be shared for efficient hosting.
They provide per-app networking and firewall rules.
Etc...
[1] It's easy to bind distinct ingress IP addresses on even a single NIC (or multiple), but it's weirdly difficult to split the outbound path. Maybe this is easier on Linux, but on Windows and IIS it is essentially impossible.
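On Linux, at least, you can pick a per-app egress address by binding the socket before connecting; a sketch (the address is a placeholder that would have to be configured on the NIC):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("203.0.113.10", 0))        # this app's outbound source IP, ephemeral port
    s.connect(("example.org", 443))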
> 100 distinct code bases deployed to it
I've worked in a company, where the owner would spend money on anything except hosting. Admin guy would end up deploying a new app on whatever VPS that had the most RAM free at that time.
Ironically, consolidating this mess to "one big server", which was my ungrateful job for many months, fixed many issues. Though, it was done by slicing the host into tiny KVM virtual machines.
That's my other option: a bunch of Azure VM Scale Sets using the tiniest size that will run Windows Server, such as B2as_v2. A handful of closely related apps on each so that firewall rules can be restricted to something sane. Shared Azure Files for the actual app deployments so that devs never need to know the VM names. However, this feels an awful lot like reinventing Kubernetes... but worse.
My job would be sooo much simpler if Microsoft just got off their high horse and supported their own Active Directory in App Service instead of pretending it no longer exists.
A while back I looked into renting hardware, and found that we would save about 20% compared to what we actually paid AWS – partially because location and RAM requirements made the rental more expensive than anticipated, and partially because we were paying a lot less than on-demand prices for AWS.
20% is still significant, but it's a lot less than the ~80% that this and other articles suggest.
The biggest cost with AWS also isn't compute, but egress - for bandwidth heavy setups you can sometimes finance the entirety of the servers from a fraction of the savings in egress.
I cost optimize setups with guaranteed caps at a proportion of savings a lot of the time, and I've yet to see a setup where we couldn't cut the cost far more than that.
Our workloads are primarily memory-bound, and AWS offers pretty good options there, e.g. x2gd instances have 16GB of RAM per vCPU, while most rental options we found were much more CPU-focused (and charged for it).
Out of curiosity have you benchmarked it? I find that AWS "vCPUs" are significantly slower than a core (or even hyperthread) of a real CPU, and this constrains memory bandwidth too. A single bare-metal can often replace many EC2s.
Another thing to consider is the easy access of persistent NVME drives, something not possible on AWS. Yes you still need backups, but ideally you will only need those backups once a year or less. I've dealt with extremely complex and expensive solutions on AWS that could be trivially solved by just one persistent machine with NVME drives (+ a spare for redundancy). Having the data there persistently (at a cheap price per GB) means you avoid having to shuffle data around or can precompute derived data to speed up lookups at runtime.
If you're actually serious about exploring options to move your infra to bare-metal or hybrid feel free to reach out for a no-obligations call; email in my profile. It seems like you've already optimized it quite well so I'd be curious to see if there is still room for improvement. (Or if you don’t mind, share what your stack is and let others chip in too!)
This seems fundamentally incorrect to me? If I need 100 units of peak compute during 8 working hours and get that from Big Cloud, and they have two other clients needing the same in offset timezones, then in theory the aggregate cost is 1/3rd of everyone buying their own peak capacity.
Whether big cloud passes on that saving is another matter, but it's there.
i.e. big cloud throws enough small customers together so that they don't have "peak" per se just a pretty noisy average load that is in aggregate mostly stable
Indeed, but the headroom the cloud needs overall is less than every customer's individual worst-case scenario added up. They'd take a percentage of that total, because statistically a situation where 100% of customers hit 100% of their peak at 100% the same point in time is improbable.
Must admit I'm a little surprised this logic isn't self-evident.
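A toy version of that arithmetic (numbers invented; three tenants with 100-unit peaks in offset timezones):

    # Sum of the individual peaks vs. peak of the summed load.
    peaks = {
        "apac": [100 if 0 <= h < 8 else 10 for h in range(24)],
        "emea": [100 if 8 <= h < 16 else 10 for h in range(24)],
        "amer": [100 if 16 <= h < 24 else 10 for h in range(24)],
    }
    sum_of_peaks = sum(max(load) for load in peaks.values())                        # 300
    peak_of_sum = max(sum(load[h] for load in peaks.values()) for h in range(24))   # 120
    print(sum_of_peaks, peak_of_sum)  # the shared pool only has to provision for 120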
You really need to tweak the TCP/IP stack, buffer sizes, and various other things to get everything to work really well under heavy load. I'm not sure if the various sites that used to talk about this have been updated in the last decade or so, because I don't do that anymore.
I mean, you'll hit the default file descriptor limit (often just 1024 per process) pretty quickly if you try to handle more than a few hundred simultaneous connections. Doesn't matter how big your box is at that point.
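File descriptors are a good example of a ceiling set by defaults rather than hardware; a quick check-and-raise sketch (Unix only):

    # Inspect and, within the hard limit, raise the per-process open-file ceiling.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(soft, hard)                     # commonly 1024 soft out of the box on Linux
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))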
I tell my colleagues: it's a good thing that hardware sucks: the harder it is to run bare metal, the happier our customers are that they choose the cloud. :)
(But also: this is an excellent article, full of excellent facts. Luckily, my customers choose differently.)
But is he right? How do we know? Well for starters, look at his CV. He has never managed servers for a living. The closest he's come is working on FPGAs. So what's he basing all these opinions on? Musings? Thoughts? Feelings? Hope?
He makes a couple claims which it isn't obvious are bunk, so I'll address them here, in reverse order.
"microservice architectures in general add a lot of overhead to a system for dubious gain when you are running on one big server" - Microservices architectures are not about overhead or efficiency. They are an attempt to use good software design principles to address Conway's Law. If you design the microservice correctly, you can enable many different groups in an organization to develop software independently, and come up with a highly effective and flexible organization and stable products. Proof? Amazon. But the caveat is, you have to design them correctly. Almost everyone fails at this.
"It's impossible to get the benefits of a CDN, both in latency improvements and bandwidth savings, with one big server" - This is so dumb I'm not sure I have to refute it? But, uh, no, CDNs absolutely give a heap of benefits whether you have 1 server or 1,000. And CloudFlare Free Plan is Free.
"My Workload is Really Bursty - Cloud away." - Unless your workload involves massive amounts of storage or ingress/egress and your profit margin tiny, in which case you may save more by building out a small fleet of unreliable poorly-maintained colocated servers (emphasis on may).
"The "high availability" architectures you get from using cloudy constructs and microservices just about make up for the fragility they add due to complexity. ... Remember that we are trying to prevent correlated failures. Cloud datacenters have a lot of parts that can fail in correlated ways. Hosting providers have many fewer of these parts. Similarly, complex cloud services, like managed databases, have more failure modes than simple ones (VMs)." - Argument from laziness, or ignorance? He's trying to say that because something is complex it's also less reliable. Which completely ignores the reliability engineering aspect of that complexity. You mitigate higher numbers of failure modes by designing the system to fail over reliably. And you also have warm bodies running around replacing the failing parts, which fights entropy. You don't get that in a single server; once your power supply, disk, motherboard, network interface, RAM, etc fails, and assuming your server has a redundant pair, you have a ticking clock to repair it until the redundant pair fails. How lucky do you feel? (oh, and you'll need downtime to repair it.)
As usual, the cloud costs quoted are MSRP, and if you're paying retail, you're a fool. Almost all cloud costs can be brought down by 25%-75%, spot instances are a fraction of the on-demand cost, and efficient use of cheaper cloud services reduces your need to buy compute at all.
"The big drawback of using a single big server is availability. Your server is going to need downtime, and it is going to break. Running a primary and a backup server is usually enough, keeping them in different datacenters. A 2x2 configuration should appease the truly paranoid: two servers in a primary datacenter (or cloud provider) and two servers in a backup datacenter will give you a lot of redundancy. If you want a third backup deployment, you can often make that smaller than your primary and secondary." - Wait... so One Big Server isn't enough? Huh. So this was a clickbait article? I'm shocked!
"One Server (Plus a Backup) is Usually Plenty" - Plenty for what? I mean we haven't even talked system architecture or application design. But let's assume it's a single microservice that gets 1RPS. Is your backup server a hot spare, cold spare, or live mirror? If it's live, it's experiencing the same wear, meaning it will fail at about the same time. If it's hot, there's less wear, but it's still experiencing some. If it's cold, you get less wear, but you're less sure it'll boot up again. And then there's system configuration. The author mentions the "complexity" of managing a cluster, but actually it's less complex than managing just two servers. With a fleet of servers, you know you have to use automation, so you spend the time to automate their setup and run updates frequently. With a backup, you probably won't do any maintenance on the backup, and you definitely won't perform the same operations on the backup as the server. So the system state will drift wildly, and the backup's software will be useless. It would be better to just have it as spare part.
The author never talks about the true failure modes of "one big server". When parts start to need replacing, it's never cheap. Smart hands cost, cost of the parts+shipping, cost of the downtime. And often you'll find there are delays - delays in getting smart hands to actually repair it correctly, delays in shipping, delays in part ordering/availability. Running out of power, running out of space, temperatures too high, "flaky" parts you can't diagnose, backups and restores, datacenter issues, routing issues, backbone issues. You'll tell yourself these are "probably rare" - but these are all failure modes, and as the author tells us, you should be wary of lots of failure modes. And anecdotes will tell you somebody has run a server for 10 years with no issue, while another person had a server with 3 faults in a month. To say nothing of the need to run "burn-in" on a new server to discover faults once it's racked.
Go ahead and do whatever you want. Cloud, colo, one server, multiple. There will be failures and complexity no matter what. You want to tell yourself a comforting story that there is "one piece of advice" to follow, some black and white world where only one piece of folksy wisdom applies. But here's my folksy wisdom: design your application, design your system to fit it, try not to pinch every penny, build something, and become educated enough to know what problems to expect and how to deal with them. Or if not, pay someone who can, and listen to them.
While it is useful advice for some people in certain conditions, it should be taken with a grain of salt.
We had a few Dell servers that ran great for a year or two. We rebooted one for some reason or another and it refused to POST due to an ECC failure.
Hauled down to the colo at 3AM and ripped the fucking ram out of the box and hoped it would restart.
Hardware fails. The RAM was fine for years, but something happened to it. Even Dell had no idea and just shipped us another stick, which we stuck in at the next downtime window.
To top it off, we dropped the failing RAM into another box at the office and it worked fine. <shrug>.
[1] https://www.researchgate.net/figure/Failure-rates-of-differe...
Eh, sort of. The difference is that the cloud can go find other workloads to fill the trough from off peak load. They won’t pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can’t find paying workloads for your troughs.
I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.
It has been great.
For enterprise environments, however, there is much more to consider. One of the biggest costs you face is your operations team. If you go with Hetzner, you essentially have to rebuild a wide range of infrastructure components yourself (WAF, globally distributed CDN, EFS, RDS, EKS, Transit Gateways, Direct Connect and more).
Of course, you can create your own solutions for all of these. At my company, a mid-size enterprise, we once tried to do exactly that.
WAF: https://github.com/TecharoHQ/anubis
CDN: Hetzner nodes with cache in Finland, the USA and Germany
RDS: Self-hosted MySQL from Bitnami
EFS: https://github.com/rook/rook
EKS: https://github.com/vitobotta/hetzner-k3s
and 20+ more moving targets of infra software stack and support systems
The result was hiring more than 10 freelancers, in addition to 5 of our own DevOps engineers, to build it all, handle the complexity of such a setup, and keep everything up to date, spending hundreds of thousands of dollars. Meanwhile, our AWS team, consisting of only three people working with Terraform, proved far more cost-effective. Not in terms of dollars per CPU core, but in terms of average per-project spend once staff costs and everything else were included.
I think many of the HN posts that say things like "I saved 90% of my infra bill by moving from AWS to a single Hetzner server" are a bit misleading.
For example, if you serve your assets from the same server you can skip a CORS round trip. If you use an embedded database like SQLite you can shave off 50ms, and a dedicated CPU saves another 50ms; now you don't need to serve anything from the edge, because your global latency is already much better.
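A minimal illustration of the embedded-database point (schema invented): the query is a library call in the same process, not a network round trip.

    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.execute("INSERT INTO posts (title) VALUES (?)", ("hello",))
    db.commit()
    row = db.execute("SELECT id, title FROM posts ORDER BY id LIMIT 1").fetchone()
    print(row)  # microseconds; the "server" is in-process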
Managing a single VPS is trivial compared to AWS.