Tell HN: The Hetzner Experience - Invisible Outages
Yesterday, we had a particularly stressful incident involving Hetzner load balancers in Falkenstein. Our Kubernetes control planes were unreachable due to load balancer targets showing as unhealthy. We quickly worked around the issue by deploying an identical load balancer configuration in another region. Despite explicitly instructing Hetzner support not to recreate our resources (since they're managed via Terraform), they manually recreated the load balancer anyway, causing momentary panic - though thankfully our Terraform state wasn't impacted.
We pay nearly €20,000 per month for Hetzner's services, yet they refuse to offer a direct support hotline, even though we would be willing to pay extra for it. What's especially troubling is their persistent silence on these outages. Hetzner's status page showed no sign of this incident, either during or after it. This pattern makes us question the transparency and purpose of the status page itself.
Have any of you experienced similar invisible outages with Hetzner?
Our experience has been excellent, but we also design for the platform. Redundant dedicated networking, multi-AZ, networking failover, RAID, k8s, Mayastor, etc.
The worst issues we see are occasional scheduled outages of an upstream router. This will take out an AZ for external traffic for about 20 mins, but the internal dedicated network will ensure internal services all stay up and quorate.
It’s not cheap, but it’s still cheaper than AWS.
I think their dedicated offering is probably more stable, as it has been around longer and is also much, much simpler. They need to provide networking and power, and finance the hardware, all of which are very much solved problems.
(We’re https://lithus.eu, if anyone is interested. You can contact me at adam@…. I’m on holiday this week, but back next week)
You also needed to consider that when this happened, the data inside the machine was mostly lost. Finally, you also needed to plan to graduate out of it as soon as you had enough money to go either to a colocated data center or the "real cloud".
I kept Hetzner as a backup provider in more than one company, mainly to have real machines for take-home tests, back when hiring was plentiful. Even so, we often faced problems with the machines going down due to hardware or networking issues, and had to rebuild them from the ground up. These experiences mirrored all the tales of woe everyone in the department had from years of working with Hetzner, sometimes losing production data because the rules of the game were not followed.
So it seems that 6 years later their scale has increased but the experience remains the same. On the bright side, kudos to Hetzner for teaching waves of engineers about reliability and disaster recovery during all these years.
Most of the issues we've faced are related to their Storage Boxes: multiple incidents where they were completely unavailable, sometimes for hours or even days. What's frustrating is that these outages are never reflected on their status page, so you're left in the dark unless you open a ticket yourself. Even then, the only explanation we usually get is that the specific Storage Box is under "heavy load", and the suggested fix is always to migrate to another box. That might be fine for infrequent use, but it's not acceptable when you're relying on it.
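Since the status page can't be relied on here, one option is to probe the endpoint yourself from outside the provider's network. Below is a minimal sketch in Python, not anything Hetzner provides: the names `tcp_probe`, `OutageDetector`, and `monitor` are made up for illustration, and the threshold of three consecutive failures is an arbitrary assumption to avoid alerting on a single dropped connection.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


class OutageDetector:
    """Flags an outage after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        # A single success resets the streak; failures accumulate.
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.threshold


def monitor(host, port, rounds, interval=30.0, threshold=3, probe=tcp_probe):
    """Probe host:port `rounds` times and return how many checks raised an alert."""
    detector = OutageDetector(threshold)
    alerts = 0
    for _ in range(rounds):
        if detector.record(probe(host, port)):
            alerts += 1
            print(f"ALERT: {host}:{port} failed {detector.failures} checks in a row")
        time.sleep(interval)
    return alerts
```

Something like `monitor("storagebox.example.com", 22, rounds=120)` (a hypothetical hostname) would check every 30 seconds for an hour. It needs to run from a machine that does not share the failure domain you are watching, otherwise it goes dark together with the thing it monitors.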
To be fair, Hetzner has been a solid provider for many years, and we’re still hoping this is just a temporary rough patch. We really hope they get things back on track soon.
However, I don't run anything mission-critical with them, as they don't really have reliable support. I just use their cheap dedis for background tasks.
The server I have right now is stable, but I've had different experiences before as well. And their network is unreliable either way: timeouts, etc.
Based on many years of experience, all providers are guilty of that. Only large-scale outages, or ones that just couldn't be ignored, are reflected on the status page. This doesn't necessarily mean malevolence on the provider's side - their sensors just may not be good enough to spot the issue.
On a larger scale - why would you choose Hetzner and then complain about uptime? It's a well-known provider with low prices and low reliability. There are tons of businesses that find this model suitable for them. If yours is not one of them - just switch to something more reliable. Granted, your bill will likely be 2x+ of €20k, but you get what you whine about.
As the old adage says, we can make this project fast, cheap, and of amazing quality, but you can choose only two.
Just the typical server outages and a "Fault report" notification email from Hetzner.