Show HN: We moved from AWS to Hetzner, saved 90%, kept ISO 27001 with Ansible
We rebuilt key AWS features ourselves using Terraform for VPS provisioning, and Ansible for everything from hardening (auditd, ufw, SSH policies) to rolling deployments (with Cloudflare integration). Our Prometheus + Alertmanager + Blackbox setup monitors infra, apps, and SSL expiry, with ISO 27001-aligned alerts. Loki + Grafana Agent handle logs to S3-compatible object storage.
The stack includes:
• Ansible roles for PostgreSQL (with automated s3cmd backups + Prometheus metrics)
• Hardening tasks (auditd rules, ufw, SSH lockdown, chrony for clock sync)
• Rolling web app deploys with rollback + Cloudflare draining
• Full monitoring with Prometheus, Alertmanager, Grafana Agent, Loki, and exporters
• TLS automation via Certbot in Docker + Ansible
I wrote up the architecture, challenges, and lessons learned: https://medium.com/@accounts_73078/goodbye-aws-how-we-kept-i...
I’m happy to share insights, diagrams, or snippets if people are interested — or answer questions on pitfalls, compliance, or cost modeling.
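To make the SSL expiry monitoring mentioned above concrete, here is a minimal Python sketch of the check such an alert boils down to. It is not the poster's configuration: the function names and the 14-day threshold are assumptions for illustration, and in the described stack the expiry timestamp would come from the Blackbox exporter rather than a hand-written script.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in certificate dicts,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    return (expires_at - now.timestamp()) / 86400


def should_alert(not_after: str, now: datetime, threshold_days: int = 14) -> bool:
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) < threshold_days


if __name__ == "__main__":
    now = datetime(2029, 12, 22, tzinfo=timezone.utc)
    print(should_alert("Jan  1 00:00:00 2030 GMT", now))
```

Passing `now` in explicitly keeps the check deterministic, which makes it easy to test without touching the network.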
At what cost? People usually exclude the cost of DIY-style hosting, which is usually the most expensive part. Providing 24x7 support for the stuff you've home-grown is probably going to make a large dent in any savings you got by not outsourcing that to Amazon.
> $24,000 annual bill felt disproportionate
That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.
This still could make sense. But you aren't telling the full story here. And I bet it's a lot less glamorous when you factor in development time for this.
Don't get me wrong; I'm actually considering making a similar move, but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings. But this will raise cost and hassle for us, and I will probably need some reinforcements on my team. As the CTO, my time is a very scarce commodity, so the absolute worst use of my time would be doing this myself. My focus should be making our company and product better. Your tech stack is fine. Been there, done that. IMHO Terraform is overkill for small setups like this; it fits solidly in the YAGNI category. But I like Ansible.
I don’t understand why people keep propagating this myth which is mostly pushed by the marketing department of Azure, AWS and GCP.
The truth is that cloud providers don’t actually provide 24/7 support for your app. They only ensure that their infrastructure is mostly running, for a very loose definition of 24/7.
You still need an expert on board to ensure you are using them correctly and are not going to be billed a ton of money. You still need people to ensure that your integration with them doesn’t break on you and that’s the part which contains your logic and is more likely to break anyway.
The idea that your cloud bill is your TCO is a complete fabrication and that’s despite said bill often being extremely costly for what it is.
There will be a new AWS European Sovereign Cloud[1] with the goal of being completely US independent and 100% compliant with EU law and regulations.
[1]: https://www.aboutamazon.eu/news/aws/aws-plans-to-invest-7-8-...
The idea that anything branded AWS can possibly be US independent when push comes to shove is of course pure fantasy.
The US clearly states that extraterritoriality is fine with them. Depending on the company, one gag order is enough to sabotage a whole company.
The ICC move by MS made hospitals shift into an even higher gear to prepare off-ramp plans: from private Azure cloud to “let’s get out”.
>> That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.
In terms of absolute savings, we’re talking about 90% of 24k, that’s about 21.6k saved per year. A good amount, but you cannot hire an SRE/DevOps Engineer for that price; even in Europe, such engineers are paid north of 70k per year.
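The arithmetic in this comment can be checked with a short Python sketch. The 24k bill, the 90% figure and the 70k salary come from the thread; the function names are made up for illustration.

```python
def annual_savings(annual_bill: float, savings_rate: float) -> float:
    """Absolute yearly savings for a given bill and savings rate."""
    return annual_bill * savings_rate


def fte_fraction(savings: float, engineer_salary: float) -> float:
    """Share of one engineer's yearly salary that the savings would cover."""
    return savings / engineer_salary


if __name__ == "__main__":
    savings = annual_savings(24_000, 0.90)   # figures quoted in the thread
    share = fte_fraction(savings, 70_000)    # 70k is the commenter's lower bound
    print(f"saved per year: {savings:,.0f}")
    print(f"share of one 70k engineer: {share:.0%}")
```

Under these numbers the savings cover a bit less than a third of one such salary, which matches the "1/3rd of an FTE" estimate quoted above.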
I personally think the TCO (total cost of ownership) will be higher in the long run, because now every little bit of the software stack has to be managed by their infra team/person, and things are getting more and more complex over time, with updates and breaking changes to come. But I wish them well.
From experience, in the long run, this "managed AWS saved us because we didn't need people" always feels like the typical argument made by SaaS salespeople. In reality, many services/SaaS are really expensive, and you will probably only need a few features, which you can sometimes roll out yourself.
The initial investment might be higher, but in the long run I think it's worth it. It's a lot like Heroku vs AWS. Super expensive, but it lets you push a POC to production with little knowledge. In this case, it's AWS vs self-hosted or whatever.
Finally, can we quantify the cost of data/information? This company seems to be really "using" this strategy (= everything home made, you're safe with us) for sales purposes. And it might work, although for the final consumer this might have a higher price, which finally pays the additional devops to maintain the system. So who cares?
How important is it for companies not to be subject to the CLOUD Act or funny stuff like that?
Unless by Europe you mean the Apple feature availability special of UK/Germany/France/Spain/Italy
• We heavily invested upfront in infrastructure-as-code (Terraform + Ansible) so that infra is deterministic, repeatable, and self-healing where possible (e.g. auto-provisioning, automated backup/restore, rolling updates).
• Monitoring + alerting (Prometheus + Alertmanager) means we don’t need to watch screens — we get woken up only if there’s truly a critical issue.
• We don’t try to match AWS’s service level (e.g. RTO of minutes for every scenario) — we sized our setup to our risk profile and customers’ SLAs.
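As a rough illustration of the rolling updates with rollback and load-balancer draining described in the post, here is a minimal Python sketch of the control flow. It is not the poster's Ansible playbook: every callable (`drain`, `deploy`, `healthy`, `restore`, `rollback`) is a hypothetical hook standing in for whatever the real tooling does.

```python
from typing import Callable, List, Sequence


def rolling_deploy(
    hosts: Sequence[str],
    drain: Callable[[str], None],      # take host out of rotation (hypothetical hook)
    deploy: Callable[[str], None],     # install the new release on the host
    healthy: Callable[[str], bool],    # post-deploy health check
    restore: Callable[[str], None],    # put host back into rotation
    rollback: Callable[[str], None],   # reinstall the previous release
) -> bool:
    """Update hosts one at a time; undo everything on the first failed check."""
    updated: List[str] = []
    for host in hosts:
        drain(host)
        deploy(host)
        updated.append(host)
        if not healthy(host):
            # Roll back the failing host first, then the ones updated earlier.
            for done in reversed(updated):
                rollback(done)
            restore(host)
            return False
        restore(host)
    return True
```

Only one host is out of rotation at any time, so capacity drops by at most one node during a deploy. A real playbook would also drain earlier hosts before rolling them back; the sketch skips that for brevity.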
> True cost comparison:
• The migration was done as part of my CTO role, so no external consulting costs. The time investment paid back within months because the ongoing cost to operate the infra is low (we’re not constantly firefighting).
• I agree that if you had to hire more people just to manage this, it could negate the savings. That’s why for some teams, AWS is still a better fit.
> Business vs. cost drivers: Honestly, our primary driver was sovereignty and compliance — cost savings just made the business case easier to sell internally. Like you, our European customers were increasingly skeptical of US cloud providers, so this aligned with both compliance and go-to-market.
> Terraform / YAGNI: Fair point! Terraform probably is more than we need for the current scale. I went with it partly because it fits our team’s skillset and lets us keep options open as we grow (multi-provider, DR regions, etc).
And finally, this is why I am posting about it: I am sharing as much as I can to spread the word. I am just sharing my experience and knowledge. If you have any questions or want to discuss further, feel free to reach out at jk@datapult.dk!
Two reasons for this stick out:
- Are the multi-million dollar SV seed rounds distorting what real business costs are? Counting dev salaries etc. (if there is at least one employee) it doesn't seem worth the effort to save $20k - i.e., 1/5 of a dev salary? But for a bootstrapped business $20k could definitely be existential.
- The important number would be the savings as percent of net revenue. Is the business suddenly 50% more profitable? Then it's definitely worth it. But in terms of thinking about positively growing ARR doing cost/benefit on dropping AWS vs. building a new (profitable) feature I could see why it might not make sense.
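The second point can be put into numbers with a small Python sketch. The $20k saving is from the thread; the profit and revenue figures below are purely hypothetical inputs chosen to show the two extremes.

```python
def profit_uplift(net_profit: float, annual_savings: float) -> float:
    """Relative increase in profit if the savings drop straight to the bottom line."""
    return annual_savings / net_profit


def share_of_revenue(revenue: float, annual_savings: float) -> float:
    """Savings expressed as a fraction of revenue."""
    return annual_savings / revenue


if __name__ == "__main__":
    savings = 20_000  # figure from the thread
    # Hypothetical bootstrapped business earning 40k profit per year.
    print(f"{profit_uplift(40_000, savings):.0%} more profitable")
    # Hypothetical company at 2M ARR.
    print(f"{share_of_revenue(2_000_000, savings):.0%} of revenue")
```

The same $20k is a 50% profit increase in the first case and 1% of revenue in the second, which is why the answer depends so heavily on the size of the business.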
Edit to add: it's easy to offhand say "oh yeah easy, just get to $2M ARR instead of saving $20k- not a big deal" but of course in the real world it's not so simple and $20k is $20k. The prevalent SV mindset of just spending without thinking too much about profitability is totally delusional except for like 1 out of 10000 startups.
Given the existence of these tools, which are fantastic, I'm often stunned at how sluggish and expensive the AWS monitoring stack is, and how lacklustre its UX is.
Monitoring quickly became the most expensive, and most unpleasant part of our AWS experience.
However in the US it's not very relevant or even interesting to companies, and some European companies fail to understand that.
SOC 2 is the default and the preferred standard in the US - it's more domestic and less rigid than ISO 27001.
checking for evidence that you are doing those things I would call rigid. SOC 2 as an attestation doesn’t require so much documentation.
Also, Loki! How do you handle memory hunger on loki reader for those pesky long range queries, and are there alternatives?
Failures/upgrades: We provision with Terraform, so spinning up replacements or adding capacity is fast and deterministic.
We monitor hardware metrics via Prometheus and node exporter to get early warnings. So far (9 months in) no hardware failure, but it’s a risk we offset through this automation + design.
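As an illustration of what an early-warning check on node exporter data can look like, here is a minimal Python sketch that reads the text exposition format and flags filesystems running low on space. The metric names `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` are real node exporter metrics. The 10% threshold, the function names and the simplified parser are assumptions, and in the setup described this logic would more likely live in a Prometheus alert rule than in a script.

```python
import re
from typing import Dict, List, Tuple

# Simplified patterns: enough for labelled sample lines, not a full parser.
LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text: str) -> Dict[Tuple[str, str], float]:
    """Map (metric name, mountpoint) -> value for labelled sample lines."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unlabelled metrics
        name, labels, value = match.groups()
        mount = dict(LABEL.findall(labels)).get("mountpoint")
        if mount is not None:
            samples[(name, mount)] = float(value)
    return samples


def low_disk_mounts(text: str, min_free_ratio: float = 0.10) -> List[str]:
    """Mountpoints whose free-space ratio is below the (assumed) threshold."""
    samples = parse(text)
    low = []
    for (name, mount), avail in samples.items():
        if name != "node_filesystem_avail_bytes":
            continue
        size = samples.get(("node_filesystem_size_bytes", mount))
        if size and avail / size < min_free_ratio:
            low.append(mount)
    return sorted(low)
```

Feeding it the body of a node exporter `/metrics` response returns the mountpoints that would deserve an alert under the chosen threshold.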
Apps are mostly data-less and we have (frequently tested) disaster recovery for the database.
Loki: We’re handling the memory hunger by
• Distinguishing retention limits and index retention
• Tuning query concurrency and max memory usage via Loki's config + systemd resource limits.
• Using Promtail-style labels + structured logging so queries can filter early rather than regex the whole log content.
• Where we need true deep history search, we offload to object store access tools or simple grep of backups — we treat Loki as operational logs + nearline, not as an archive search engine.
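The "filter early" point above can be sketched as a tiny LogQL query builder in Python. The `{label="value"}` stream selector and the `|=` line filter are standard LogQL; the helper name and the example labels (`app`, `level`) are hypothetical, and label values are assumed to contain no quotes or backslashes.

```python
from typing import Mapping


def logql(labels: Mapping[str, str], contains: str = "") -> str:
    """Build a LogQL query that narrows by labels before touching log content.

    Assumes label values and `contains` need no escaping.
    """
    selector = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains:
        # A substring filter is cheaper than a regex over every line.
        query += f' |= "{contains}"'
    return query


if __name__ == "__main__":
    print(logql({"app": "web", "level": "error"}, "timeout"))
```

The label selector limits which streams Loki has to read at all, so the substring match only runs over a small slice of the data instead of the whole log volume.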
We used AWS EKS in the old days and we never liked the extreme complexity of it.
With two Spring Boot apps, a database and Redis running across Ubuntu servers, we found simpler tools to distribute and scale workloads.
Since compute is dirt cheap, we over-provision and sleep well.
We have live alerts and quarterly reviews (just looking at a dashboard!) to assess if we balance things well.
K8s on EKS was not pleasant, I wanna make sure I never learn how much worse it can get across European VPS providers.
One of the advantages of more expensive providers seems to be that they have good reputation due to a de facto PoW mechanism.
The only potential indirect risk is if your Hetzner VPS IP range gets blacklisted (because some Hetzner clients abuse it for Sybil attacks or spam).
Or if Hetzner infrastructure was heavily abused, their upstream or internal networking could (in theory) experience congestion or IP reputation problems — but this is very unlikely to affect your individual VPS performance.
This depends on what you are doing on Hetzner and how you restrict access but for an ISO-27001 certified enterprise app, I believe this is extremely unlikely.
Most of our customers have a hard requirement on ISO 9001. Many on ISO 27001, too. The rest strongly prefer a partner that has a plan to get ISO 27001.
And also lacking a bit in details:
- both technical (e.g. how are you dealing with upgrades or multi-data center fallback for your postgresql), and
- especially business, e.g. what's the total cost analysis including the supplemental labor cost to set this up but mostly to maintain it.
Maybe if you shared your scripts and your full cost analysis, that would be quite interesting.
It is a great big cloud play to make enterprises reliant on the competency in their weird service abstractions, which is slowly draining the quite simple ops story an enterprise usually needs.
Might throw together a post on it eventually:
https://news.ycombinator.com/context?id=43216847
Everyone talks about it but no one wants to be the first mover.
There's also a lot of FUD regarding hiring more staff. My observed experience is that hyperscalers need an equivalent number of people on rotation; it's just different skills (learning the intricacies/quirks of different product offerings on the hyperscaler vs CS/operational fundamentals). So everyone is scared to overload their teams with work and potentially need to hire people. Couple this with the fact that all migrations are expensive up front and that change is bad by default.
There will come a day when there simply isn't enough money to spend 10x the cost on these systems. It will be a sad day for everyone because salaries will be depressed too, and we will pine for the days of shiny tools where we could make lots of work disappear by saying that our time is too expensive to work with such peasant issues.
If I manage to get https://uncloud.run/ or something similar up & running, the platform will no longer matter, whether it's OVH, Hetzner, Azure, AWS, GCP, ... It should all be possible & easy to switch... #FamousLastWords
The Medium post is mostly fluff and a lead generator.
I’m happy to share specific configs, diagrams, or lessons learned here on HN if people want — and actually I’m finding this thread a much better forum for that kind of deep dive.
I'll dive into other aspects elsewhere: You can't doubt that given what I am sharing here.
Any particular area you’d like me to expand on? (e.g. how we structured Terraform modules, Ansible hardening, Prometheus alerting, Loki tuning?)
Have you looked into others as well, like IONOS and Scaleway?
Scaleway came up but is more expensive. IONOS did not come up in our research.
Part of what we tried to do was to make ourselves independent from traditional cloud services and be really good at doing stuff on a VPS. Once you start doing that, you can actually allow yourself to look more at uptimes and at costs. Also, since we wanted everything to be fully automated, Terraform support was important for us, and OVHcloud and Hetzner had that.
I'm sure there's many great cloud providers out in Europe, but it's hard to vet them to understand if they can meet demand and if they are financially stable. We would want not to keep switching cloud providers. So picking two of the major ones seemed like a safe choice.
I don't remember a single such case. I remember reading a lot of speculations like "it's highly likely that it was done by Russians" every single time without a trace of evidence.
It's undeniable that core European infrastructure is targeted currently
Personally I think the amount of special pleading required to imagine that it is _not_ Russia is a bit much (particularly around the deep sea cable cuts; at that point you’re really claiming that Russia is deniably pretending that it is them, but really it’s someone else), but you do you. It doesn’t change the overarching point; both Hetzner and OVH would be obvious targets for, ah, whoever it is.
* - https://news.ycombinator.com/showhn.html
I once worked at a fairly small company (around 100 employees) that hosted everything on AWS. Due to high bills (it's a small company based in Asia) and other problems, I migrated everything to DigitalOcean (we still used AWS for things like SES), and the monthly hosting bill became about 10 times lower, with no other consequences (in other words, it hasn't become less reliable).
I still wonder who calculated that AWS is cheaper than everything else. It's definitely one of the most expensive providers.