Show HN: We moved from AWS to Hetzner, saved 90%, kept ISO 27001 with Ansible
We rebuilt key AWS features ourselves, using Terraform for VPS provisioning and Ansible for everything from hardening (auditd, ufw, SSH policies) to rolling deployments (with Cloudflare integration). Our Prometheus + Alertmanager + Blackbox setup monitors infra, apps, and SSL expiry, with ISO 27001-aligned alerts. Loki + Grafana Agent ship logs to S3-compatible object storage.
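For readers who want the SSL expiry part made concrete, here is a minimal Python sketch of the arithmetic behind such an alert. It is not the Blackbox exporter setup described above; the 14-day threshold and the function names are assumptions chosen for illustration.

```python
import ssl
from datetime import datetime, timezone

WARN_DAYS = 14  # assumed threshold, not taken from the post


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in certificates,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def should_alert(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires in fewer than `warn_days` days."""
    return days_until_expiry(not_after, now) < warn_days
```

Passing `now` in explicitly (as a timezone-aware `datetime`) keeps the check deterministic and easy to test.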
The stack includes:
• Ansible roles for PostgreSQL (with automated s3cmd backups + Prometheus metrics)
• Hardening tasks (auditd rules, ufw, SSH lockdown, chrony for clock sync)
• Rolling web app deploys with rollback + Cloudflare draining
• Full monitoring with Prometheus, Alertmanager, Grafana Agent, Loki, and exporters
• TLS automation via Certbot in Docker + Ansible
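The "rolling deploys with rollback + draining" item can be sketched as control flow. This is a hedged illustration in Python, not the actual Ansible playbook: `drain`, `deploy`, `healthy`, `rollback`, and `undrain` are hypothetical callbacks standing in for whatever the real tasks do (for example, removing a host from the load balancer pool).

```python
from typing import Callable, List


def rolling_deploy(
    hosts: List[str],
    drain: Callable[[str], None],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    undrain: Callable[[str], None],
) -> bool:
    """Update one host at a time; undo everything if a health check fails."""
    done: List[str] = []
    for host in hosts:
        drain(host)    # stop routing traffic to this host
        deploy(host)   # install the new release
        if healthy(host):
            undrain(host)
            done.append(host)
            continue
        # Health check failed: restore this host first...
        rollback(host)
        undrain(host)
        # ...then walk back the hosts that were already updated.
        for prev in reversed(done):
            drain(prev)
            rollback(prev)
            undrain(prev)
        return False
    return True
```

Hosts that were not yet reached are never touched, so a failed rollout leaves the whole fleet on the previous release.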
I wrote up the architecture, challenges, and lessons learned: https://medium.com/@accounts_73078/goodbye-aws-how-we-kept-i...
I’m happy to share insights, diagrams, or snippets if people are interested — or answer questions on pitfalls, compliance, or cost modeling.
At what cost? People usually exclude the cost of DIY-style hosting, which is usually the most expensive part. Just providing 24x7 support for the stuff you've home-grown is probably going to make a large dent in any savings you got by not outsourcing that to Amazon.
> $24,000 annual bill felt disproportionate
That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.
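As a back-of-the-envelope check on that comparison, here is a small Python sketch. The $24,000 bill and the 90% saving come from the post and its title; the monthly rates are only what "1-2 months for $24,000" implies, not quoted figures.

```python
ANNUAL_AWS_BILL = 24_000  # USD per year, quoted from the post
SAVINGS_PERCENT = 90      # "saved 90%" from the title


def annual_savings() -> float:
    """Yearly savings implied by the post's own numbers."""
    return ANNUAL_AWS_BILL * SAVINGS_PERCENT / 100


def months_of_effort_covered(monthly_rate: float) -> float:
    """How many months of paid ops work one year of savings would buy."""
    return annual_savings() / monthly_rate


# Implied rate range: $24,000 buys 1 to 2 months of a freelancer's time.
for rate in (24_000 / 2, 24_000 / 1):
    print(f"${rate:,.0f}/month -> {months_of_effort_covered(rate):.1f} months covered")
```

Under these assumptions, a year of savings pays for roughly one to two months of such a freelancer, which is the crux of the objection.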
This still could make sense. But you aren't telling the full story here. And I bet it's a lot less glamorous when you factor in development time for this.
Don't get me wrong; I'm actually considering making a similar move, but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings. But this will raise costs and hassle for us, and I will probably need some reinforcements on my team. As the CTO, my time is a very scarce commodity, so the absolute worst use of my time would be doing this myself. My focus should be making our company and product better. Your tech stack is fine. Been there, done that. IMHO Terraform is overkill for small setups like this; it fits solidly in the YAGNI category. But I like Ansible.
Also, Loki! How do you handle the memory hunger of the Loki reader on those pesky long-range queries, and are there alternatives?
Failures/upgrades: We provision with Terraform, so spinning up replacements or adding capacity is fast and deterministic.
We monitor hardware metrics via Prometheus and node exporter to get early warnings. So far (9 months in) no hardware failure, but it’s a risk we offset through this automation + design.
Apps are mostly data-less and we have (frequently tested) disaster recovery for the database.
Loki: We’re handling the memory hunger by:
• Distinguishing between retention limits and index retention
• Tuning query concurrency and max memory usage via Loki's config + systemd resource limits
• Using Promtail-style labels + structured logging so queries can filter early rather than regex the whole log content
• Where we need true deep history search, we offload to object store access tools or simple grep of backups — we treat Loki as operational logs + nearline, not as an archive search engine.
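To illustrate the structured-logging point in the list above, here is a minimal Python sketch that emits one JSON object per log line. It is an assumption-laden example, not the setup described in this thread (the apps here are Spring Boot): the `JsonFormatter` class and the `fields` attribute are conventions invented for this sketch.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Optional structured fields, passed as extra={"fields": {...}}
        # (a convention assumed for this sketch).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("orders")
    log.addHandler(handler)
    log.error("payment failed", extra={"fields": {"order_id": "o-1"}})
```

With lines shaped like this, a LogQL query can parse them with `| json` and filter on a field such as `level` instead of running a regex over the raw text.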
We used AWS EKS in the old days and we never liked the extreme complexity of it.
With two Spring Boot apps, a database and Redis running across Ubuntu servers, we found simpler tools to distribute and scale workloads.
Since compute is dirt cheap, we over-provision and sleep well.
We have live alerts and quarterly reviews (just looking at a dashboard!) to assess if we balance things well.
K8s on EKS was not pleasant; I wanna make sure I never learn how much worse it can get across European VPS providers.
One of the advantages of more expensive providers seems to be that they have good reputation due to a de facto PoW mechanism.
The only potential indirect risk is that your Hetzner VPS IP range gets blacklisted (because some Hetzner clients abuse it for Sybil attacks or spam).
Or, if Hetzner infrastructure were heavily abused, their upstream or internal networking could (in theory) experience congestion or IP reputation problems, but this is very unlikely to affect your individual VPS performance.
This depends on what you are doing on Hetzner and how you restrict access but for an ISO-27001 certified enterprise app, I believe this is extremely unlikely.
Have you looked into others as well, like IONOS and Scaleway?
Scaleway came up but is more expensive. IONOS did not come up in our research.
Part of what we tried to do was to make ourselves independent from traditional cloud services and be really good at doing stuff on a VPS. Once you start doing that, you can actually allow yourself to look more at uptimes and at costs. Also, since we wanted everything to be fully automated, Terraform support was important for us, and OVHcloud and Hetzner had that.
I'm sure there are many great cloud providers in Europe, but it's hard to vet whether they can meet demand and whether they are financially stable. We don't want to keep switching cloud providers, so picking two of the major ones seemed like a safe choice.
It is a classic big-cloud play to make enterprises reliant on competency in their weird service abstractions, which slowly erodes the fairly simple ops story an enterprise usually needs.
Might throw together a post on it eventually:
https://news.ycombinator.com/context?id=43216847
* - https://news.ycombinator.com/showhn.html
The Medium post is mostly fluff and a lead generator.
I’m happy to share specific configs, diagrams, or lessons learned here on HN if people want — and actually I’m finding this thread a much better forum for that kind of deep dive.
I'll dive into other aspects elsewhere; I don't think you can doubt that, given what I am sharing here.
Any particular area you’d like me to expand on? (e.g. how we structured Terraform modules, Ansible hardening, Prometheus alerting, Loki tuning?)