This is an advert, trying to sell you something. The only bits of information that could be tangentially interesting are "local drives outperform network-attached storage" and "AWS can get really expensive". But neither of those is a particularly novel finding.
9dev · 1d ago
Still, it makes for a compelling reason to choose a different architecture for new database deployments than you used to; datacenter offerings from the likes of Hetzner can be a vastly superior choice over cloud providers for IO-intensive workloads like database servers.
ozgune · 1d ago
Question for the author:
Are you planning to publish CH benchmarks (TPC-C and TPC-H combined)? I'd expect Aurora to perform much worse on CH than on TPC-C/H. That's because Aurora pushes the WAL logs to replicated shared storage. Since you only need quorum on a write, you get a fast ack on the write (TPC-C). The way you've run TPC-H doesn't modify the data that much, so you also get baseline Postgres performance.
However, when you're pushing writes and you have a sequential scan over the data, then Aurora needs to reconcile the WAL writes, manage locks, etc. CH benchmark exercises that path and I'd expect it to notably slow down Aurora.
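To make that access pattern concrete, here's a rough sketch of the workload shape (not the actual benchmark kit; the connection string and the TPC-C-style table/column names are placeholders): one thread keeps committing small write transactions while another repeatedly scans the same rows, which is the path I'd expect to slow Aurora down.

```python
# Minimal sketch of a CH-benCHmark-style mixed workload, assuming psycopg2
# and a reachable Postgres/Aurora endpoint. DSN, table, and column names
# are illustrative placeholders modeled on the TPC-C schema.
import threading
import time

import psycopg2

DSN = "host=example-endpoint dbname=chbench user=postgres"  # placeholder

def oltp_writer(stop):
    # TPC-C-style write path: small transactions that continuously produce WAL.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        while not stop.is_set():
            cur.execute(
                "UPDATE stock SET s_quantity = s_quantity - 1 "
                "WHERE s_w_id = 1 AND s_i_id = 1"
            )
            conn.commit()
    conn.close()

def olap_reader(stop):
    # TPC-H-style scan over the same, freshly modified rows: the storage layer
    # has to reconcile in-flight writes to give the reader a consistent view.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        while not stop.is_set():
            cur.execute("SELECT s_i_id, sum(s_quantity) FROM stock GROUP BY s_i_id")
            cur.fetchall()
            conn.commit()
    conn.close()

stop = threading.Event()
threads = [threading.Thread(target=f, args=(stop,)) for f in (oltp_writer, olap_reader)]
for t in threads:
    t.start()
time.sleep(30)  # run the mixed workload briefly
stop.set()
for t in threads:
    t.join()
```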
(Disclaimer: ex-Citus and current Ubicloud founder)
pwmtr · 1d ago
Yes, that is correct. That said, in our tests we only saw a 2x improvement in CH benchmarks. However, we found that this was due to an architectural issue in our VM I/O path and the way we virtualize storage. Based on our estimates, we should see a ~5x difference, but for that we first need to revamp our storage virtualization.
We plan to publish CH benchmark results in a follow-up blog post. However, we didn't want to do that yet, so as not to put out misleading results.
speedbird · 1d ago
Well, this is quite a simplification, and it also happens to miss out a whole swathe of history.
There's a massive "it depends" here; it's not all about raw performance.
hanikesn · 1d ago
Yeah, storage appliances predate public clouds by at least a decade. And that's not even talking about mainframes, etc.
kakoni · 1d ago
> standard-8 instance (comes with 8 vCPU, 32GB RAM and local NVMe SSD)
So were they using the gd instances (with the xxgd instances, the local NVMe-based SSDs are physically connected to the host server), or something else?
SonOfLilit · 1d ago
There's this amazing gif at the top that shows local storage running circles around network storage, and then the benchmarks are... 1.5x-4x? I feel like the graphic is very misleading here...
regularfry · 1d ago
What surprised me here is how close Aurora got. That's some magic right there.
I've always held that if you want resilience, you just cannot rely on local storage. No matter how many times you've got data replicated locally, you're still at risk of the whole machine failing - best case falling off the network, worst case trashing all its disks in some weird failure state as the RAID firmware decides today is the day to Just Not. And while you might technically still be able to recover the data, you're still offline.
You just need your data to be off the machine already when that happens. Not to say that all access needs to go over the network - local caching ought to go a long way here - but the default should be to switch to another machine and recycle the failed one.
Relevant to the article, this is independent of the speed and reliability of the actual hardware. It was true in 2010, it's true now.
fmajid · 1d ago
This is supposed to be news? Databases moved to SSDs 15 years ago, but it’s true the high-latency SAN mindset lingers in its reincarnation as EBS.
panrobo · 1d ago
there is a middle ground where you can get NVMe-optimized networked storage with solutions like simplyblock. It gives you local-like performance but separates compute from storage and gives more options for backup & DR.
kevincox · 1d ago
Honestly, the takeaway I get from this is that Aurora is pretty incredible (possible consistency bugs aside), in that it gets nearly the performance of local disk without the risk of data loss and downtime from hardware failure (although its more complex architecture can of course cause data loss and failures of its own).
jbverschoor · 1d ago
Please add [Ad] to the title
lionls · 1d ago
no network layer for local drives, therefore faster than remote drives
fake-name · 1d ago
* if you're doing cloud stuff.
Otherwise, continue as normal.
It turns out that local drives continue to be faster than remote drives. Who would have thought?
kkfx · 1d ago
I agree, but... NVMe is actually network-attachable too (NVMe over Fabrics)!
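As a rough illustration (assuming a Linux host with the nvme-cli package and an NVMe/TCP target already exported somewhere; the address and NQN below are placeholders), attaching a remote namespace so it shows up as a local block device looks like this:

```python
# Illustrative only: discover and attach a remote NVMe/TCP namespace so the
# kernel exposes it as a local block device (e.g. /dev/nvme1n1).
# Assumes nvme-cli is installed; the target address and NQN are placeholders.
import subprocess

TARGET_ADDR = "192.0.2.10"                       # placeholder target IP
TARGET_NQN = "nqn.2024-01.com.example:target01"  # placeholder subsystem NQN

# List the subsystems the target advertises.
subprocess.run(
    ["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420"],
    check=True,
)

# Connect to one subsystem; the attached namespace then appears under /dev
# and can be formatted and benchmarked like any local NVMe disk, just with
# the fabric's latency added.
subprocess.run(
    ["nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420",
     "-n", TARGET_NQN],
    check=True,
)
```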
Besides that, IMVHO the future of the cloud (i.e. the modern mainframe) is the cluster, or decentralized applications run from homes and sheds with PV and local storage, with the desktop at the center. We can't live with such centralization: we can't evolve, and we can't even remain democracies under a model where information is in the hands of very few, at such a level of detail. With FTTH, PV, energy storage, and today's IT, we could come back to the original interconnected-desktop model, sparing resources instead of consuming more.