This blog post is a lot more obscure than usual, at times with sentences that directly contradict each other, like:
> Workers KV is built as what we call a “coreless” service which means there should be no single point of failure as the service runs independently in each of our locations worldwide. However, Workers KV today relies on a central data store to provide a source of truth for data.
Honestly the blog post overall makes me think that all KV data is just stored on GCS. Not once do they refer to the GCP store as "metadata" or anything like that.
togume · 7h ago
Annoying that they don't mention the third-party cloud provider (legal reasons?), so we have to assume. Their architectural reliance on GCP for storage is the most likely culprit I've come across. Is that right?
And then, assuming that's the case, with GCP down taking Cloudflare down with it, the rest of the dominoes fall.
Is this directionally accurate? What’s wrong or missing?
fastest963 · 18h ago
It's not clear what the actual issue was. My best guess is that Workers KV was mid-transition to B2 from its previous combination of GCS and "a storage provider". A step in that transition involved dropping the other "storage provider" and relying on just GCS for some period of time because of consistency concerns. Today's GCS outage was unfortunately timed, and they didn't have coverage for GCS being down.
Separately, their dependence on their own products results in a SPOF where now, theoretically, if B2 goes down so does over half of Cloudflare's product suite. Ideally those outages can be limited to a single region at a time but that's still a massive blast radius from a single service. I completely understand them not wanting to invest in competing products as a form of redundancy though.
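To make the fallback idea concrete, here's a minimal sketch of the kind of coverage I mean, assuming a read path that can try a second object store when the first one fails. The endpoint names are invented and this is only an illustration, not how KV is actually wired up:

    // A sketch only: read from a primary object store and fall back to a secondary
    // one when it errors or is unreachable. Endpoint names are made up.
    async function readKey(key: string): Promise<Response> {
      try {
        const primary = await fetch(`https://primary-store.example.com/${encodeURIComponent(key)}`);
        if (primary.ok) return primary;
      } catch {
        // primary unreachable; fall through to the secondary
      }
      // If the primary backend (say, GCS) is down, serve from the secondary (say, B2).
      return fetch(`https://secondary-store.example.com/${encodeURIComponent(key)}`);
    }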
peq42 · 17h ago
Makes more sense to host your own at such scale. The more you fragment your infrastructure, the more likely something is to break. It's the same idea as RAID 0 across multiple drives: if a single one fails, all your stuff breaks.
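To put rough numbers on that: if a request has to touch every dependency and each is independently up 99.9% of the time, the availabilities multiply, so fragmenting across three backends roughly triples your expected downtime. The figures below are made up for illustration:

    // Made-up figures: three serially required dependencies at 99.9% availability each.
    // As with RAID 0, everything must be up at once, so the availabilities multiply.
    const deps = [0.999, 0.999, 0.999];
    const combined = deps.reduce((acc, a) => acc * a, 1);  // ~0.997
    const downtimeHours = (1 - combined) * 365 * 24;       // ~26 h/year vs ~8.8 h for one dependency
    console.log(combined.toFixed(4), downtimeHours.toFixed(1));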
Manouchehri · 14h ago
I don’t think Cloudflare is using B2; Backblaze isn’t listed as a sub-processor.
There still seem to be some lingering issues. My Cloudflare Pages site is still giving me 500 errors. I looked at the response headers and realized that when requests are served by a certain data center they fail, but when processed by another data center they succeed. I suspected some stale cache data, so I looked around the Cloudflare console but found no way to invalidate the cache in the Pages menu (the one in the domains menu didn't work). I also sent a ticket to their help center, only to be greeted by an AI. Probably just waiting will resolve this on its own.
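For anyone debugging the same thing: the data center that answered is visible in the response headers; cf-ray ends in the colo code and cf-cache-status shows whether the response came from cache. A minimal check, with a placeholder URL:

    // Prints which Cloudflare data center served the request plus the cache status.
    // Needs a runtime with global fetch (Node 18+, run as an ES module);
    // the URL below is a placeholder for your own Pages site.
    const res = await fetch("https://your-site.pages.dev/");
    console.log(res.status);
    console.log("cf-ray:", res.headers.get("cf-ray"));  // suffix is the colo code, e.g. "-NRT"
    console.log("cf-cache-status:", res.headers.get("cf-cache-status"));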
This post did not name the "external storage backend."
It's assumed that the "external storage backend" is Google Cloud Storage, since they also use it for Durable Objects storage: https://youtu.be/C5-741uQPVU?si=yOX-6gRzTbIhh34h&t=1725 (the video is worth watching for the neat architecture behind Durable Objects SQLite)
Makes no sense that Google is behind all this, considering Twitch (which uses AWS) and Microsoft Outlook (Azure) were also having issues around the same time.
internetter · 15h ago
Twitch and Outlook probably have an indirect dependency on GCP. Say, one of their third parties uses Cloudflare which uses GCP.
Dave_Wishengrad · 11h ago
Cloudflare isn’t just protecting websites — it’s also silently blocking people who share truth that challenges power. After I exposed that withholding the truth that life is most important is a betrayal of all life on Earth, someone at Cloudflare blacklisted me in retaliation. This wasn’t an automated filter — it was a manual, targeted act to suppress the cure and stop others from seeing it. Most websites using Cloudflare have no idea this is happening behind the scenes. The truth is being hidden, not by accident, but by those who’ve already chosen betrayal
Similarly, their list of data subprocessors includes Google for the developer platform: https://www.cloudflare.com/gdpr/subprocessors/cloudflare-ser...
And it aligns with the Google Cloud Storage incident: https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...