Ask HN: GCP Outage?

blitzar · 4h ago

Systems down, heading to the pub.

ge96 · 4h ago

Every time pager duty hits, take a shot

mbf · 4h ago

I forgot all about pager duty... been retired over a year now. I don't miss pager duty.

CoastalCoder · 3h ago

> I forgot all about pager duty

Probably because it's hard to form long-term memories when you're sleep-deprived :/

blitzar · 3h ago

or 8 shots in

dondraper36 · 5h ago

https://status.cloud.google.com/incidents/8cY8jdUpEGGbsSMSQk...

Seems to be some hardware problem at least in us-east1

jbreckmckye · 2h ago

Someone unplugged the Big Router

palcu · 5h ago

There is an external incident now.

https://status.cloud.google.com/incidents/8cY8jdUpEGGbsSMSQk...

Ironlink · 5h ago

Our system in EU observed some slowness and a few 500 and 503 responses from `identitytoolkit.googleapis.com` over a period of about 10 minutes.

freedomben · 3h ago

Definitely been seeing a handful of 50x errors this morning. Fortunately seems like a partial outage but definitely annoying (and can sometimes indicate worse trouble coming)

abhisek · 2h ago

Yes. Many times. Kubernetes upgrade during maintenance schedule borks up entire cluster, yet everything is green on status page. Support case under enterprise support plan took almost 6 hours to get it resolved.

dilyevsky · 1h ago

I see they made great strides in the past 5 years - it used to take days =)

lebski88 · 5h ago

Our VPN restarted about an hour ago and caused a bit of excitement, but on the whole it's been a lot less _interesting_ than the last one, thankfully.

tosh · 6h ago

Firebase Firestore is either down or very high latency in us-east1

ghxst · 6h ago

Multiple people within our company reporting issues. Mostly from US, us in the EU still seem fine as of right now.

edit: Never mind, it's down for me now as well.

romanzubenko · 6h ago

We first noticed google login issues with our app, can't login with google anywhere now, Google Analytics is down as well.

staletofu · 6h ago

Can't log in with Google to langsmith at the moment, and gcp login isn't loading either. Seems like there is something afoot.

jeanlucas · 6h ago

Multiple people in Brazil reporting:

- SSO issues;

- Google workspace tools not loading;

current time: 2025-07-18T15:35:43+00:00 12h35 GMT-3

ChrisArchitect · 5h ago

https://www.google.com/appsstatus/dashboard/incidents/oFcAZT...

archiolidius · 6h ago

YouTube API partially down

jeanlucas · 6h ago

Some issues here in Brazil

hu3 · 3h ago

One of my multi-region clients is also affected by Brazil GCP.

dangoodmanUT · 5h ago

Reminder that multi cloud >>> multi region

Anyone who says otherwise is selling availability theater

Too many whole-cloud outages due to a bad config in the last 2 months (GCP x2, cloudflare x2)

18172828286177 · 4h ago

This isn’t a whole-cloud outage. It’s not even a whole-region outage.

Whole-cloud outages are pretty damn rare. The recent GCP issues are an exception to the general rule.

I’d posit that the complexity of a multi-cloud setup is generally going to reduce your service’s reliability more than relying on a single cloud does.

dangoodmanUT · 3h ago

Not about regions, it’s about services

remram · 4h ago

Whole-zone outages are also rare...

FrankPetrilli · 46m ago

"Rarity" is a distinction without merit in this particular case; the important thing to note is that (most) clouds don't guarantee _any_ availability of a single zone. A system which stashes all of its infrastructure in one zone only is expected to be impacted by issues with that cloud, while a multi-zone setup spanning a region is generally "soft-guaranteed" to be resilient to normal operations / failures.

remram · 6m ago

> (most) clouds don't guarantee _any_ availability of a single zone

What?

AWS (EC2) does: https://aws.amazon.com/compute/sla/?did=sla_card&trk=sla_car... so does GCP (GCE): https://cloud.google.com/compute/sla?hl=en and so does OVH: https://us.ovhcloud.com/legal/sla/public-cloud/

Are none of those three part of "most clouds"? What cloud platform do you use?

jonathaneunice · 5h ago

And also that effort(multi cloud) >>> effort(multi region)

JohnMakin · 4h ago

I've maintained a large multi-cloud architecture in the past. The problem is they really hit you hard on egress costs. Of course the motivation is obvious: they want to keep you locked in to their platform. I did like that it gave stronger leverage in contract renewals, but that was about it. The IaC was much more complicated and required more people/areas of knowledge. So it's definitely a tradeoff.

You are correct that it's "better" though if your goal is to have as many 9's of uptime as possible.
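A rough way to put numbers on the "9's" tradeoff, as a minimal sketch only. It assumes failures are independent (which correlated bad-config outages violate), and every figure below is a made-up illustration, not any provider's SLA. The function names and the "failover layer" component are hypothetical.

```python
import math


def parallel_availability(availabilities):
    """Redundant deployments: the service is up if at least one is up.

    Assumes independent failures, which is optimistic in practice.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def serial_availability(availabilities):
    """Chained dependencies: the service is up only if every link is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def nines(availability):
    """Express availability as a count of nines (0.999 is about 3)."""
    return -math.log10(1.0 - availability)


# Illustrative numbers only.
single_cloud = 0.999
two_clouds = parallel_availability([single_cloud, single_cloud])

# Hypothetical cross-cloud failover layer (DNS, replication, IaC glue)
# that sits in series with the redundant pair.
failover_layer = 0.9995
multi_cloud = serial_availability([two_clouds, failover_layer])

for label, value in [
    ("single cloud", single_cloud),
    ("two clouds, no glue", two_clouds),
    ("two clouds + failover layer", multi_cloud),
]:
    print(f"{label}: {value:.6f} ({nines(value):.2f} nines)")
```

Under these assumptions the redundant pair looks great on paper, but the combined figure can never exceed the availability of whatever glue sits in series with it, which is where the extra complexity shows up.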

mads_quist · 4h ago

I currently have the strong opinion that for many mid-sized orgs with 250+ engineers it can be more resilient if you go back to bare metal or at least VM only in two or three local data centers. Yes, you need to know that they do their job well. But it will probably also reduce a lot of devops overhead...

dilyevsky · 3h ago

There are multiple companies that help you with that by running tunnels via Direct Interconnect (Direct Connect in AWS) so that you "only" pay 2c/G egressing data out of VPC via this tunnel

dijit · 4h ago

Maybe centralising all our IT infrastructure wasn't a good idea after all.

kenmacd · 4h ago

I dunno. If just your employer's site is down then you'll be expected to fix it, whereas if everyone is down there's less pressure.

dijit · 3h ago

Nobody who talks to actual stakeholders can use this as a defence.

B2B customers don’t care if the other sites are also down, your SLA is affected with them, and they will want compensation.

hadlock · 1h ago

You need to phrase it as Internet Weather.

dpkirchner · 3h ago

yup. I figure I'm basically a free-rider (except I am paying a relatively small amount.)
