AWS in 2025: Stuff you think you know that's now wrong

214 keithly 136 8/20/2025, 3:30:07 PM lastweekinaws.com ↗

Comments (136)

simonw · 5h ago
S3: "Block Public Access is now enabled by default on new buckets."

On the one hand, this is obviously the right decision. The number of giant data breaches caused by incorrectly configured S3 buckets is enormous.

But... every year or so I find myself wanting to create an S3 bucket with public read access so I can serve files out of it. And every time I need to do that I find something has changed and my old recipe doesn't work any more and I have to figure it out again from scratch!

sylens · 4h ago
The thing to keep in mind with the "Block Public Access" setting is that it is a redundancy built in to save people from making really big mistakes.

Even if you have a terrible and permissive bucket policy or ACLs (legacy but still around) configured for the S3 bucket, if you have Block Public Access turned on - it won't matter. It still won't allow public access to the objects within.

If you turn it off but you have a well scoped and ironclad bucket policy - you're still good! The bucket policy will dictate who, if anyone, has access. Of course, you have to make sure nobody inadvertently modifies that bucket policy over time, or adds an IAM role with access, or modifies the trust policy for an existing IAM role that has access, and so on.
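
Roughly, the two layers that have to agree look like this (a sketch; the bucket name is a placeholder):

  # Relax Block Public Access so a bucket policy is even allowed to grant public reads
  aws s3api put-public-access-block \
    --bucket my-public-assets \
    --public-access-block-configuration \
      BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false

  # Then grant anonymous read on objects via the bucket policy
  aws s3api put-bucket-policy --bucket my-public-assets --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-public-assets/*"
    }]
  }'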

simonw · 2h ago
I think this is the key of why I find it confusing: I need a very clear diagram showing which rules override which other rules.
andrewmcwatters · 4h ago
This sort of thing drives me nuts in interviews, when people are like, are you familiar with such-and-such technology?

Yeah, what month?

tester756 · 2h ago
If you're aware of changes, then explain that there were changes over time, that's it
andrewmcwatters · 44m ago
You seem to be lacking the experience of what actually happens in interviews.
crinkly · 4h ago
I just stick CloudFront in front of those buckets. You don't need to expose the bucket at all then and can point it at a canonical hostname in your DNS.
hnlmorg · 4h ago
That’s definitely the “correct” way of doing things if you’re writing infra professionally. But I do also get that more casual users might prefer not to incur the additional cost or complexity of having CloudFront in front. Though at that point, one could reasonably ask if S3 is the right choice for casual users.
damieng · 38m ago
I'd argue putting CloudFront on top of S3 is less complex than getting the permissions and static sharing setup right on S3 itself.
gchamonlive · 3h ago
S3 + cloudfront is also incredibly popular so you can just find recipes for automating that in any technology you want, Terraform, ansible, plain bash scripts, Cloudformation (god forbid)
gigatexal · 3h ago
Yeah holy crap why is cloud formation so terrible?
gchamonlive · 2h ago
It's designed to be a declarative DSL, but then you have to do all sorts of filters and maps in any group of resources and suddenly you are programming in yaml with both hands tied behind your back
gigatexal · 26m ago
Yeah it’s just terrible. If Amazon knew what was good they’d just replace it with almost anything else. Heck, just go all in on terraform and call it a day.
SteveNuts · 3h ago
Last time I tried to use CF, the third party IAC tools were faster to release new features than the functionality of CF itself. (Like Terraform would support some S3 bucket feature when creating a bucket, but CF did not).

I'm not sure if that's changed recently, I've stopped using it.

tayo42 · 2h ago
>S3 is the right choice for casual users.

It's so simple for storing and serving a static website.

Are there good and cheap alternatives?

MaKey · 1h ago
Yeah, your classic web host. Just today I uploaded a static website to one via FTP.
fodkodrasz · 1h ago
Really? If I remember correctly, my static website served from S3 + CF + R53 costs about $0.67/mo: $0.50 of that is R53, $0.16 is CF, and $0.01 is S3 for my page.

BTW: Is GitHub Pages still free for custom domains? (I don't know the EULA)

crinkly · 3h ago
It's actually incredibly cheap. I think our software distribution costs, in the account I run, are around $2.00 a month. That's pushing out several thousand MSI packages a day.
oblio · 19m ago
With CloudFront?
herpderperator · 2h ago
For the sake of understanding, can you explain why putting CloudFront in front of the buckets helps?
bhattisatish · 1h ago
CloudFront lets you map your S3 bucket to both

- signed URLs, in case you want session-based file downloads

- public files by default, e.g. for a static site.

You can also map a domain (sub-domain) to CloudFront with a CNAME record and serve the files via your own domain.

CloudFront distributions are also a CDN, so files are served from an edge location close to the user, which speeds up your site.

For low to mid range traffic, CloudFront with S3 is cheaper, as CloudFront's network egress costs less. But for large amounts of traffic, CloudFront costs can balloon very fast. But in those scenarios S3 costs are prohibitive too!
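
For the signed-URL case, the CLI can pre-sign for you; a rough sketch (the distribution domain, key pair ID, and key file are placeholders):

  aws cloudfront sign \
    --url https://d111111abcdef8.cloudfront.net/private/report.pdf \
    --key-pair-id K2JCJMDEHXQW5F \
    --private-key file://cf_private_key.pem \
    --date-less-than 2025-12-31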

SOLAR_FIELDS · 5h ago
I honestly don't mind that you have to jump through hurdles to make your bucket publicly available and that it's annoying. That to me seems like a feature, not a bug
dghlsakjg · 5h ago
I think the OP's objection is not that hurdles exist but that they move them every time you try to run the track.
simonw · 5h ago
Sure... but last time I needed to jump through those hurdles I lost nearly an hour to them!

I'm still not sure I know how to do it if I need to again.

viccis · 1h ago
>In EC2, you can now change security groups and IAM roles without shutting the instance down to do it.

Hasn't it been this way for many years?
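
Both have been doable on a running instance via the API for a while now; a rough sketch (IDs and names are placeholders):

  # Swap the security groups attached to a running instance
  aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --groups sg-0aaa1111bbbb22222 sg-0ccc3333dddd44444

  # Swap the instance profile (IAM role) without a stop/start
  aws ec2 replace-iam-instance-profile-association \
    --association-id iip-assoc-0123456789abcdef0 \
    --iam-instance-profile Name=my-new-profile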

>Spot instances used to be much more of a bidding war / marketplace.

Yeah, because there's no bidding any more at all, which is great: you don't get those super high price spikes as availability drops, where only the ones who bid super high to ensure they wouldn't be priced out were able to get instances.

>You don’t have to randomize the first part of your object keys to ensure they get spread around and avoid hotspots.

This one was a nightmare and it took ages to convince some of my more pig-headed coworkers in the past that they didn't need to do it any more. The funniest part is that they were storing their data as millions and millions of 10-100kb files, so the S3 backend scaling wasn't the thing bottlenecking performance anyway!

>Originally Lambda had a 5 minute timeout and didn’t support container images. Now you can run them for up to 15 minutes, use Docker images, use shared storage with EFS, give them up to 10GB of RAM (for which CPU scales accordingly and invisibly), and give /tmp up to 10GB of storage instead of just half a gig.

This was/is killer. It used to be such a pain to have to manage pyarrow's package size if I wanted a Python Lambda function that used it. One thing I'll add that took me an embarrassingly long time to realize is that your Python global scope is actually persisted, not just the /tmp directory.
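
Cranking an existing function up to the current ceilings is a one-liner; roughly (the function name is a placeholder):

  # 900s timeout (15 min), 10240 MB memory (CPU scales with it), 10240 MB of /tmp
  aws lambda update-function-configuration \
    --function-name my-pyarrow-function \
    --timeout 900 \
    --memory-size 10240 \
    --ephemeral-storage '{"Size":10240}'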

PaulDavisThe1st · 1h ago
Can no longer log in to my AWS account, because I never set up MFA.

Want to set up MFA ... login required to request a device.

Yes, I know, they warned us far ahead of time. But not being able to request one of their MFA devices without a login is ... sucky.

berlesi · 52m ago
Looks like something that you could solve easily through their support, no?
raffraffraff · 37m ago
Support don't talk to you unless you pay for support
jp57 · 1h ago
> Glacier restores are also no longer painfully slow.

I had a theory (based on no evidence I'm aware of except knowing how Amazon operates) that the original Glacier service operated out of an Amazon fulfillment center somewhere. When you put in a request for your data, a picker would go to a shelf, pick up some removable media, take it back, and slot it into a drive in a rack.

This, BTW, is how tape backups on timesharing machines used to work once upon a time. You'd put in a request for a tape and the operator in the machine room would have to go get it from a shelf and mount it on the tape drive.

browningstreet · 47m ago
Yeah, but tape libraries have been robotic for decades now.
raffraffraff · 37m ago
> Availability Zones used to be randomized between accounts (my us-east-1a was your us-east-1c)

WTH?

zbentley · 6m ago
It was for spreading load out. If someone was managing resources in a bunch of accounts and always defaulted to, say, 1b, AWS randomized what AZs corresponded to what datacenter segments to avoid hot spots.

The canonical AZ naming was provided because, I bet, they realized that the users who needed canonical AZ identifiers were rarely the same users that were causing hot spots via always picking the same AZ.

mpyne · 11m ago
Presumably it would help ensure that everyone selecting us-east-1a in their base configs didn't actually all land in the same AZ.
slashdev · 31m ago
Yeah this one drove me crazy
skywhopper · 28m ago
They did this to stop people from overloading us-east-1a.

It was fine, until there started to be ways of wiring up networks between accounts (eg PrivateLink endpoint services) and you had to figure out which AZ was which so you could be sure you were mapping to the same AZs in each account.

I built a whole methodology for mapping this out across dozens of AWS accounts, and built lookup tables for our internal infrastructure… and then AWS added the zone ID to AZ metadata so that we could just look it up directly instead.
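
These days the whole lookup table is one call; something like:

  # Zone names are per-account; zone IDs (use1-az1, ...) are the stable identifiers
  aws ec2 describe-availability-zones \
    --query 'AvailabilityZones[].[ZoneName,ZoneId]' --output table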

kassner · 47m ago
I haven’t used AWS in the last 5 years. Is IPv6 still somewhat of an issue? I remember some services not supporting it at all and making it impossible to manage as an IPv6-only network.
1oooqooq · 5m ago
gotta milk those ipv4 investments
skywhopper · 26m ago
Yeah, it’s still limited, and a few things still require at least a dual stack setup.
the8472 · 31m ago
> As of very recently, you can also force EC2 instances to stop or terminate without waiting for a clean shutdown or a ridiculous timeout

Not true for GPU instances, they're stuck 5 minutes in a stopping state because they run some GPU health checks.

csours · 5h ago
Strictly off topic:

Everything you know is wrong.

Weird Al. https://www.youtube.com/watch?v=W8tRDv9fZ_c

Firesign Theatre. https://www.youtube.com/watch?v=dAcHfymgh4Y

general1726 · 2h ago
I think there are more of us who have kind of degenerated from doing it the AWS way - API Gateway, serverless Lambdas, messing around with IAM roles until it works, ... - to: give me an EC2 / Lightsail VPS instance, maybe an S3 bucket, let's set the domain through Route53, and keep the rest of your orchestration, AWS.
calmbonsai · 2h ago
There are entire industries that have largely devolved their clouds, primarily for footprint flexibility (not all AWS services are in all regions) and billing consistency.
regularfry · 1h ago
Honestly just having to manage IAM is such a time-suck that the way I've explained it to people is that we've traded the time we used to spend administering systems for time spent just managing permissions, and IAM is so obtuse that it comes out as a net loss.

There's a sweet spot somewhere in between raw VPSes and insanely detailed least-privilege serverless setups that I'm trying to revert to. Fargate isn't unmanageable as a candidate, not sure it's The One yet but I'm going to try moving more workloads to it to find out.

SOLAR_FIELDS · 6h ago
You know what's still stupid? That if you have an S3 bucket in the same region as your VPC that you will get billed on your NAT Gateway to send data out to the public internet and right back in to the same datacenter. There is simply no reason to not default that behavior to opt out vs opt in (via a VPC endpoint) beyond AWS profiting off of people's lack of knowledge in this realm. The amount of people who would want the current opt-in behavior is... if not zero, infinitesimally small.
solatic · 4h ago
It's a design that is secure by default. If you have no NAT gateway and no VPC Gateway Endpoint for S3 (and no other means of Internet egress) then workloads cannot access S3. Networking should be closed by default, and it is. If the user sets up things they don't understand (like NAT gateways), that's on them. Managed NAT gateways are not the only option for Internet egress and users are responsible for the networks they build on top of AWS's primitives (and yes, it is indeed important to remember that they are primitives, this is an IaaS, not a PaaS).
bilalq · 4h ago
Fine for when you have no NAT gateway and have a subnet with truly no egress allowed. But if you're adding a NAT gateway, it's crazy that you need to set up the gateway endpoint for S3/DDB separately. And even crazier that you have to pay for private links per AWS service endpoint.
solatic · 2h ago
There's very real differences between NAT gateways and VPC Gateway Endpoints.

NAT gateways are not purely hands-off: you can attach additional IP addresses to a NAT gateway to help it scale to more instances behind it, which is a fundamental part of how NAT gateways work in network architectures because of the limit on the number of ports that can be opened through a single IP address. When you use a VPC Gateway Endpoint, it doesn't use up ports or IP addresses attached to a NAT gateway at all. And what about metering? You pay per GB for traffic passing through the NAT gateway, but presumably not for traffic to an implicit built-in S3 gateway, so do you expect AWS to show you different meters for billed and not-billed traffic, while performance still depends on the sum total of the traffic (S3 and Internet egress) passing through it? How is that not confusing?

It's also beside the point that not all NAT gateways are used for Internet egress; indeed there are many enterprise networks with nested layers of private networks where NAT gateways help deal with overlapping private IP CIDR ranges. In such cases, having some kind of implicit built-in S3 gateway violates assumptions about how network traffic is controlled and routed, since the assumption is for the traffic to be completely private. So even if it were supported, it would need to be disabled by default (for secure defaults), and you're right back at the equivalent situation you have today, where the VPC Gateway Endpoint is a separate resource to be configured.

Not to mention that VPC Gateway Endpoints allow you to define policy on the gateway describing what may pass through, e.g. permitting read-only traffic through the endpoint but not writes. Not sure how you expect that to work with NAT gateways. This is something that AWS and Azure have very similar implementations for that work really well, whereas GCP only permits configuring such controls at the Organization level (!)
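
For concreteness, that kind of control is roughly a policy document attached to the endpoint (a sketch; the endpoint ID is a placeholder):

  # Allow reads through the gateway endpoint, but not writes
  aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id vpce-0123456789abcdef0 \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": "*"
      }]
    }'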

They are just completely different networking tools for completely different purposes. I expect closed-by-default secure defaults. I expect AWS to expose the power of different networking implements to me because these are low-level building blocks. Because they are low-level building blocks, I expect for there to be footguns and for the user to be held responsible for correct configuration.

bilalq · 1h ago
My objections here are in terms of how this manifests in billing. Especially when you consider the highway robbery rates for internet egress.
otterley · 4h ago
This is the intended use case for S3 VPC Gateway Endpoints, which are free of charge.

https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpo...

(Disclaimer: I work for AWS, opinions are my own.)

darkwater · 4h ago
I think they know it. They are complaining it's not enabled by default (and so do I).
otterley · 4h ago
AWS VPCs are secure by default, which means no traffic traverses their boundaries unless you intentionally enable it.

There are many IaC libraries, including the standard CloudFormation VPC template and CDK VPC class, that can create them automatically if you so choose. I suspect the same is also true of commonly-used Terraform templates.

hylaride · 2h ago
As others have pointed out, this is by design. If VPCs have access to AWS resources (such as S3, DynamoDB, etc), an otherwise locked down VPC can still have data leaks to those services, including to other AWS accounts.

It's a convenience VS security argument, though the documentation could be better (including via AWS recommended settings if it sees you using S3).

SOLAR_FIELDS · 4h ago
The problem is that the default behavior for this is opt-in, rather than opt-out. No one prefers opt-in. So why is it opt-in?
icedchai · 2h ago
If it were opt-out someone would accidentally leave it on and eventually realize that entire systems had been accidentally "backed up" and exfiltrated to S3.
SOLAR_FIELDS · 2h ago
What? The same is possible whether it's opt-in or opt-out. It's just that if you have the gateway as opt-out you wouldn't also have this problem AND a massive AWS bill. You would just have this problem.
Spivak · 1h ago
The bad situation is if you created a VPC with no internet access but the hypothetical automatic VPC endpoint still let instances access S3. Then a compromised instance has a vector for data exfiltration.
otterley · 3h ago
AWS VPCs are secure by default, which means no traffic traverses their boundaries unless you intentionally enable it.
SOLAR_FIELDS · 2h ago
"The door is locked, so instead of suggesting to the end user that they should unlock the door with this key that we know how to give the end user deterministically, we instead tell them to drive across town and back on our toll roads and collect money from it"

This has been a common gotcha for over a decade now: https://www.lastweekinaws.com/blog/the-aws-managed-nat-gatew...

otterley · 56m ago
Speaking solely on my own behalf: I don't know a single person at AWS (and I know a lot of them) who wants to mislead customers into spending more money than they need to. I remember a time before Gateway Endpoints existed, and customers (including me at the time) were spending tons of money passing traffic through pricey NAT Gateways to S3. S3 Gateway Endpoints saved them money.
SOLAR_FIELDS · 21m ago
Clearly you guys are aware of the problem though. I mean, every time this thing happens there's probably a ticket. I've personally filed one myself years ago when it happened to me. So why has the behavior not changed? You don't have to give up security to remove this footgun, it's possible to remove it and still make it an opt-in action for security purposes.
hinkley · 3h ago
Your job depends upon you misunderstanding the problem.
afandian · 5h ago
Having experienced the joy of setting up VPC, subnets and PrivateLink endpoints the whole thing just seems absurd.

They spent the effort of branding private VPC endpoints "PrivateLink". Maybe it took some engineering effort on their part, but it should be the default out of the box, and an entirely unremarkable feature.

In fact, I think if you have private subnets, the only way to use S3 etc is Private Link (correct me if I'm wrong).

It's just baffling.

time0ut · 4h ago
You can provision gateway endpoints for S3 and DynamoDB. They are free and considered best practice. They are opt-in, but easy to enable.
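
Enabling one is roughly a single call (region and IDs are placeholders):

  aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0123456789abcdef0
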
mdaniel · 44m ago
And ECR, which I would guess impacts more folks than DynamoDB https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-e...

And, as an added benefit, they distinguish between "just pull" and "pull and push" which is nice

dmart · 5h ago
VPC endpoints in general should be free and enabled by default. That you need to pay extra to reach AWS' own API endpoints from your VPC feels egregious.
otterley · 4h ago
Gateway endpoints are free. Network endpoints (which are basically AWS-managed ENIs that can tunnel through VPC boundaries) are not free.

S3 can use either, and we recommend establishing VPC Gateway endpoints by default whenever you need S3 access.

(Disclaimer: I work for AWS, opinions are my own.)

Hikikomori · 4h ago
Why don't you have gateway endpoints for all your APIs?
tux3 · 5h ago
That is price segmentation. People who are price insensitive will not invest the time to fix it

People who are price sensitive probably shouldn't be on AWS - but they usually have to be for unrelated reasons, and they will work to reduce their bill.

nbngeorcjhe · 5h ago
> People who are price insensitive will not invest the time to fix it

This just sounds like a polite way of saying "we're taking peoples' money in exchange for nothing of value, and we can get away with it because they don't know any better".

robertlagrant · 4h ago
It's more like: we made loads of stuff super cheap but here's where we make some money because it scales with use.
amenghra · 5h ago
Price segmentation happens all the time in pretty much every industry.
hinkley · 3h ago
There’s an entire Pandora’s box of shitty things that happen in pretty much every industry. I don’t think you want to use that defense.
happytoexplain · 5h ago
>People who are price insensitive will not invest the time to fix it

Hideous.

torginus · 2h ago
If you had an ALB inside the VPC that routed the requests to something that lives inside the VPC, which called the AWS PutObject api on the bucket, would that still be the case?
kbolino · 5h ago
The problem is that VPC endpoints aren't free.

They should be, of course, at least when the destination is an AWS service in the same region.

[edit: I'm speaking about interface endpoints, but S3 and DynamoDB can use gateway endpoints, which are free to the same region]

otterley · 4h ago
Gateway endpoints are free. Network endpoints (which are basically AWS-managed ENIs that can tunnel through VPC boundaries) are not free.

S3 can use either, and we recommend establishing VPC Gateway endpoints by default whenever you need S3 access.

(Disclaimer: I work for AWS, opinions are my own.)

JoshTriplett · 2h ago
That's fascinating! I hadn't found that in the documentation; everything seems to steer people towards PrivateLink, not gateway endpoints.

Would you recommend using VPC Gateway even on a public VPC that has an Internet gateway (note: not a NAT gateway)? Or only on a private VPC or one with a NAT gateway?

otterley · 2h ago
I recommend S3 Gateways for all VPCs that need to access S3, even those that already have routes to the Internet. Plus they eliminate the need for NAT Gateway traversal for requests that originate from private subnets.
paulddraper · 1h ago
> everything seems to steer people towards PrivateLink, not gateway endpoints

Gateway endpoints only work for some things.

Hikikomori · 36m ago
PrivateLink endpoints can be of type gateway or interface. Only gateway is free, and only S3 and DynamoDB support it.
kbolino · 4h ago
Fair point, and valid for S3 (the topic at hand) and DynamoDB.

Other AWS services, though, don't support gateway endpoints.

mdaniel · 37m ago
https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-e...

~~I get the impression there are several others, too, but that one is of especial interest to me~~ Wowzers, they really are much better now:

  aws --region us-east-1 ec2 describe-vpc-endpoint-services | jq '.ServiceNames|length'
  459
If you're saying "other services should offer VPC Endpoints," I am 100% on-board. One should never have to traverse the Internet to contact any AWS control plane
paulddraper · 4h ago
Well yeah that's the point....why route through the public internet.
kbolino · 4h ago
I doubt the traffic ever actually leaves AWS. Assuming it does make it all the way out to their edge routers, the destination ASN will still be one of their own. Not that the pricing will reflect this, of course.

The other problem with (interface) VPC endpoints is that they eat up IP addresses. Every service/region permutation needs a separate IP address drawn from your subnets. Immaterial if you're using IPv6, but can be quite limiting if you're using IPv4.

immibis · 3m ago
Sounds like a good reason to use IPv6.
immibis · 4h ago
A company making revenue is not stupid.
aaronblohowiak · 5h ago
>VPC peering used to be annoying; now there are better options like Transit Gateway, VPC sharing between accounts, resource sharing between accounts, and Cloud WAN.

TGW is... twice as expensive as vpc peering?

klysm · 3h ago
VPC sharing is the sleeper here. You can do cross account networking all in the same VPC and skip all the expensive stuff.
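
Under the hood it's AWS RAM sharing the subnets; roughly (the subnet ARN and account ID are placeholders):

  # From the VPC-owning account, share a subnet with another account
  aws ram create-resource-share \
    --name shared-app-subnets \
    --resource-arns arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0123456789abcdef0 \
    --principals 222222222222
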
aaronblohowiak · 3h ago
as long as your VPCs aren't too big, yea.
Hikikomori · 1h ago
Shared vpcs can get pretty big. Even if you approach the NAU limit you can use privatelink or TGW to have more large shared vpcs.
alFReD-NSH · 5h ago
And vpc sharing is free. Cost and architecture are tied.
Hikikomori · 5h ago
More than twice, since same-AZ traffic is free with peering. But if you're big enough you can get better deals on cost.

But unlike peering, TGW traffic flows through an additional compute layer, so it has additional cost.

chisleu · 5h ago
> You don’t have to randomize the first part of your object keys to ensure they get spread around and avoid hotspots.

As of when? According to internal support, this is still required as of 1.5 years ago.

laurent_du · 2h ago
He's not talking about the prefix, just the beginning of the object key.
viccis · 1h ago
The prefix is not separate from the object key. It's part of it. There's no randomization that needs to be done on either anymore.
beaviskhan · 1h ago
Also S3 related: the bucket owner can now be configured as the object owner no matter where the object originated. In the past this was exceedingly painful if you wanted to allow one account to contribute objects to a bucket in another account. You could do the initial contribution, but the contributor always owned the object, and you couldn't delegate access to a third account.
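
Now it's one setting on the destination bucket; a sketch (the bucket name is a placeholder):

  # Bucket owner owns every object, regardless of which account uploaded it (ACLs are disabled)
  aws s3api put-bucket-ownership-controls \
    --bucket my-shared-bucket \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
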
JCM9 · 2h ago
Some good stuff here. I wish AWS would just focus on these boring, but ultimately important, things that they’re good at instead of all the current distractions trying to play catch up on “AI.” AWS leadership missed the boat there big time, but that’s OK.

Ultimately AWS doesn’t have the right leadership or talent to be good at GenAI, but they do (or at least used to) have decent core engineers. I’d like to see them get back to basics and focus there. Right now leadership seems panicked about GenAI and is just throwing random stuff at the wall desperately trying to get something to stick. That’s really annoying to customers.

gurjeet · 5h ago
It would've been nice if each of those claims in the article also linked to either the relevant announcement or to the documentation. If I'm interested in any of these headline items, I'd like to learn more.
mdaniel · 32m ago
I don't believe AWS offers permalinks, so it would only help until they rolled over the next documentation release :-(

They actually used to have the upstream docs in GitHub, and that was super nice for giving permalinks but also building the docs locally in a non-pdf-single-file setup. Pour one out, I guess

ElijahLynn · 54m ago
That. Was a decent investment of my time as a devops engineer. Right to the point. I learned things.
topher200 · 3h ago
I have a preempt-able workload for which I could use Spot instances or Savings Plans.

Does anyone have experience running Spot in 2025? If you were to start over, would you keep using Spot?

  - I observe with pricing that Spot is cheaper
  - I am running on three different architectures, which should limit Spot unavailability
  - I've been running about 50 Spot EC2 instances for a month without issue. I'm debating turning it on for many more instances
erulabs · 1h ago
In terms of cost, from cheapest to most expensive:

1. Spot with autoscaling to adjust to demand and a savings plan that covers the ~75th percentile scale

2. On-demand with RIs (RIs will definitely die some day)

3. On-demand with savings-plans (More flexible but more expensive than RIs)

4. Spot

5. On-demand

I definitely recommend spot instances. If you're greenfielding a new service and you're not tied to AWS, some other providers have hilariously cheap spot markets - see http://spot.rackspace.com/. If you're using AWS, definitely auto-scaling spot with savings plans are the way to go. If you're using Kubernetes, the AWS Karpenter project (https://karpenter.sh/) has mechanisms for determining the cheapest spot price among a set of requirements.

Overall tho, in my experience, ec2 is always pretty far down the list of AWS costs. S3, RDS, Redshift, etc wind up being a bigger bill in almost all past-early-stage startups.

mdaniel · 33m ago
To "me, too" this, it's not like that AWS spot instance just go "poof," they do actually warn you (my recollection is 60s in advance of the TerminateInstance call), and so a resiliency plane on top of the workloads (such as the cited Kubernetes) can make that a decided "non-event". Shout out to the reverse uptime crew, a subset of Chaos Engineering
stevejb · 3h ago
I just saw Weird Al in concert, and one of my favorite songs of his is "Everything You Know is Wrong." This is the AWS version of that song! Nice work Corey!
havefunbesafe · 3h ago
Weird AL or Weird A.I.?
causal · 2h ago
This is super helpful. I would read a yearly summary like this.
TheP1000 · 4h ago
API gateway timeout increase has been nice.
coredog64 · 3h ago
It was always there but it required much more activity to get it done (document your use case & traffic levels and then work with your TAM to get the limit changed).
bob1029 · 5h ago
> Glacier restores are also no longer painfully slow.

Wouldn't this always depend on the length of the queue to access the robotic tape library? Once your tape is loaded it should move really quickly:

https://www.ibm.com/docs/en/ts4500-tape-library?topic=perfor...

hinkley · 3h ago
> Once upon a time Glacier was its own service that had nothing to do with S3. If you look closely (hi, billing data!) you can see vestiges of how this used to be, before the S3 team absorbed it as a series of storage classes.

Your assumption holds if they still use tape. But this paragraph hints at it not being tape anymore. The eternal battle between tape versus drive backup takes another turn.

bob1029 · 3h ago
I am also assuming that Amazon intends for the Deep Archive tier to be a profitable offering. At $0.00099/gb-month, I don't see how it could be anything other than tape.
mappu · 38m ago
My understanding is that some AWS products (e.g. RDS) need very fast disks with lots of IOPS. To get the IOPS, though, you have to buy multi-TB SSDs, far more storage space than RDS actually needs. This doesn't fully utilize the underlying hardware: you are left with lots of remaining storage space but no IOPS. It's perfect for Glacier.

The disks for Glacier cost $0 because you already have them.

simonw · 2h ago
I wonder if it's where old S3 hard drives go to die? Presumably AWS have the world's single largest collection of used storage devices - if you RAID them up you can probably get reliable performance out of them for Glacier?
bob1029 · 2h ago
I still don't know if it's possible to make it profitable with old drives in this kind of arrangement, especially if we intend to hit their crazy durability figures. The cost of keeping drives spinning is low, but is double-digit margin % in this context. You can't leave drives unpowered in a warehouse for years on end and say you have 11+ nines of durability.
hinkley · 2h ago
Unpowered in a warehouse is a huge latency problem.

For storage especially, we now build enough redundancy into systems that we don't have to jump on every fault. That reduces the chance of human error when trying to address it, and of pushing the hardware harder during recovery (resilvering, catching up in a distributed consensus system, etc).

When the entire box gets taken out of the rack due to hitting max faults, then you can piece out the machine and recycle parts that are still good.

You could in theory ship them all off to the backend of nowhere, but it seems that Glacier exists in all the places where AWS data centers are, so it's not that. But Glacier being durable storage, with a low expectation of data out versus data in, they could and probably are cutting the aggregate bandwidth to the bone.

How good do your power backups have to be to power a pure Glacier server room? Can you use much cheaper in-rack switches? Can you use old in-rack switches from the m5i era?

Also most of the use cases they mention involve linear reads, which has its own recipe book for optimization. Including caching just enough of each file on fast media to hide the slow lookup time for the rest of the stream.

Little's Law would absolutely kill you in any other context but we are linear write, orders of magnitude fewer reads here. You have hardware sitting around waiting for a request. "Orders of magnitude" is the space where interesting solutions can live.

lijok · 2h ago
You don’t RAID old drives, as it creates cascading failures: recovering from a failed drive adds major wear to the other drives
cavisne · 2h ago
http://www.patentbuddy.com/Patent/20140047261

Is tape even cost competitive anymore? The market would be tiny.

hinkley · 55m ago
It's gone in cycles for as long as I recall and older devs around 2010 said it had been going on for as long as they could recall.
cldcntrl · 6h ago
> You don’t have to randomize the first part of your object keys to ensure they get spread around and avoid hotspots.

Not strictly true.

QuinnyPig · 1h ago
I should have been more clear. You still need to partition, but randomizing the prefixes hasn't been needed since 2018: https://web.archive.org/web/20240227073321/https://aws.amazo...
rthnbgrredf · 6h ago
Elaborate.
cldcntrl · 5h ago
The whole auto-balancing thing isn't instant. If you have a burst of writes with the same key prefix, you'll get throttled.
hnlmorg · 5h ago
Not the OP but I’ve had AWS-staff recommend different prefixes even as recently as last year.

If key prefixes don’t matter much any more, then it’s a very recent change that I’ve missed.

williamdclt · 5h ago
Might just be that the AWS staff wasn't up to date on this
time0ut · 4h ago
I have had the same experience within the last 18 months. The storage team came back to me and asked me to spread my ultra high throughput write workload across 52 (A-Za-z) prefixes and then they pre-partitioned the bucket for me.

S3 will automatically do this over time now, but I think there are/were edge cases still. I definitely hit one and experienced throttling at peak load until we made the change.

hnlmorg · 4h ago
That sounds like the problem we were having. Lots of writes to a prefix over a short period of time and then low activity to it after about 2 weeks.
hnlmorg · 5h ago
That’s possible but they did consult with the storage team prior to our consultation.

But I don’t know what conversations did or did not happen behind the scenes.

rthnbgrredf · 4h ago
By the way, that happens quite frequently. I regularly ask them about new AWS technologies or recent changes, and most of the time they are not aware. They usually say they will call back later after doing some research.
cldcntrl · 5h ago
That's right, same for me as of only a few months ago.
scubbo · 5h ago
I've had two people tell me in the last week that SQS doesn't support FIFO queues.
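
For the record, it has since 2016; the only catch is the .fifo suffix and attributes (the queue name is a placeholder):

  aws sqs create-queue \
    --queue-name my-ordered-queue.fifo \
    --attributes FifoQueue=true,ContentBasedDeduplication=true
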
lysace · 38m ago
Paid AWS support got a lot less capable on average over these two decades. :/

My recent interactions with them would probably have been better if they were an LLM.

oblio · 16m ago
They probably are an LLM and if they aren't, their higher management is pushing for them to be LLMs by 2027 at the latest.
digianarchist · 3h ago
Would love an AWS equivalent to Cloud Run but the lambda changes are welcome nonetheless.
Ayesh · 5h ago
CloudFront also has 1TB of free data transfer a month under the forever-free perks.
nodesocket · 3h ago
I'll add: when doing instance-to-instance communication (in the same AZ), always use private IPs. If you use public IP routing (even in the same AZ), this is charged as regional data transfer.

Even worse, if you run self-hosted NAT instance(s), don't use an EIP attached to them. Just use an auto-assigned public IP (no EIP).

  NAT instance with EIP
    - AWS routes it through the public AWS network infrastructure (hairpinning).
    - You get charged $0.01/GB regional data transfer, even if in the same AZ.

  NAT instance with auto-assigned public IP (no EIP)
    - Traffic routes through the NAT instance’s private IP, not its public IP.
    - No regional data transfer fee — because all traffic stays within the private VPC network.
    - The auto-assigned public IP may change if the instance is shut down or re-created, so have automation to handle that. Though you should be using the network interface ID reference in your VPC route tables anyway.
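
To check an existing NAT instance and flip the subnet default, roughly (IDs are placeholders):

  # See whether the instance carries an EIP
  aws ec2 describe-addresses --filters Name=instance-id,Values=i-0123456789abcdef0

  # Have the subnet hand out auto-assigned public IPs at launch instead
  aws ec2 modify-subnet-attribute \
    --subnet-id subnet-0123456789abcdef0 --map-public-ip-on-launch
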
themafia · 2h ago
> You get charged $0.01/GB regional data transfer, even if in the same AZ.

My understanding is that transfer gets charged on both sides as well. So if you own both sides you'll pay $0.02/GB.