Ask HN: Is anyone doing intelligent tiering for logs?
2 didro 6 6/12/2025, 3:11:10 PM
You might be familiar with Amazon S3 Intelligent-Tiering (https://aws.amazon.com/s3/storage-classes/intelligent-tiering/). It automatically moves data to cheaper storage tiers when it hasn’t been accessed for a while.
I’m wondering if a similar approach could work for observability data — especially logs. Hot storage is expensive, and much of the data may not be queried after a short period. Moving unused logs to warm or cold storage (or dropping them) could potentially save a lot.
Has anyone tried this kind of tiering or aging strategy for logs or metrics? Would love to hear how you approached it — tools, heuristics, lessons learned. Thoughts and speculation are also very welcome!
We adopted a tiered approach that's working well for us so far:
1. Hot Tier (Last 7 Days): Elasticsearch. For our real-time debugging and immediate operational needs, nothing beats the query speed of Elasticsearch. We keep a rolling 7-day window of all logs here. It's expensive, but essential.
2. Warm Tier (7-90 Days): AWS S3 Standard. After 7 days, our log shipper (Fluentd) automatically archives the logs to S3. If we need to investigate an older issue, we can still query these logs directly using Amazon Athena. It's much slower than Elasticsearch, but for occasional, deep-dive investigations, the cost savings are massive.
3. Cold Tier (After 90 Days): S3 Glacier Deep Archive. After 90 days, the logs are transitioned to Glacier Deep Archive via S3's lifecycle policies. This is purely for long-term compliance and "break glass in case of emergency" scenarios. It's incredibly cheap to store, but we know that retrieving it would be a slow and deliberate process.
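For anyone who wants to see what the cold-tier step might look like, here is a minimal sketch of an S3 lifecycle rule that moves objects to Glacier Deep Archive. The `logs/` prefix, the rule ID, and the bucket name are made up for illustration, not taken from the setup above. Note that S3 counts `Days` from when the object was created in the bucket, so the right number depends on when the logs actually land in S3.

```python
import json


def build_lifecycle_config(prefix="logs/", days_to_deep_archive=90):
    """Build an S3 lifecycle configuration with one transition rule.

    The prefix and rule ID are hypothetical placeholders.
    'Days' is measured from object creation in S3, not from log timestamp.
    """
    return {
        "Rules": [
            {
                "ID": "logs-to-deep-archive",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {
                        "Days": days_to_deep_archive,
                        "StorageClass": "DEEP_ARCHIVE",
                    }
                ],
            }
        ]
    }


config = build_lifecycle_config()
print(json.dumps(config, indent=2))

# To apply it (needs boto3 and AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-archive",
#     LifecycleConfiguration=config,
# )
```

Keeping the rule as a plain dict makes it easy to review or version-control before anything is applied to a real bucket.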
The key lesson for us was to be realistic about our actual query patterns. We found that over 95% of our queries were for logs less than 3 days old. This data-driven approach allowed us to be aggressive with our tiering strategy without sacrificing critical visibility.
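A rough sketch of the kind of calculation behind a number like that, assuming you already have a record of when each query was issued and the oldest timestamp it asked for. The input format and the sample data here are invented for illustration; how you collect those records depends on your stack.

```python
from datetime import datetime, timedelta


def fraction_within(queries, max_age):
    """Return the share of queries whose time range starts within max_age.

    queries: iterable of (issued_at, range_start) datetime pairs, where
    range_start is the oldest log timestamp the query requested.
    """
    queries = list(queries)
    if not queries:
        return 0.0
    recent = sum(
        1
        for issued_at, range_start in queries
        if issued_at - range_start <= max_age
    )
    return recent / len(queries)


# Made-up sample: four queries looking back 1 hour, 2 days, 10 days, 30 days.
now = datetime(2025, 6, 12, 15, 0)
sample = [
    (now, now - timedelta(hours=1)),
    (now, now - timedelta(days=2)),
    (now, now - timedelta(days=10)),
    (now, now - timedelta(days=30)),
]

share = fraction_within(sample, timedelta(days=3))
print(f"{share:.0%} of queries touched only logs newer than 3 days")
```

Running the same function with a few different `max_age` values gives a quick picture of where the hot-tier cutoff could sit.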
But how did you get those query patterns? Was there some Elasticsearch API, a proxy, or something else?