tbh that's not the flex. storing 100PB of logs just means we haven't figured out what's actually worth logging. metrics + structured events can usually tell 90% of the story. the rest? trace level chaos no one reads unless prod's on fire. what would've been better: auto pruning logs that no alert ever looked at. or logs that never hit a search query in 3 months. call it attention weighted retention. until then this is just high end digital landfill with compression
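A minimal sketch of what "attention weighted retention" could look like, assuming you already record when each log stream was last read by a search query or an alert. The `last_hit` map, the stream names, and the 90-day cutoff are all made up for illustration.

```python
from datetime import datetime, timedelta

def streams_to_prune(last_hit, now, max_idle=timedelta(days=90)):
    """Return the log streams nobody has looked at within max_idle.

    last_hit maps a stream name to the last time a query or alert
    read it, or None if nothing ever did (hypothetical bookkeeping).
    """
    stale = []
    for name, hit in last_hit.items():
        if hit is None or now - hit > max_idle:
            stale.append(name)
    return sorted(stale)

now = datetime(2025, 6, 1)
last_hit = {
    "checkout.debug": None,                   # never read
    "checkout.error": datetime(2025, 5, 30),  # read two days ago
    "auth.trace": datetime(2025, 1, 15),      # idle for over 90 days
}
print(streams_to_prune(last_hit, now))  # ['auth.trace', 'checkout.debug']
```

The hard part in practice is the bookkeeping itself (knowing which streams a query touched), which this sketch takes as given.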
hnlmorg · 4h ago
I’m of the opposite opinion. It’s better to ingest everything and then filter out the stuff you don’t want at the observability platform.
The problem of filtering out debug logs is you don’t need them, until you do. And then trying to recreate an event you can’t even debug is often impossible. So it’s easier to then retrieve those debug logs if they’re already there but hidden.
hinkley · 1h ago
Once a bug is closed the value of those logs starts to decay. And the fact is that we get punished for working on things that aren’t in the sprint, and working on “done done” stories is one of those ways. Even if you want to clean up your mess, there’s incentive not to. And many of us very clearly don’t like to clean up our own messes, so a ready excuse gets them out of the conversation about a task they don’t want to be voluntold to do.
pstuart · 1h ago
My approach for this is to add dev logging IN ALL CAPS so that it stands out as ugly and "need adjusting", which is to delete it before merging to main.
jgalt212 · 3h ago
> then filter out the stuff you don’t want
This is often easier said than done. And there's ginormous costs associated with logging everything. Money that can be better spent elsewhere.
Also, logging everything creates yet another security hole to worry about.
phillipcarter · 2h ago
> And there's ginormous costs associated with logging everything
If you use a tool that defaults the log spew to a cheap archive, sampling to the fast store, and a way to pull from the archive on-demand much of that is resolved. FWIW I think most orgs get big scared at seeing $$$ in their cloud bills, but don't properly account for time spent by engineers rummaging around for data they need but don't have.
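A rough sketch of that routing, not any particular vendor's implementation: every record goes to the cheap archive, only warnings/errors plus a sample of the rest go to the fast store, and a trace can be pulled back from the archive on demand. The record shape (`level`, `trace_id`) and the in-memory lists standing in for the two stores are assumptions.

```python
import zlib

def keep_in_fast_store(record, sample_rate=0.1):
    """Always keep WARN/ERROR; sample everything else by trace id."""
    if record["level"] in ("WARN", "ERROR"):
        return True
    # Hash the trace id so all records of one trace are kept or dropped together.
    bucket = zlib.crc32(record["trace_id"].encode()) % 10_000
    return bucket < sample_rate * 10_000

def route(records, archive, fast_store, sample_rate=0.1):
    """Append every record to the archive and a subset to the fast store."""
    for record in records:
        archive.append(record)
        if keep_in_fast_store(record, sample_rate):
            fast_store.append(record)

def rehydrate(archive, trace_id):
    """Pull the full record set for one trace back out of the archive."""
    return [r for r in archive if r["trace_id"] == trace_id]

records = [
    {"level": "DEBUG", "trace_id": "t1", "msg": "cache miss"},
    {"level": "ERROR", "trace_id": "t1", "msg": "db timeout"},
    {"level": "INFO", "trace_id": "t2", "msg": "request ok"},
]
archive, fast_store = [], []
route(records, archive, fast_store)
print(len(archive), len(rehydrate(archive, "t1")))  # 3 2
```

Sampling by trace id rather than per record is a design choice here: it avoids ending up with half a trace in the fast store.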
nijave · 59m ago
>but don't properly account for time spent by engineers rummaging around for data they need but don't have
This is a tricky one that's come up recently. How do you quantify the value of a $$$ observability platform? Anecdotally I know robust tracing data can help me find problems in 5-15 minutes that would have taken hours or days with manual probing and scouring logs.
Even then you have the additional challenge of quantifying the impact of the original issue.
phillipcarter · 32m ago
At the end of the day it's just vibes. If the company is one that sees:
- Reliability as a cost center
- Vendor costs are to be limited
- CIO-driven rather than CTO-driven
Then it's going to be a given that they prioritize costs that are easy to see, and will do things like force a dev team to work for a month to shave ~2k/month off of a cloud bill. In my experience, these orgs will also sometimes do a 180 when they learn that their SLAs involve paying out to customers at a premium during incidents, which is always very funny to observe. Then you talk to some devs and they say things like "we literally told them this would happen years ago and it fell on deaf ears" or something.
hinkley · 1h ago
Java had particularly bad performance for logging for a good while and I used to make applications noticeably faster by clearing out the logs nobody cared about anymore. Just have to be careful about side effects in the log lines.
hnlmorg · 3h ago
Not really. Most observability platforms already have tools to support this kind of workflow in a more cost effective way.
> Also, logging everything creates yet another security hole to worry about.
I think the real problem isn’t logging, it’s the fact that your developers are logging sensitive information. If they’re doing that, then it’s a moot point if those logs are also being pushed to a third party observability platform or not because you’re already leaking sensitive information.
jgalt212 · 58m ago
Fair enough, but if you don't push them to "log everything" there are fewer chances for error.
gavinray · 2h ago
"Better to have it and not need it; than to need it, and not have it..."
jkogara · 2h ago
Or more succinctly, albeit less eloquently: "Better to be looking at it than looking for it."
Macha · 1h ago
I've been in a bunch of companies that have pushed for reducing logs in favour of metrics and a limited set of events, usually motivated by "we're using datadog and it's contract renewal time and the number is staggering".
The problem is, if you knew what was going to go wrong, you'd have fixed it already. So when there's a report that something did not operate correctly and you want to find out WTF happened, the detailed logs are useful, but you don't know which logs are useful for that unless you have recurring problems.
hinkley · 1h ago
That logging isn’t even free on the sending side, especially in languages where they are eager to get the logs to disk in case the final message reveals why the program crashed.
And there’s a lot of scanning blindness out there. Too much extraneous data can hide correlations between other log entries. And there’s a half-life to the value of logs written for bugs that are already closed, and it’s fairly short.
I prefer stats because of the way they get aggregated. Though for GIL languages some models like OTEL have higher overhead than they should.
nijave · 56m ago
In fairness, I think a lot of GIL languages already have high overhead and I've never been under the impression OTEL was optimized for performance and efficiency.
imiric · 4h ago
Sure, but if the data is already there, it's a sifting and pruning problem, which can be done after ingestion, if needed.
It's better to have all data and not need it, than to need it and not have it. Assuming you have the resources to ingest it in the first place, which seems like the focus of the optimization work they did.
Spivak · 2h ago
> trace level chaos no one reads unless prod's on fire
God why do we keep these fire extinguishers around, they sit unused 99.999% of the time.
nikolayasdf123 · 2h ago
yeah, same thoughts.
business events + error/tail-sampled traces + metrics
... and logs in rare cases when none of the above works. logs are a dump of everything. why would you want to have so many logs in the first place? and then build a whole infra to scale that? and who reads all those logs, and how? they build metrics on top of that? so might as well just build metrics directly and purposefully? with such high volume, even LLMs would not read them (too slow and too costly).. and what would an LLM even tell from those logs? (may be sparse/low signal, hard to decipher without tool-calling, like creating metrics)
namanyayg · 4h ago
flagging for ai generated, @dang please have a look at this acc
jurgenkesker · 7h ago
So yeah, this is only really relevant for collecting logs from clickhouse. Not for logs from anything else. Good for them, and I really love Clickhouse, but not really relevant.
mrbluecoat · 7h ago
Noteworthy point:
> If a service is crash-looping or down, SysEx is unable to scrape data because the necessary system tables are unavailable. OpenTelemetry, by contrast, operates in a passive fashion. It captures logs emitted to stdout and stderr, even when the service is in a failed state. This allows us to collect logs during incidents and perform root cause analysis even if the service never became fully healthy.
fuzzy2 · 6h ago
Everything OTel I ever did was fully active. So I wouldn't say this is very noteworthy. Instead it is wrong/incomplete information.
jappgar · 4h ago
Observability maximalism is a cult. A very rich one.
hinkley · 1h ago
Funny how they give you a problem and solve it for you for a small monthly fee.
k__ · 4h ago
Well, if you wanna investigate unknown unknowns, there isn't much alternative.
iw7tdb2kqo9 · 7h ago
I haven't worked in ClickHouse level scale.
Can you search log data in this volume? ElasticSearch has query capabilities for small scale log data I think.
Why would I use ClickHouse instead of storing log data as json file for historical log data?
munchbunny · 6h ago
> Can you search log data in this volume?
(Context: I work at this scale)
Yes. However, as you can imagine, the processing costs can be potentially enormous. If your indexing/ordering/clustering strategy isn't set up well, a single query can easily end up costing you on the order of $1-$10 to do something as simple as "look for records containing this string".
My experiences line up with theirs: at the scale where you are moving petabytes of data, the best optimizations are, unsurprisingly, "touch as little data as few times as possible" and "move as little data as possible". Every time you have to serialize/de-serialize, and every time you have to perform disk/network I/O, you introduce a lot of performance cost and therefore overall cost to your wallet.
Naturally, this can put OTel directly at odds with efficiency because the OTel collector is an extra I/O and serialization hop. But then again, if you operate at the petabyte scale, the amount of money you save by throwing away a single hop can more than pay for an engineer whose only job is to write serializer/deserializer logic.
sethammons · 7h ago
Scale and costs. We are faced with logging scale at my work. A naive "push json into splunk" will cost us over $6M/year, but I can only get maybe 5-10% of that approved.
In the article, they talk about needing 8k cpu to process their json logs, but only 90 cpu afterward.
h1fra · 3h ago
A couple of years ago clickhouse wasn't that good with full text search, to me that was the biggest drawback. Yes it's faster and can handle ES scale, but depending on your use case it's way faster to query ES when you do FTS or grouping without a pre-built index.
the_arun · 4h ago
I didn’t see how long logs are kept - retention time. After x months you may need summary/aggregated data but not sure about raw data.
atemerev · 7h ago
When I get back from Clickhouse to Postgres, I am always shocked. Like, what is it doing for several minutes importing this 20G dump? Shouldn't it take seconds?
joshstrange · 6h ago
Every time I use Clickhouse I want to blow my brains out, especially knowing that Postgres exists. I’m not saying Clickhouse doesn’t have its place or that Postgres can do everything that Clickhouse can.
What I am saying is that I really dislike working in Clickhouse with all of the weird foot guns. Unless you are using it in a very specific, and in my opinion, limited way, it feels worse than Postgres in every way.
atemerev · 5h ago
I mostly need analytics, all data is immutable and append-only.
joshstrange · 5h ago
And that’s exactly the limited-ness I’m talking about. If that works for you, Clickhouse is amazing. For things like logs I can 100% see the value.
Other data that is ETL’d and might need to update? That sucks.
edmundsauto · 2h ago
There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (Specifically talking about things like datelist and cumulative tables)
Thaxll · 5h ago
I mean if you don't get the logs when the service is down the entire solution is useless.
revskill · 6h ago
This industry is mostly filled with half-baked or in-progress standards, which leads to segmentation of the ecosystems. From graphql, to openapi, to mcp... nothing is perfect and it's fine.
The problem is, the people who created the specs are just following a trial and error approach, which is insane.
the_real_cher · 7h ago
What is the trick that this and dynamo use?
Are they just basically large hash tables?
ofrzeta · 8h ago
Whenever I read things like this I think: You are doing it wrong. I guess it is an amazing engineering feat for Clickhouse but I think we (as in IT or all people) should really reduce the amount of data we create. It is wasteful.
CSDude · 8h ago
Blanket statements like this miss the point. Not all data is waste. Especially high-cardinality, non-sampled traces. On a 4-core ClickHouse node, we handled millions of spans per minute. Even short retention windows provided critical visibility for debugging and analysis.
Sure, we should cut waste, but compression exists for a reason. Dropping valuable observability data to save space is usually shortsighted.
And storage isn't the bottleneck it used to be. Tiered storage with S3 or similar backends is cheap and lets you keep full-fidelity data without breaking the budget.
ofrzeta · 7h ago
> Dropping valuable observability data to save space is usually shortsighted
That's a bit of a blanket statement, too :) I've seen many systems where a lot of stuff is logged without much thought. "Connection to database successful" - does this need to be logged on every connection request? Log level info, warning, debug? Codebases are full of this.
nijave · 48m ago
Yes, it allows you to bisect a program to see the block of code between log statements where the program malfunctioned. More log statements slice the code into smaller blocks, meaning fewer places to look.
throwaway0665 · 7h ago
There's always another log that could have been key to getting to the bottom of an incident. It's impossible to know completely what will be useful in advance.
citrin_ru · 7h ago
Probably not very useful for prod (non debug) logging, but it’s useful when such events are tracked in metrics (success/failure, connect/response times). And modern databases (including ClickHouse) can compress metrics efficiently so not much space will be spent on a few metrics.
jiggawatts · 6h ago
I agree with both you and the person you're replying to, but...
My centrist take is that data can be represented wastefully, which is often ignored.
Most "wide" log formats are implemented... naively. Literally just JSON REST APIs or the equivalent.
Years ago I did some experiments where I captured every single metric Windows Server emits every second.
That's about 15K metrics, down to dozens of metrics per process, per disk, per everything!
There is a poorly documented API for grabbing everything ('*') as a binary blob of a bunch of 64-bit counters. My trick was that I then kept the previous such blob and simply took the binary difference. This set most values to zero, so then a trivial run length encoding (RLE) reduced a few hundred KB to a few hundred bytes. Collect an hour of that, compress, and you can store per-second metrics collected over a month for thousands of servers in a few terabytes. Then you can apply a simple "transpose" transformation to turn this into a bunch of columns and get 1000:1 compression ratios. The data just... crunches down into gigabytes that can be queried and graphed in real time.
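A toy version of the delta-then-RLE step, to show why it crunches so well. This is a sketch under assumptions, not the original code: the "binary difference" is taken to be a byte-wise XOR against the previous snapshot, the RLE format is simple (run length, byte value) pairs, and the counter values are synthetic rather than read from any Windows API.

```python
import struct

def xor_delta(prev: bytes, curr: bytes) -> bytes:
    """Byte-wise XOR of two equal-length snapshots; unchanged bytes become 0."""
    return bytes(a ^ b for a, b in zip(prev, curr))

def rle_encode(data: bytes) -> bytes:
    """Encode as (run_length, byte_value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        out += bytes([encoded[j + 1]]) * encoded[j]
    return bytes(out)

# 1000 synthetic 64-bit counters; only three change between snapshots.
values = list(range(1000))
prev = struct.pack("<1000Q", *values)
for idx in (10, 500, 999):
    values[idx] += 1
curr = struct.pack("<1000Q", *values)

delta = xor_delta(prev, curr)
encoded = rle_encode(delta)
assert xor_delta(prev, rle_decode(encoded)) == curr  # lossless round trip
print(len(curr), "->", len(encoded), "bytes")
```

XOR is its own inverse, so the receiver only needs the previous snapshot and the encoded delta to rebuild the current one.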
I've experimented with Open Telemetry, and its flagrantly wasteful data representations make me depressed.
Why must everything be JSON!?
nijave · 44m ago
I think Prometheus works similarly to this with some other tricks like compressing metric names.
OTEL can do gRPC and a storage backend can encode that however it wants. However, I do agree it doesn't seem like efficiency was at the forefront when designing OTEL.
XorNot · 8h ago
The problem with this is generally that you have logs from years ago, but no way to get a live stream of logs which are happening now.
(one of my immense frustrations with kubernetes - none of the commands for viewing logs seem to accept logical aggregates like "show me everything from this deployment").
Sayrus · 7h ago
Stern[1] does that. You can tail deployments, filter by labels and more.
What about "kubectl logs deploy/mydep --all-containers=true" but I guess you want more than that? Maybe https://www.kubetail.com?
knutzui · 8h ago
Maybe not via kubectl directly, but it is rather trivial to build this, by simply combining all log streams from pods of a deployment (or whatever else).
k9s (k9scli.io) supports this directly.
AlecBG · 8h ago
This sounds pretty easy to hack together with 10s of lines of python
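For what it's worth, the merging half really is short. A sketch, assuming each pod's lines are already in time order and start with a fixed-width timestamp (so plain string comparison sorts them correctly); fetching the streams from the cluster is left out, and the pod names and log lines below are invented.

```python
import heapq

def merge_pod_logs(streams):
    """Merge per-pod log lines into one time-ordered stream.

    streams maps a pod name to an iterable of "TIMESTAMP message" lines,
    each already sorted by timestamp. Yields (timestamp, pod, message).
    """
    def tagged(pod, lines):
        for line in lines:
            ts, _, msg = line.partition(" ")
            yield ts, pod, msg

    # heapq.merge lazily interleaves the already-sorted inputs.
    return heapq.merge(*(tagged(pod, lines) for pod, lines in streams.items()))

streams = {
    "web-a": ["2025-01-01T00:00:01Z start", "2025-01-01T00:00:04Z ready"],
    "web-b": ["2025-01-01T00:00:02Z start", "2025-01-01T00:00:03Z error: db timeout"],
}
for ts, pod, msg in merge_pod_logs(streams):
    print(ts, f"[{pod}]", msg)
```

Because `heapq.merge` is lazy, the same function works on live generators as well as lists, as long as each input stays sorted.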
madduci · 8h ago
And what is the sense of keeping years of logs? I could probably understand it for very sensitive industries, but in general I see it as a pure waste of resources. At most you need 60-90 days of logs.
sureglymop · 6h ago
It makes sense to keep a high fidelity history of what happened and why. However, I think the issue is more that this data is not refined correctly.
Even when it comes to logging in the first place, I have rarely seen developers do it well, instead logging things that make no sense just because it was convenient during development.
But that touches on something else. If your logs are important data, maybe logging is the wrong way to go about it. Instead think about how to clean, refine and persist the data you need like your other application data.
I see log and trace collecting in this way almost as a legacy compatibility thing. Analogous to how kubernetes and containerization allow you to wrap up any old legacy application process into a uniform format, just collecting all logs and traces is backwards compatible with every application. But in order to not be wasteful and only keep what is valuable, a significant effort would be required afterwards. Well, storage and memory happen to be cheap enough to never have to care about that.
Sayrus · 7h ago
Access logs and payment information for compliance, troubleshooting and evaluating trends of something you didn't know existed until months or years later, finding out if an endpoint got exploited in the past for a vulnerability that you only now discovered, tracking events that may span across months. Logs are a very useful tool in many non-dev or longer term uses.
fc417fc802 · 7h ago
My home computer has well over 20 TB of storage. I have several LLMs, easily half a TB worth. The combined logs generated by every single program on my system might total 100 GB per year but I doubt it. And that's before compression.
Would you delete a text file that's a few KB from a modern device in order to save space? It just doesn't make any sense.
brazzy · 7h ago
One nice side effect of the GDPR is that you're not allowed to keep logs indefinitely if there is any chance at all that they contain personal information. The easiest way to comply is to throw away logs after a month (accepted as the maximum justifiable for general error analysis) and be more deliberate about what you keep longer.
tjungblut · 8h ago
tldr, they now do a zero (?) copy of raw bytes instead of marshaling and unmarshaling json.
[1] https://github.com/stern/stern