Show HN: ClickStack – Open-source Datadog alternative by ClickHouse and HyperDX

202 points by mikeshi42 | 46 comments | 6/5/2025, 6:01:21 PM | github.com ↗
Hey HN! Mike & Warren here from HyperDX (now part of ClickHouse)! We’ve been building ClickStack, an open source observability stack that helps you collect, centralize, search/viz/alert on your telemetry (logs, metrics, traces) in just a few minutes - all powered by ClickHouse (Apache2) for storage, HyperDX (MIT) for visualization and OpenTelemetry (Apache2) for ingestion.

You can check out the quick start for spinning things up in the repo here: https://github.com/hyperdxio/hyperdx

ClickStack makes it really easy to instrument your application so you can go from bug reports of “my checkout didn’t go through” to a session replay of the user, backend API calls, to DB queries and infrastructure metrics related to that specific request in a single view.

For those that might be migrating from a Very Expensive Observability Vendor (TM) to something open source, more performant, and that doesn’t require extensive culling of retention limits and sampling rates - ClickStack gives you a batteries-included way of starting that migration journey.

For those that aren’t familiar with ClickHouse, it’s a high performance database that is already used by companies such as Anthropic, Cloudflare, and DoorDash to power their core observability at scale, due to its flexibility, ease of use, and cost effectiveness. However, this required teams to dedicate engineers to building a custom observability stack, where it’s difficult not only to get telemetry data into ClickHouse easily, but also to work without a native UI experience.

That’s why we’re building ClickStack - we wanted to bundle an easy way to get started ingesting your telemetry data, whether it’s logs & traces from Node.js or Ruby, or metrics from Kubernetes or your bare metal infrastructure. Just as important, we wanted users to enjoy a visualization experience that lets them quickly search using a familiar Lucene-like search syntax (similar to what you’d use in Google!). We recognize, though, that a SQL mode is needed for the most complex of queries. We've also added high-cardinality outlier analysis by charting the delta between outlier and inlier events - which we've found really helpful in narrowing down causes of regressions/anomalies in our traces - as well as log patterns to condense clusters of similar logs.

We’re really excited about the roadmap ahead in terms of improving ClickStack as a product and the ClickHouse core database to improve observability. Would love to hear everyone’s feedback and what they think!

Spinning up a container is pretty simple: `docker run -p 8080:8080 -p 4317:4317 -p 4318:4318 docker.hyperdx.io/hyperdx/hyperdx-all-in-one`

In-browser live demo (no sign-ups or anything silly, it runs fully in your browser!): https://play.hyperdx.io/
Landing page: https://clickhouse.com/o11y
GitHub repo: https://github.com/hyperdxio/hyperdx
Discord community: https://hyperdx.io/discord
Docs: https://clickhouse.com/docs/use-cases/observability/clicksta...
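The container above exposes the standard OTLP ports (4317 for gRPC, 4318 for HTTP), so any OTLP-speaking client can send telemetry without an SDK. As a minimal sketch (service name and log content are illustrative; assumes the all-in-one container is running locally), a single log record can be shipped over OTLP/HTTP JSON with nothing but the Python standard library:

```python
import json
import time
import urllib.request

# OTLP/HTTP logs endpoint exposed by the all-in-one container
OTLP_LOGS_URL = "http://localhost:4318/v1/logs"

def make_otlp_log_payload(service: str, body: str, severity: str = "INFO") -> dict:
    """Build a minimal OTLP/HTTP JSON payload carrying one log record."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{"logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityText": severity,
                "body": {"stringValue": body},
            }]}],
        }]
    }

def send_log(service: str, body: str) -> int:
    """POST the log record to the local OTLP endpoint; returns the HTTP status."""
    req = urllib.request.Request(
        OTLP_LOGS_URL,
        data=json.dumps(make_otlp_log_payload(service, body)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage (with the container above running):
#   send_log("checkout-service", "order failed: payment declined")
```

In practice you'd use an OTel SDK for your language, which handles batching, retries, and trace context for you - this just shows there's no proprietary ingestion protocol in the way.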

Comments (46)

theogravity · 1h ago
This is really cool considering how expensive DataDog can get. I'm the author of LogLayer (https://loglayer.dev), which is a structured logger for TypeScript that allows you to use multiple loggers together. I've written transports that allow shipping to other loggers like pino and cloud providers such as DataDog.

I spent some time writing an integration for HyperDX after seeing this post and hope you can help me roll it out! Would love to add a new "integrations" section to my page that links to the docs on how to use HyperDX with LogLayer.

https://github.com/hyperdxio/hyperdx-js/pull/184

readdit · 10h ago
I use HyperDX in production and like it a lot. So kudos to the team for building it and merging with ClickHouse. I found a lot of monetary value switching over to HyperDX, considering it's significantly more cost efficient for our needs.

Should we be starting to prepare for the original HyperDX product to be deprecated and potentially move over to ClickStack?

mikeshi42 · 8h ago
First off, always really excited to hear from our production users - glad to hear you're getting good value out of the platform!

HyperDX isn't being deprecated - as you can probably see on the marketing page, it's still really prominently featured as an integral part of the stack - so nothing is changing there.

We do of course want to get users onto HyperDX v2 and the overall ClickStack pattern. This doesn't mean HyperDX is going away by any means - just that HyperDX is focused a lot more on the end-user experience, and we get to leverage the flexibility, learnings and performance of a more exposed ClickHouse-powered core which is the intent of ClickStack. On the engineering side, we're working on making sure it's a smooth path for both open source and cloud.

side note: weird I thought I replied to this one already but I've been dealing with spotty wifi today :)

HatchedLake721 · 5h ago
Still confused where HyperDX ends and where ClickStack starts.

Is HyperDX === ClickStack?

Is ClickStack = HyperDX + something closed source?

Is ClickStack just a cloud version of HyperDX?

Is it same thing, HyperDX, rebranded as ClickStack?

mikeshi42 · 3h ago
This is good feedback to make things more clear :) HyperDX is part of ClickStack, so ClickStack = { HyperDX, ClickHouse, OTel }. This is the stack we recommend that will deploy in seconds and _just work_, and can scale up to PB+ and beyond as well with some additional effort (more than a few seconds unfortunately, but one day...)

HyperDX v2, the version that is now stable and shipped in ClickStack, focuses more on the querying layer. It lets users have more customization around ClickHouse (virtually any schema, any deployment).

Optionally, users can leverage other ways of getting data into ClickHouse like Vector, S3, etc. but still use HyperDX v2 on top. Previously in HyperDX v1 you _had_ to use OTel and our ingestion pipeline and our schemas. This is no longer true in v2.

Let me know if this explanation helps

gigatexal · 55m ago
Datadog is expensive this is true. But I have never felt it be slow. Speed is not its killer feature. It’s everything you can do with it once you have logs and or metrics flowing into it.

The dashboards and their creation are intuitive. Creating alerts and things from airflow logs is easy using their DSL. Connecting and sending notifications to things like slack just works tm.

So this is how we justify the Datadog costs: all the engineering time it saves (engineers are still expensive; AI hasn't replaced us yet), and how quickly we can move from raw logs and metrics to useful insights.

hosh · 10h ago
I liked OTel for traces and maybe logging -- but I think OTel metrics are over-engineered.

Does ClickStack have a way to ingest statsd data, preferably with Datadog extensions (which adds tagging)?

Does ClickStack offer correlations across traces, logging, and metrics via unified service tagging? Does the UI offer the ability to link to related traces, logging, and metrics?

Why does the Elixir sdk use the hyperdx library instead of the otel library?

Are Notebooks in the roadmap?

phillipcarter · 10h ago
> but I think the Otel metrics is over-engineered.

What about OTel metrics is difficult?

You can set up receivers for other metrics sources like statsd or even the DD agent, so there's no need to immediately replace your metrics stack.

carefulfungi · 9h ago
My foray into otel with aws lambda was not a success (about 6 months ago). Many of my issues were with the prom remote writer that I had to use. The extension was not reliable. Queue errors were common in the remote writer. Interop with Prometheus labels was bad. And the various config around delta and non-delta metrics was a bit of a mess. The stack I was using at least didn’t support exponential histograms. Got it to work mostly after days of fiddling but never reliably. Ripped it out and was happier. Maybe a pure OTEL stack would have been a much better experience than needing the prom remote writer - which I’d like to try in the future.

I’d certainly appreciate hearing success stories of OTEL + serverless.

cyberax · 5h ago
One critical problem for me: no support for raw metrics.

Sometimes, you just want to export ALL of your metrics to the server and let it deal with histograms, aggregation, etc.

Another annoyance is the API, you can't just put "metrics.AddMeasurement('my_metric', 11)", you have to create a `Meter` (which also requires a library name), and then use it.

mikeshi42 · 8h ago
Great questions!

OTel Metrics: I get it, it's specified as almost a superset of everyone's favorite metric standards with config for push/pull, monotonic vs delta, exponential/"native" histograms, etc. I have my preferences as well which would be a subset of the standard but I get why a unifying standard needed to be flexible.

Statsd: The great thing about the OTel collector is that it allows ingesting a variety of different data formats, so you can take in statsd and output OTel or write directly to ClickHouse: https://github.com/open-telemetry/opentelemetry-collector-co...
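To make the statsd path concrete: a sketch of a collector config that ingests statsd on its default UDP port and forwards it over OTLP to the stack above (assumes a collector-contrib build; exact receiver options vary by version, and the DogStatsD-style tag extension is parsed into attributes by the statsd receiver):

```yaml
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"   # default statsd UDP port
    # DogStatsD-style tags (e.g. my_metric:1|c|#env:prod) become attributes

exporters:
  otlp:
    endpoint: "localhost:4317"  # OTLP gRPC port of the all-in-one container
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [otlp]
```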

We correlate across trace/span id as well as resource attributes. The correlation across logs/traces with span/trace id is a pretty well worn path across our product. Metrics to the rest is natively done via resource attributes and we primarily expose correlation for K8s-based workloads with more to come. We don't do exemplars _yet_ to solve the more generic correlation case for metrics (though I don't think statsd can transmit exemplars)

Elixir: We try to do our best to support wherever our users are, the OTel SDK and ours have continued to change in parallel over time - we'll want to likely re-evaluate if we should start pointing towards the base OTel SDK for Elixir. We've been pretty early on the OTel SDK side across the board so things continue to evolve, for example our Deno OTel integration came out I think over a year before Deno officially launched one with native HyperDX documentation <3

Notebooks: Yes, it should land in an experimental state shortly, stay tuned :) There's a lot of exciting workflows we're looking to unlock with notebooks as well. If you have any thoughts in this direction, please let me know. I'd love to get more user input ahead of the first release.

hosh · 2h ago
Thank you. I saw a different thread about Otel statsd receiver, so that works out better. The last time I had looked into it, the otel metrics specs were very complex.

I think this is enough features for me to seriously take a look at it as a Datadog alternative.

atombender · 7h ago
I'm looking for a new logging solution to replace Kibana. I have very good experience with ClickHouse, and HyperDX looks like a decent UI for it.

I'm primarily interested in logs, though, and the existing log shipping pipeline is around Vector on Kubernetes. Admittedly Vector has an OTel sink in beta, but I'm curious if that's the best/fastest way to ship logs, especially given that the original data comes out of apps as plain JSON rather than OTel.

The current system is processing several TB/day and needs fairly serious throughput to keep up.

mikeshi42 · 6h ago
Luckily ClickHouse and serious throughput are pretty synonymous. Internally we're at 100+PB of telemetry stored in our own monitoring system.

Vector supports directly writing into ClickHouse - several companies use this at scale (iirc Anthropic does exactly this, they spoke about this recently at our user conference).
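For the Vector-direct route, a sketch of the sink config (database/table names and the upstream source name here are illustrative; Vector's `clickhouse` sink writes batched JSON rows over ClickHouse's HTTP interface):

```toml
# Vector sink writing JSON app logs straight into a ClickHouse table
[sinks.ch]
type = "clickhouse"
inputs = ["kubernetes_logs"]          # whatever source/transform feeds your logs
endpoint = "http://clickhouse:8123"   # ClickHouse HTTP port
database = "otel"
table = "app_logs"
skip_unknown_fields = true            # drop fields the table doesn't have

[sinks.ch.batch]
max_bytes = 10_000_000                # batch generously; ClickHouse prefers large inserts
```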

Please give it a try and let us know how it goes! Happy to help :)

atombender · 6h ago
Thanks! Very familiar with ClickHouse, but can logs then be ingested into CH without going through HyperDX? Doesn't HyperDX require a specific schema that the Vector pipeline would have to adapt the payloads to?
mikeshi42 · 6h ago
Nope! We're virtually schema agnostic, you can map your custom schema to observability concepts (ex. the SQL expression for TraceID, either a column or a full function/expression will work).

We don't have any lock in to our ingestion pipeline or schema. Of course we optimize a lot for the OTel path, but it works perfectly fine without it too.
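As a hypothetical illustration of what "schema agnostic" means here (table and column names invented for the example): you could have a custom logs table where the trace id lives inside a map column, and point HyperDX's source configuration at an expression rather than a dedicated column:

```sql
-- A custom table that was never designed for HyperDX
CREATE TABLE app_logs (
    ts      DateTime64(9),
    message String,
    ctx     Map(String, String)   -- trace id buried in a map column
) ENGINE = MergeTree ORDER BY ts;

-- The "TraceID expression" for this source could then simply be:
--   ctx['trace_id']
```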

atombender · 6h ago
That's great to hear. I will take a closer look ASAP.
codegeek · 10h ago
How are you different from SigNoz, another YC company that also does observability using ClickHouse?
mikeshi42 · 6h ago
Echoing the comment below, I guess one obvious thing is that we are a team at ClickHouse and an official first-party product on top. That translates into:

- We're flexible on top of any ClickHouse instance: you can use virtually any schema in ClickHouse and things will still work. Custom schemas are pretty important for either tuned high performance or once you're at a scale like Anthropic. This also makes it incredibly easy to get started (especially if you already have data in ClickHouse).

- The above also means you don't need to buy into OTel. I love OTel, but some companies choose to use Vector, Cribl, S3, a custom writing script, etc. for good reasons. All of that is supported natively via the various ClickHouse integrations, and naturally means you can use ClickStack/HyperDX in those scenarios as well.

- We also have some cool tools around wrangling telemetry at scale, from Event Deltas (high-cardinality correlation between slow spans and normal spans to root-cause issues) to Event Patterns (clustering similar logs or spans together automatically with ML) - all of these help users dive into their data in easier ways than just searching & charting.

- We also have session replay capability - to truly unify everything from click to infra metrics.

We're built to work at the 100PB+ scale we run internally here for monitoring ClickHouse Cloud, but flexible enough to pinpoint, end to end, the specific user issues that get brought up once in a support case.

There's probably a lot more I'm missing. Ultimately from a product philosophy standpoint, we aren't big believers in the "3 pillars" concept, which tends to manifest as 3 silos/tabs for "logs", "metrics", "traces" (this isn't just Signoz - but across the industry). I'm a big believer that we're building tools to unify and centralize signals/clues in one place and giving the right datapoint at the right time to the engineer. During an incident I just think about what's the next clue I can get to root cause an issue, not if I'm in the logging product or the tracing product.

oatsandsugar · 10h ago
"You" here is ClickHouse
bilalq · 10h ago
This is really interesting.

Is ClickHouse the only stateful part of this stack? Would love to see compatibility with Rotel[0], a Rust implementation of the OTEL collector, so that this becomes usable for serverless runtime environments.

One key thing Datadog has is their own proprietary alternative to the OTEL collector that is much more performant.

[0]: https://github.com/streamfold/rotel

mikeshi42 · 9h ago
I agree - Rotel seems like a really good fit for a lightweight Lambda integration for OTel. It should work already since we stand up an OTel ingest endpoint, so sending data over is seamless. (Kind of the beauty of OTel, of course!)

I've also been in touch with Mike & Ray for a bit, who've told me they've added ClickHouse support recently which makes the story even better :)

mike_heffner · 7h ago
Hi all — one of the authors of Rotel here. Thanks for the kind words, Bilal and Michael.

We're excited to test our Clickhouse integration with Clickstack, as we believe OTel and Clickhouse make for a powerful observability stack. Our open-source Rust OpenTelemetry collector is designed for high-performance, resource-constrained environments. We'd love for you to check it out!

user3939382 · 10h ago
There’s so many of these log aggregators I’ve completely lost track. I used Datadog extensively and found it overpriced and a very confusing UI.
RhodesianHunter · 9h ago
That's what happens when there's a need for something.

You see an explosion in offerings, and then eventually it's whittled down to a handful of survivors.

landl0rd · 4h ago
Datadog is a good product but one of the most blatantly overpriced things I’ve had the displeasure to use.
secondcoming · 7h ago
Everyone has found Datadog to be overpriced!

So they switch to Prometheus and Grafana and now have to manage a Prometheus cluster. Far cheaper, but far more annoying.

ensignavenger · 6h ago
Really interesting! Unfortunately, it looks like HyperDX depends on Mongo? I wonder if there are any open source document stores (possibly a Mongo-compatible one) that could work with it?
ensignavenger · 5h ago
FerretDB looks like a great alternative, thanks! I'll be keeping Ferret and ClickStack on my radar!
mikeshi42 · 6h ago
In theory you should be able to try using FerretDB for example.

We have this on the medium-term roadmap: to investigate proper support for a compatibility layer such as Ferret, or more likely just using ClickHouse itself as the operational data store.

ptrfarkas · 6h ago
FerretDB maintainer here - we'll be looking at this
wrn14897 · 37m ago
Hey, I'm a maintainer of HyperDX. I'd love to chat with you about a potential collaboration - we're planning to migrate off MongoDB. Please reach out to me on Discord (warren)
mikeshi42 · 6h ago
That'd be awesome! Ferret has been on my radar for a while now :) If you want to chat with us on Discord: https://hyperdx.io/discord
ah27182 · 3h ago
Do I need to sign in when using the Docker container?
mikeshi42 · 2h ago
There's a version that we call local mode which is intended for engineers using it as part of their local debugging workflow: https://clickhouse.com/docs/use-cases/observability/clicksta...

Otherwise yes, you can authenticate against the other versions with an email/password (the email doesn't really do anything in the open source distribution - it's just a user identifier, but we keep it there to be consistent)

buserror · 9h ago
I am absolutely amazed at the amount of garbage being "logged" - enough that it is not just a huge business, but also one of the primary tasks for some devops guys. It's like a goal in itself: you have a look at the output and it is absolutely scary, HUGE messages being "logged" for purposes unknown.

I've seen single traces over 100KB of absolute pure randomness encoded as base64... Because! Oh and also, we have to pay for the service, so it looks important.

Sure they tell you it is super helpful for debugging issues, but in a VERY large proportion of cases, it is 1) WAY too much, and 2) never used anyway. And most of the time what's interesting is the last 10 minutes of the debug version, you don't need a "service" for that.

/me gets down off his horse :-)

metta2uall · 54m ago
I think you're at least partially right - a lot of the data (not all of it) isn't useful, wasting money, bandwidth, electricity, etc. There should be more dynamic control over what gets logged/filtered at the client side.
SOLAR_FIELDS · 7h ago
Comparison to the other player in this space, Signoz? Also uses clickhouse as backend
Immortalin · 10h ago
I remember back in the day Mike was building Huggingface before Huggingface was a thing. He was ahead of his time. It's a pity model depot is no longer around.
mikeshi42 · 8h ago
Wow this is an incredible throwback! Can't believe your memory is this good. It's quite funny and I totally agree - I met the Gradio founders in an accelerator (when they were just getting started) after we shut down ModelDepot - and they of course ended up getting acquired into Hugging Face. It's funny how things end up sometimes :)
ksec · 10h ago
It would have been much better if the link had pointed to https://github.com/hyperdxio/hyperdx, the actual source code.

Because right now, without the message on HN here, I wouldn't know what "open source observability stack" meant, when the webpage does not explain what HyperDX is, nor does it provide a link to it or its code. I was expecting the whole "Open Source Datadog" thing to be a ClickStack repo inside the ClickHouse GitHub, which is not found anywhere.

But other than that, congrats! I have long wondered why no one has built anything on top of ClickHouse for Datadog / New Relic competition.

The ClickHouse DB opened up an ocean of open source "scalable" web analytics that wasn't previously available or possible. I am hoping we see this change come to observability platforms as well.

ankit01-oss · 3h ago
check out SigNoz: https://github.com/SigNoz/signoz

We started building SigNoz as an open-source alternative to Datadog/New Relic four years back, and it has been OpenTelemetry-native from day 1. We have shipped some good features on top of OpenTelemetry, and because of OTel's semantic conventions & our query builder, you can correlate any telemetry across signals.

mikeshi42 · 10h ago
Hey that's a good point on the link! Not something I can change now unfortunately, I was hoping having it near the top of the text post would help too for those that wanted to dig in more :)

That being said - as you've mentioned so many different "store tons of data" apps have been enabled from ClickHouse. Observability is at a point where it's in the same category of: ClickHouse can store a ton of data, OTel can help you collect/process it, and now we just need that analytics user experience layer to present it to the engineers that need an intuitive way to dive in to it all.

sirfz · 10h ago
SigNoz is a dd/nr alternative built on clickhouse that I know of
cbhl · 8h ago
Looks like it is pointing there now; old link was https://clickhouse.com/use-cases/observability for posterity