I am a big fan of Cloudflare's blogs. The tech posts are usually highly detailed, and there is a lot to learn from them.
But this one, while interesting, does not describe a practical choice at all from what I gather reading the post. The reason given for not using ClickHouse, which they already use for analytics, was vague and ambiguous. ClickHouse does support JSON ingestion, and raw JSON can be rewritten into a more structured table with a materialized view (MV); a sketch follows below. Aggregation and other performance-tuning steps are the bread and butter of running ClickHouse.
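To make that concrete, here is a minimal sketch of the JSON-to-structured-table pattern (not Cloudflare's schema; the table and column names are invented, and the clickhouse-driver Python client is just one way to run the DDL):

    # Sketch: land raw JSON events in ClickHouse, then reshape them into
    # a typed table with a materialized view. All names are hypothetical.
    from clickhouse_driver import Client

    client = Client(host="localhost")

    # Raw landing table: one JSON string per event.
    client.execute("""
        CREATE TABLE IF NOT EXISTS logs_raw (raw String)
        ENGINE = MergeTree ORDER BY tuple()
    """)

    # Structured target table.
    client.execute("""
        CREATE TABLE IF NOT EXISTS logs (
            ts        DateTime,
            client_id UInt64,
            status    UInt16
        ) ENGINE = MergeTree ORDER BY (client_id, ts)
    """)

    # Materialized view: extracts typed columns from the JSON on insert,
    # so queries hit the structured table instead of raw strings.
    client.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS logs_mv TO logs AS
        SELECT
            parseDateTimeBestEffort(JSONExtractString(raw, 'ts')) AS ts,
            JSONExtractUInt(raw, 'client_id')                     AS client_id,
            toUInt16(JSONExtractUInt(raw, 'status'))              AS status
        FROM logs_raw
    """)

Anything inserted into logs_raw then shows up, typed and ordered, in logs.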
The decision to go with Postgres, learn its already-known limitations the hard way, and then keep using it by bringing in a new technology (Timescale) does not sound good, especially since Cloudflare at this point presumably already has plenty of internal tooling for monitoring ClickHouse clusters.
akulkarni · 3h ago
That's interesting. Personally, I did not find it vague or ambiguous.
ClickHouse was fast but required a lot of extra pieces for it to work:
Writing data to ClickHouse
Your service must generate logs in a clear format, using Cap'n Proto or Protocol Buffers. Logs should be written to a socket for logfwdr to transport to PDX, then to a Kafka topic. Use an inserter to read from Kafka, batching data to achieve a write rate of less than one batch per second (a sketch of that inserter step follows after the quote).
Oh. That’s a lot. Including ClickHouse and the WARP client, we’re looking at five boxes to be added to the system diagram.
So it became clear that ClickHouse is a sports car and to get value out of it we had to bring it to a race track, shift into high gear, and drive it at top speed. But we didn’t need a race car — we needed a daily driver for short trips to a grocery store. For our initial launch, we didn’t need millions of inserts per second. We needed something easy to set up, reliable, familiar, and good enough to get us to market. A colleague suggested we just use PostgreSQL, quoting “it can be cranked up” to handle the load we were expecting. So, we took the leap!
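To make the "lot of extra pieces" concrete, the inserter step alone looks roughly like this (a sketch only; topic, broker, and table names are invented, with kafka-python and clickhouse-driver standing in for whatever Cloudflare actually uses):

    # Sketch of a Kafka -> ClickHouse batch inserter: accumulate messages
    # and flush at most about once per second, as the quoted pipeline
    # describes. All names are hypothetical.
    import json
    import time
    from datetime import datetime

    from kafka import KafkaConsumer
    from clickhouse_driver import Client

    consumer = KafkaConsumer(
        "logs-topic",
        bootstrap_servers=["kafka:9092"],
        value_deserializer=lambda v: json.loads(v),
    )
    client = Client(host="clickhouse")

    FLUSH_INTERVAL = 1.0  # seconds; keeps the write rate under ~1 batch/sec
    batch, last_flush = [], time.monotonic()

    for msg in consumer:
        event = msg.value
        batch.append((
            datetime.fromisoformat(event["ts"]),  # assumes ISO-8601 timestamps
            event["client_id"],
            event["status"],
        ))
        if batch and time.monotonic() - last_flush >= FLUSH_INTERVAL:
            # One bulk INSERT per interval instead of one per event;
            # ClickHouse strongly prefers large, infrequent batches.
            client.execute("INSERT INTO logs (ts, client_id, status) VALUES", batch)
            batch, last_flush = [], time.monotonic()

And that is just one of the five boxes.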
PostgreSQL with TimescaleDB did the job. Why overcomplicate things?
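For contrast, the Timescale path really is small (again just a sketch; the connection string and table are made up, using psycopg2):

    # Sketch: the "easy to set up" path with PostgreSQL + TimescaleDB.
    # Connection details and table name are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=metrics user=postgres")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS logs (
                ts        TIMESTAMPTZ NOT NULL,
                client_id BIGINT,
                status    SMALLINT
            )
        """)
        # Turn the plain table into a time-partitioned hypertable.
        cur.execute("SELECT create_hypertable('logs', 'ts', if_not_exists => TRUE)")
        # From here on, writes are ordinary INSERTs -- no sockets,
        # forwarders, Kafka, or dedicated inserter required.
        cur.execute(
            "INSERT INTO logs (ts, client_id, status) VALUES (now(), %s, %s)",
            (42, 200),
        )

One process, one database, ordinary SQL.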