Ask HN: Best way to approach user-facing analytics
3 points | superarch | 4 comments | 3/27/2025, 10:47:57 PM
I’m looking for advice on technical strategies for user-facing analytics features (think real-time, customizable dashboards). I’ve ended up in a situation where I want to design analytics dashboards for end users that display aggregates/historical data spanning a number of years. The queries I want to run take far too long to execute on the fly, so I’ve resorted to pre-computing the data I care about on a schedule. This approach works fine, but it isn’t real time and it adds friction when introducing new features/views (every new feature requires backfilling years’ worth of data). I’m curious whether others have better strategies for building analytics software like this that’s real-time, responsive, and quick to iterate on. OLAP DBs like ClickHouse? Reverse ETL? Better OLTP indexing?
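For concreteness, what I do today is roughly a rollup refreshed by a cron job, along these lines (table and column names are made up):

  -- Pre-aggregated rollup over a raw orders table (illustrative schema)
  CREATE MATERIALIZED VIEW daily_order_totals AS
  SELECT customer_id,
         date_trunc('day', created_at) AS day,
         count(*)    AS orders,
         sum(amount) AS revenue
  FROM orders
  GROUP BY customer_id, date_trunc('day', created_at);

  -- A cron job re-runs this on a schedule, so dashboards lag by up to one interval
  REFRESH MATERIALIZED VIEW daily_order_totals;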
If a read replica cannot execute the queries fast enough, a specialized database (optimized for OLAP workloads) should be used instead. This can be a cloud service (BigQuery, Redshift, Snowflake, MotherDuck) or a self-hosted solution (ClickHouse, PostgreSQL with the pg_analytics extension, or even in-process DuckDB). Data sync is performed either as a scheduled full copy (simple, but not suitable for near-real-time analytics) or via CDC (see Airbyte).
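A minimal sketch of the scheduled full-copy variant into ClickHouse, assuming a Postgres source (host, credentials, and the events table are placeholders):

  -- One-time setup: analytical copy of the table, sorted for typical dashboard filters
  CREATE TABLE events_olap
  ENGINE = MergeTree
  ORDER BY (user_id, created_at)
  AS SELECT * FROM postgresql('pg-host:5432', 'app_db', 'events', 'reader', 'secret');

  -- Scheduled job: wipe and re-copy (simple, but only as fresh as the schedule)
  TRUNCATE TABLE events_olap;
  INSERT INTO events_olap
  SELECT * FROM postgresql('pg-host:5432', 'app_db', 'events', 'reader', 'secret');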
1) Read replicas with copied data. The most straightforward option, since you keep the same SQL syntax and tooling. Examples: a Postgres read replica and BemiDB (disclaimer: I'm a contributor)
2) Operational databases with integrations. Designed for sub-second real-time use cases; they bring their own extended SQL syntax for things like window functions. Examples: Materialize and RisingWave
3) Analytical databases with syncing. They allow writing and reading directly and are optimized for analytical workloads (see the sketch after this list). Examples: ClickHouse and DuckDB
4) Data warehouses with ETL. Great for large data volumes; traditionally used with batch ETL processing. Examples: Snowflake and Redshift
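To make option 3 concrete: in ClickHouse, a materialized view can maintain a rollup incrementally as rows are inserted, so dashboard queries hit a small pre-aggregated table. The schema below is just an illustration, and existing history still needs a one-time INSERT ... SELECT backfill because the view only sees new inserts:

  -- Target table that stores the pre-aggregated rows
  CREATE TABLE daily_events
  (
      day     Date,
      user_id UInt64,
      events  UInt64
  )
  ENGINE = SummingMergeTree
  ORDER BY (user_id, day);

  -- Incremental rollup, applied to every new insert into the events table
  CREATE MATERIALIZED VIEW daily_events_mv TO daily_events AS
  SELECT toDate(created_at) AS day, user_id, count() AS events
  FROM events
  GROUP BY day, user_id;

  -- Dashboard query re-aggregates, since SummingMergeTree merges rows lazily
  SELECT day, sum(events) AS total_events
  FROM daily_events
  WHERE user_id = 42
  GROUP BY day
  ORDER BY day;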