101x Airbyte, 11x Estuary, Postgres to Iceberg
We wanted to share the results, as they show OLake performing very competitively, often exceeding the speed of both open-source and commercial alternatives, while offering the cost advantages of a self-hosted open-source solution.
The benchmarks covered both full initial loads and Change Data Capture (CDC) on a large dataset (billions of rows for full load, tens of millions of changes for CDC) over a 24-hour window.
Link to entire benchmark postgres - https://olake.io/docs/connectors/postgres/benchmarks
For full loads, OLake achieved throughput of around 46,262 rows/sec, processing over 4 billion rows in 24 hours.
This was essentially on par with Fivetran (46,395 RPS) and significantly faster than Debezium (14,839 RPS - 3.1x slower), Estuary (3,982 RPS - 11.6x slower on a smaller processed dataset), and Airbyte (457 RPS - 101x slower before it failed the long test).
The most striking results were in CDC performance.
For processing 50 million changes, OLake completed the task in 22.5 minutes at 36,982 rows/sec. Fivetran took 31 minutes (1.4x slower), Debezium took 60 minutes (2.7x slower), Estuary took 4.5 hours (12x slower), and Airbyte took 23 hours (63x slower).
This indicates OLake delivers significantly lower latency for propagating changes from PostgreSQL to Iceberg.
On the cost side, OLake is open source and self-hosted. The cost is simply the infrastructure. Running the benchmarks on a substantial VM (64 vcpus, 128 GiB memory) for 24 hours cost less than $75.
Comparing this to the vendor list prices for the data synced in the tests: Fivetran's full load cost $7,446 ($1.86/M rows), Estuary's full load cost $4,462 ($12.97/M rows), Airbyte Cloud's partial full load cost $5,560 ($438.8/M rows).
For CDC, Fivetran cost $2,257 ($45.14/M rows), Estuary cost $22.72 ($0.45/M rows), and Airbyte Cloud cost $148.95 ($2.98/M rows).
While Estuary shows a low per-row cost for CDC in this specific test, the overall picture strongly favors the predictable, infra-based cost of self-hosted OLake, especially for large-scale replication.
In summary, these benchmarks suggest OLake can match or exceed the speed of leading proprietary tools for PostgreSQL to Iceberg replication, offers superior CDC latency compared to all tested alternatives, and provides a significantly lower and more predictable cost structure due to being open source and self-hosted.
You can find more details on the benchmarks and the tool itself in our documentation.
Happy to discuss the results and our approach.
No comments yet