Adobe's pipeline for high‑throughput data ingestion with Apache Iceberg

3 acossta 1 8/29/2025, 10:16:06 PM medium.com ↗

Comments (1)

acossta · 6h ago
Interesting deep dive into how Adobe built a streaming ingestion layer on top of Apache Iceberg to handle massive volumes of Experience Platform data, addressing challenges like the small‑file problem and commit bottlenecks with asynchronous writes and compaction. All stuff I've had to deal with in the past.

Good nuggets on how they partition tables by time, stage writes in separate ingestion and reporting tables, and tune snapshot and metadata handling to deliver a lakehouse‑style pipeline that scales without melting the object store.
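
For anyone who hasn't run into the small-file problem, here is a rough sketch of the planning step behind compaction: group many small data files into batches near a target size, then rewrite each batch as one larger file. This is not Adobe's implementation or Iceberg's API; `DataFile`, `plan_compaction`, and `target_bytes` are hypothetical names, and the grouping is plain first-fit-decreasing bin packing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file entry in table metadata.
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes):
    """Group small files into batches of at most target_bytes each.

    Files already at or above the target are left alone. Uses
    first-fit-decreasing bin packing; returns only batches that
    contain more than one file, since rewriting a single file
    gains nothing.
    """
    small = sorted(
        (f for f in files if f.size_bytes < target_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    bins = []  # each entry: [remaining_capacity, [files in batch]]
    for f in small:
        for b in bins:
            if b[0] >= f.size_bytes:
                b[0] -= f.size_bytes
                b[1].append(f)
                break
        else:
            # No existing batch has room, so start a new one.
            bins.append([target_bytes - f.size_bytes, [f]])
    return [group for _, group in bins if len(group) > 1]
```

In a real table the rewrite would be committed as a new snapshot that swaps the small files for the merged ones, which is why compaction and snapshot expiry tend to get tuned together.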