Better support for real-time stream data analysis has become a new trend in the Kafka world.
We've noticed a clear trend in the Kafka ecosystem toward integrating streaming data directly with data lake formats like Apache Iceberg.
What is your opinion on this matter?
otter-in-a-suit · 5h ago
Seems interesting. The article is a bit light on details around producers and schema management - what would this look like for existing protobufs going to Kafka? How does it handle types that differ between proto and Iceberg, or evolutions that are valid in proto but invalid in Iceberg (`oneof` comes to mind)?
fwiw, I've hand-written pretty much exactly this - proto on Kafka to Iceberg via Flink with dynamic source schemas - and things like schema evolution are a nightmare.
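To make the mismatch concrete, here's a rough sketch of the kind of mapping involved. It is not taken from the article or any connector: the names `PROTO_TO_ICEBERG`, `oneof_to_struct`, and `can_promote` are hypothetical, mapping `uint64` to `decimal(20, 0)` is just one possible choice, and flattening a `oneof` into a struct of optional fields is an assumption about how one might model it.

```python
# Hypothetical proto scalar -> Iceberg primitive mapping (illustrative only).
# Iceberg has no unsigned integer types, so unsigned proto types must widen.
PROTO_TO_ICEBERG = {
    "double": "double",
    "float": "float",
    "int32": "int",
    "sint32": "int",
    "sfixed32": "int",
    "int64": "long",
    "sint64": "long",
    "sfixed64": "long",
    "uint32": "long",            # widened to fit the full unsigned 32-bit range
    "fixed32": "long",
    "uint64": "decimal(20, 0)",  # assumption: does not fit a signed 64-bit long
    "fixed64": "decimal(20, 0)",
    "bool": "boolean",
    "string": "string",
    "bytes": "binary",
}

# Primitive promotions the Iceberg spec allows (decimal precision widening
# is also allowed but omitted here for brevity).
ICEBERG_PROMOTIONS = {("int", "long"), ("float", "double")}


def oneof_to_struct(oneof_name, members):
    """Model a oneof as a struct where every member is optional.

    members is a list of (field_name, proto_scalar_type) pairs. The
    "at most one is set" rule is lost and must be enforced elsewhere.
    """
    return {
        "name": oneof_name,
        "type": "struct",
        "fields": [
            {"name": name, "type": PROTO_TO_ICEBERG[ptype], "required": False}
            for name, ptype in members
        ],
    }


def can_promote(old_type, new_type):
    """Return True if an Iceberg column can change from old_type to new_type."""
    return old_type == new_type or (old_type, new_type) in ICEBERG_PROMOTIONS


payload = oneof_to_struct("payload", [("text", "string"), ("count", "int32")])
```

For example, changing a field from `int64` to `int32` is wire-compatible in proto, but the mapped change `long` to `int` fails `can_promote`, so the sink has to reject it or write a new column.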