Apache Iceberg V3 spec: new features for more efficient and flexible data lakes

40 points by talatuyarer | 6 comments | 8/11/2025, 5:07:34 PM | opensource.googleblog.com

Comments (6)

drivenextfunc · 3m ago
Many companies seem to be using Apache Iceberg, but the ecosystem feels immature outside of Java. For instance, iceberg-rust doesn't even support HDFS. (Though admittedly, Iceberg's tendency to create many small files makes it a poor fit for HDFS anyway.)
amluto · 16m ago
> ALTER TABLE events ADD COLUMN version INT DEFAULT 1;

I’ve always disliked this approach. It conflates two things: the value to put in preexisting rows and the default going forward. I often want to add a column, backfill it, and not have a default.

Fortunately, the Iceberg spec at least got this right under the hood. There’s “initial-default”, which is the value implicitly inserted in rows that predate the addition of the column, and there’s “write-default”, which is the default for new rows.
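
A minimal sketch of the difference, assuming a Spark SQL engine on an Iceberg table (syntax varies by engine; the table and column names are the ones from the quoted DDL):

    -- The quoted form: one DEFAULT clause sets both the value assumed for
    -- existing rows (initial-default) and the default for future writes (write-default).
    ALTER TABLE events ADD COLUMN version INT DEFAULT 1;

    -- The add-then-backfill pattern: existing rows get the value once, but new
    -- rows have no default and must set version explicitly (or end up NULL).
    ALTER TABLE events ADD COLUMN version INT;
    UPDATE events SET version = 1;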

hodgesrm · 1h ago
This Google article was nice as a high-level overview of Iceberg V3. I wish the V3 spec (and Iceberg specs in general) were more readable. For now the best approach seems to be to read the Javadoc for the Iceberg Java API. [0]

[0] https://javadoc.io/doc/org.apache.iceberg/iceberg-api/latest...

twoodfin · 2m ago
The Iceberg spec is a model of clarity and simplicity compared to the (constantly in flux via Databricks commits…) Delta protocol spec:

https://github.com/delta-io/delta/blob/master/PROTOCOL.md

ahmetburhan · 44m ago
Cool to see Iceberg getting these kinds of upgrades. Deletion vectors and default column values sound like real quality-of-life improvements, especially for big, messy datasets. Curious to hear if anyone’s tried V3 in production yet and what the performance looks like.
talatuyarer · 1h ago
This new version has some great features, including deletion vectors for more efficient row-level deletes and updates, and default column values to make schema evolution a breeze. The full article has all the details.
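
If you want to kick the tires, here's a rough sketch in Spark SQL. The catalog/table names (demo.db.events) and columns are made up for illustration; the table properties are the standard Iceberg ones, though V3 availability depends on your engine and Iceberg release:

    -- New table on format version 3 with merge-on-read deletes, the path
    -- where deletion vectors apply.
    CREATE TABLE demo.db.events (
      id BIGINT,
      payload STRING
    ) USING iceberg
    TBLPROPERTIES (
      'format-version' = '3',
      'write.delete.mode' = 'merge-on-read'
    );

    -- Row-level delete: matching rows are marked deleted (in V3 via a deletion
    -- vector) and filtered at read time, instead of rewriting whole data files.
    DELETE FROM demo.db.events WHERE id = 42;

    -- Default column values: add a column with a default, no rewrite of existing data.
    ALTER TABLE demo.db.events ADD COLUMN version INT DEFAULT 1;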