Many companies seem to be using Apache Iceberg, but the ecosystem feels immature outside of Java. For instance, iceberg-rust doesn't even support HDFS. (Though admittedly, Iceberg's tendency to create many small files makes it a poor fit for HDFS anyway.)
amluto · 16m ago
> ALTER TABLE events ADD COLUMN version INT DEFAULT 1;
I’ve always disliked this approach. It conflates two things: the value to put in preexisting rows and the default going forward. I often want to add a column, backfill it, and not have a default.
Fortunately, the Iceberg spec at least got this right under the hood. There’s “initial-default”, which is the value implicitly inserted in rows that predate the addition of the column, and there’s “write-default”, which is the default for new rows.
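To make the distinction concrete, here is a toy model in Python. It is not the Iceberg API: `Column`, `ToyTable`, and their methods are hypothetical names made up for illustration, and the only thing borrowed from the spec is the idea of two separate defaults. It assumes old rows are never rewritten, so `initial_default` is applied at read time, while `write_default` is applied when a writer omits the column.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Column:
    name: str
    initial_default: Any = None  # read for rows stored before the column existed
    write_default: Any = None    # filled in when a new write omits the column


class ToyTable:
    """Hypothetical in-memory table; only illustrates the two defaults."""

    def __init__(self) -> None:
        self.columns: dict[str, Column] = {}
        self.rows: list[dict[str, Any]] = []

    def add_column(self, col: Column) -> None:
        # Metadata-only change: existing rows are left untouched.
        self.columns[col.name] = col

    def insert(self, **values: Any) -> None:
        row = dict(values)
        for name, col in self.columns.items():
            if name not in row:
                row[name] = col.write_default
        self.rows.append(row)

    def scan(self) -> list[dict[str, Any]]:
        # Rows that predate a column have no stored value for it,
        # so the reader substitutes initial_default.
        return [
            {
                name: stored[name] if name in stored else col.initial_default
                for name, col in self.columns.items()
            }
            for stored in self.rows
        ]


t = ToyTable()
t.add_column(Column("id"))
t.insert(id=1)
# "Backfill" old rows with 1, but give new rows no default.
t.add_column(Column("version", initial_default=1, write_default=None))
t.insert(id=2)
print(t.scan())
# [{'id': 1, 'version': 1}, {'id': 2, 'version': None}]
```

The old row reads back as `version = 1` without being rewritten, and the new row gets no default, which is the "add a column, backfill it, and not have a default" case that a single `DEFAULT` clause can't express.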
hodgesrm · 1h ago
This Google article was nice as a high-level overview of Iceberg V3. I wish that the V3 spec (and Iceberg specs in general) were more readable. For now the best approach seems to be to read the Javadoc for the Iceberg Java API. [0]
Cool to see Iceberg getting these kinds of upgrades. Deletion vectors and default column values sound like real quality-of-life improvements, especially for big, messy datasets. Curious to hear if anyone’s tried V3 in production yet and what the performance looks like.
talatuyarer · 1h ago
This new version has some great new features, including deletion vectors for more efficient transactions and default column values to make schema evolution a breeze. The full article has all the details.
[0] https://javadoc.io/doc/org.apache.iceberg/iceberg-api/latest...
https://github.com/delta-io/delta/blob/master/PROTOCOL.md