I was almost going to build a lakehouse* with DuckDB because I low-key love it, easiest and strongest analytical engine I've found yet: scale from laptops to big metal, while being mostly out-of-core when doing sane stuff, and avoiding distributed computing for SQL in the process (looking at you Spark).
That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.
The flight extension is excellent as it removes the need to write C++ extensions and lets you use your favorite language to develop native DuckDB catalogs. It's straightforward to build data lake connectors and plug them in as a flight catalog, thanks to Airport!
That sounds less "not accepted" and more "will implement, rewrite required". It was only a couple months ago.
sukhavati · 46m ago
same here man, ended up going with trino explicitly for writing and data management and using chdb/duckdb to process data for front-ends etc (mostly ethereum data so chdb "support" for UInt256 is quite important)
jeadie · 10h ago
This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai
anentropic · 48m ago
That looks like an amazing "swiss army knife"...!
mrbungie · 9h ago
Looks very cool! I will take a look, tysm!
mritchie712 · 10h ago
it's coming. they already have hive style parquet writes. Iceberg is more complicated than that, but it's certainly doable.
mrbungie · 10h ago
Yeah, it would just be great if it already did so, and I hope it supports Iceberg soon, as it would let me replace expensive (and bad) engines like AWS Athena with something more manageable.
Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.
r3tr0 · 9h ago
I love DuckDB. We use it a ton for indexing and organizing system / kernel level metrics exported by eBPF.
It's a cool thought exercise: everything we do in the data world can be done in SQL, from SQL. In a sense this is MCP, but for the DuckDB world.
rustyconover · 1h ago
Thanks for taking the time to understand the philosophy of the extension.
k_bx · 6h ago
Not clear. Will this allow loading ipc files in DuckDB finally? That's been my biggest issue, since I use IPC files for append operations before I turn them into parquet files.
rustyconover · 1h ago
That’s possible with the arrow extension today.
vkaku · 5h ago
This is very nice. I also love the fuzzycomplete and lindel from the same org/authors.
code_biologist · 5h ago
fuzzycomplete - https://github.com/Query-farm/fuzzycomplete "This fuzzycomplete extension serves as an alternative to DuckDB's autocomplete extension, with several key differences: ..."
lindel - https://github.com/Query-farm/lindel "This lindel extension adds functions for the linearization and delinearization of numeric arrays in DuckDB. It allows you to order multi-dimensional data using space-filling curves. ... Linearization maps multi-dimensional data into a one-dimensional sequence while preserving locality, enhancing the efficiency of data structures and algorithms for spatial data, such as in databases, GIS, and memory caches."
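For anyone wondering what "linearization" means in practice, here is a minimal sketch of one common space-filling curve, the Z-order (Morton) curve, in plain Python. This is not lindel's implementation or API; the function names `morton_encode` and `morton_decode` and the 16-bit default are assumptions made up for illustration. The idea is just to interleave the bits of two coordinates so that points close together in 2D tend to get nearby 1D keys.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of morton_encode: split the interleaved bits back into (x, y)."""
    x = 0
    y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def sort_by_locality(points: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Order 2D points by their Morton key, i.e. along the Z-order curve."""
    return sorted(points, key=lambda p: morton_encode(p[0], p[1]))
```

Sorting rows by a key like this before writing them out is the usual reason to linearize: range scans over the 1D key then touch points that are mostly neighbours in the original space.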
rustyconover · 1h ago
Thanks for the compliments!
rubenvanwyk · 8h ago
Does this mean the data source and destination both have to set up flight servers? I imagine then this won’t be useful for integration of third-party services.
What’s the situation where this is useful? Seems like ‘replace your remote duckDB instance—used to replace a DB server—with duckDB instance + a flight server (or a bunch of them!)’. Who has a problem for which this is the solution?
simlevesque · 9h ago
A Flight server paired with duckdb is a good way to get concurrent writes.
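A rough sketch of why putting a server in front of an embedded database helps with concurrent writes: many clients submit writes, but a single owner applies them one at a time. This is not the Airport extension or Arrow Flight; it uses only the Python standard library, with `sqlite3` as a stand-in for the embedded engine and an in-process queue as a stand-in for the network layer. The names `writer_loop`, `client`, and `demo` are hypothetical.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer to finish


def writer_loop(jobs: queue.Queue, results: list) -> None:
    """Own the only connection and apply queued inserts one at a time."""
    conn = sqlite3.connect(":memory:")  # stand-in for the embedded database
    conn.execute("CREATE TABLE events (client INTEGER, seq INTEGER)")
    while True:
        job = jobs.get()
        if job is _STOP:
            break
        conn.execute("INSERT INTO events VALUES (?, ?)", job)
    conn.commit()
    # Report the final row count back to the caller.
    results.append(conn.execute("SELECT count(*) FROM events").fetchone()[0])
    conn.close()


def client(jobs: queue.Queue, client_id: int, n: int) -> None:
    """Simulate a remote client submitting n writes."""
    for seq in range(n):
        jobs.put((client_id, seq))


def demo(num_clients: int = 8, rows_per_client: int = 100) -> int:
    """Run several concurrent clients against one writer; return rows stored."""
    jobs: queue.Queue = queue.Queue()
    results: list = []
    writer = threading.Thread(target=writer_loop, args=(jobs, results))
    writer.start()
    clients = [
        threading.Thread(target=client, args=(jobs, i, rows_per_client))
        for i in range(num_clients)
    ]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    jobs.put(_STOP)  # all clients are done, so nothing is queued after this
    writer.join()
    return results[0]
```

Clients never touch the database directly, so there is no lock contention between them; the server process is the only writer.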
[1] https://github.com/duckdb/duckdb_iceberg/issues/37
*that is what they are called now aren't they? I just can't follow the terms anymore haha.
Check out our sandbox:
https://yeet.cx/play