Dreaming of Graphs in the Open Lakehouse

2 yecol 1 8/22/2025, 8:41:31 AM semyonsinchenko.github.io

Comments (1)

yecol · 5h ago
Modern lakehouse platforms now natively support tables, geospatial data, vectors, and more, but property graphs remain a missing piece. With the rise of AI and growing interest in Graph RAG, graphs are becoming increasingly relevant: there is a clear need to deliver knowledge graphs into RAG systems, backed by proper standards, ETL, and frameworks for different use cases.

A young project, Apache GraphAr (incubating), aims to define a storage standard. On the processing side, the ecosystem already has strong tooling: GraphFrames (akin to Spark for Iceberg: batch and distributed), Kuzu (akin to DuckDB for Iceberg: fast, in-memory, in-process), and Apache HugeGraph (akin to ClickHouse/Doris for graphs: a standalone server for queries).

There’s also work underway on graphframes-rs, which brings Apache DataFusion and its ecosystem into this landscape. With all these components available, the challenge now is to put the pieces together.