How well does the expression detection scale with the number of columns? If I'm reading Table 4 correctly, FastLanes is ~10x slower at encoding than Parquet+Snappy (which seems a reasonable tradeoff for the better compression and scan times), but how is that affected for very wide tables (e.g. 2k columns or something like that)?
azimafroozeh · 1d ago
That’s a very valid question. We’ve done zero optimization on the encoding side so far, and improving that is definitely on our roadmap. Technically, once we learn the best expressions, they can be reused — data is often very similar across row groups — which opens the door to caching and amortizing the cost.
For very wide tables, expression detection only needs to happen once. Beyond that, we’re also exploring techniques like grouping columns into smaller sets or applying more aggressive heuristics to prune irrelevant columns. These are areas we’re actively investigating, and we plan to support them in future versions of FastLanes.
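To make the caching idea concrete, here is a minimal sketch of reusing a learned expression across row groups. All names and the toy cost model are hypothetical, not FastLanes internals: the point is only that the expensive search runs once per column, and later row groups hit the cache.

```python
# Hypothetical sketch: remember the best expression found for a column in
# the first row group and reuse it for later, similar row groups, so the
# expensive search is paid once per column rather than once per row group.

def detect_best_expression(values):
    # Stand-in for the real (expensive) expression search: pick between
    # two toy encodings using crude cost proxies.
    costs = {
        "rle": len(set(values)),             # few distinct values -> RLE wins
        "delta": max(values) - min(values),  # small value range -> delta wins
    }
    return min(costs, key=costs.get)

class ExpressionCache:
    def __init__(self):
        self.best = {}  # column name -> cached expression

    def expression_for(self, column, values):
        if column not in self.best:  # only search on a cache miss
            self.best[column] = detect_best_expression(values)
        return self.best[column]

cache = ExpressionCache()
first = cache.expression_for("price", [10, 10, 10, 11])   # pays for detection
second = cache.expression_for("price", [12, 12, 13, 13])  # reuses cached result
assert first == second
```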
falcor84 · 1d ago
How does it compare to Arrow?
azimafroozeh · 1d ago
Arrow is primarily an in-memory format, while Parquet is commonly used for on-disk storage. Typically, data is stored in Parquet and then read into memory as Arrow.
FastLanes is a new on-disk file format, comparable to Parquet, that offers around 40% better compression and faster decoding thanks to its data-parallel encoding design.
Feather appears to just be block compressed Arrow IPC [1]. Lightweight compression techniques generally achieve two orders of magnitude faster random access compared to block compression. That’s one of the benefits of formats like FastLanes, Vortex, DuckDB native, etc. DuckDB has a good blog post about it here: https://duckdb.org/2022/10/28/lightweight-compression.html
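A toy illustration of why lightweight encodings allow fast random access: with fixed-width bit-packing, value i sits at a known bit offset, so one value can be decoded in O(1) without touching its neighbors, whereas a gzip/zstd block has to be decompressed from the start to reach the same value. This is simplified Python, not the actual FastLanes layout:

```python
# Fixed-width bit-packing: every value occupies exactly `width` bits, so
# the location of value i is computable and random access is shift+mask.

def bitpack(values, width):
    # Pack each value into `width` bits of one big integer.
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed

def get(packed, i, width):
    # O(1) random access: no block decompression, just shift and mask.
    return (packed >> (i * width)) & ((1 << width) - 1)

data = [3, 7, 0, 5, 1]
packed = bitpack(data, width=3)      # 3 bits suffice for values 0..7
assert get(packed, 3, width=3) == 5  # fetch a single value directly
```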
My experience with Parquet is that the (C++) libraries are pretty awful and the resulting performance lackluster
azimafroozeh · 1d ago
I totally agree — the C++ implementations of Parquet aren’t the best they could be. To be fair, though, it’s a tough problem: Parquet supports a broad range of encodings and compression schemes, and to do that, it pulls in a lot of dependencies and requires complex dependency management.
That’s one of the main reasons we built FastLanes from scratch instead of trying to integrate with Parquet.
With FastLanes, we’ve taken a different approach: zero dependencies, no SIMD intrinsics, and a design that’s fully auto-vectorizable. The result is simpler code that still delivers high performance.
timhigins · 1d ago
How well does this work with partial downloading of a file based on column statistics via HTTP Range requests?
Any plans to integrate FastLanes into DuckDB? You would instantly get more usage from its SDKs in many languages and the CLI.
azimafroozeh · 1d ago
We’d love to bring FastLanes into DuckDB! We're currently working on a DuckDB extension to read and write the FastLanes file format directly from DuckDB.
FastLanes is columnar by design, so partial downloading via HTTP Range requests is definitely possible — though not yet implemented. It’s on the roadmap.
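For readers unfamiliar with the technique: a reader typically fetches the footer metadata first (in Parquet-style formats it lives at the end of the file), then issues one Range request per column chunk the query touches. A sketch with the Python standard library; the URL, layout, and byte offsets here are invented for illustration, not the FastLanes spec:

```python
# Sketch of partial download from a columnar file over HTTP. Only the
# header construction is real; the layout assumptions are hypothetical.
import urllib.request

def fetch_range(url, start, end):
    # HTTP Range is inclusive on both ends: "bytes=start-end".
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# 1. Read the last N bytes to locate column metadata (hypothetical layout):
# footer = fetch_range("https://example.com/data.fls", file_size - 64, file_size - 1)
# 2. Fetch just the byte range of the one column the query needs:
# chunk = fetch_range("https://example.com/data.fls", col_offset, col_offset + col_len - 1)
```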
Vortex borrows a few ideas from the FastLanes project, such as bit-packing and ALP. However, it’s unclear how well these are implemented — their performance on ClickBench appears worse than Parquet in both storage size and decompression speed, which is counterintuitive.
Technically, Vortex is documented more like a BtrBlocks-style format, which we’ve benchmarked and compared against in depth.
_willmanning · 1d ago
I'll chime in (as a Vortex maintainer), that we are greatly indebted to Azim's work on FastLanes & ALP. Vortex heavily utilizes his work to get state-of-the-art performance.
I would add that Vortex doesn't have standalone Clickbench results. Azim is presumably referring to the duckdb-vortex results, which were run on an older version of duckdb (1.2) than the duckdb-parquet ones (1.3). We'll get those updated shortly; we just released a new version of Vortex & the duckdb extension. Meanwhile, I believe the DataFusion-Vortex vs DataFusion-Parquet speedups show substantial improvements across the board.
The folks over at TUM (who originally authored BtrBlocks) did a reasonable amount of micro-benchmarking of Vortex vs Parquet in their recent "Anyblox" paper for VLDB 2025: https://gienieczko.com/anyblox-paper
They essentially say in the paper that Vortex is much faster than the original BtrBlocks because it uses better encodings (specifically citing FastLanes & ALP).
I'm looking forward to seeing the FastLanes Clickbench results when they're ready, and Azim, we should work together to benchmark FastLanes against Vortex!
azimafroozeh · 23h ago
As we discuss in the FastLanes paper, the way BtrBlocks implements cascaded encodings (Vortex now) is essentially a return to block-based compression such as Zstd — which we're trying to avoid as much as possible. This design doesn't work well with modern vectorized execution engines or GPUs: the decompression granularity is too large to fit in CPU caches or GPU shared memory. So Vortex ends up being yet another Parquet-like file format, repeating the same mistakes. And if it still underperforms compared to Parquet... what’s the point?
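Rough numbers behind the granularity argument, using typical (illustrative, not measured) sizes: a 1024-value vector of 32-bit integers is 4 KiB and fits in a per-core L1 data cache, while a general-purpose compression block is orders of magnitude larger.

```python
# Back-of-envelope on decompression granularity. All sizes are typical,
# illustrative figures, not benchmarks of any specific system.
vector_bytes = 1024 * 4    # one 1024-value vector of 32-bit ints: 4 KiB
block_bytes = 256 * 1024   # a common general-purpose compression block size
l1_cache = 32 * 1024       # typical per-core L1 data cache

# The vector-at-a-time unit stays cache-resident; the block does not.
assert vector_bytes < l1_cache < block_bytes
```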
We just released FastLanes v0.1, and more results — including ClickBench — are coming soon.
Please do benchmark FastLanes — and keep us posted!
Mindjolt · 1d ago
Is there a place where we can contribute?
azimafroozeh · 23h ago
We’re building a community around FastLanes, and we’d love any contributions or feedback!
Please join our Discord so we can discuss things in more detail.
timhigins · 1d ago
Any comparisons with Lance/LanceDB?
azimafroozeh · 1d ago
We haven’t benchmarked FastLanes directly against LanceDB yet, but here’s a quick look at the compression side:
LanceDB supports:
FSST
Bit-packing
Delta encoding
Opaque block codecs: GZIP, LZ4, Snappy, ZLIB
So in that regard, it’s quite similar to Parquet — a mix of lightweight codecs and general-purpose block compression.
FastLanes, on the other hand, introduces Expression Encoding — a unified compression model that combines lightweight encodings to achieve better compression ratios. It also integrates multiple research efforts from CWI into a single file format, including the FastLanes compression layout, ALP, G-ALP, white-box compression, and CCC.
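As a toy example of the kind of cascade of lightweight encodings that Expression Encoding searches over: delta-encode a roughly increasing column, after which the deltas fit in far fewer bits and a bit-packer can store them compactly. Simplified Python, not the actual FastLanes implementation:

```python
# Cascading two lightweight encodings: delta first, then (conceptually)
# bit-packing the now-small deltas. Both steps are exactly reversible.

def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

timestamps = [1000, 1002, 1003, 1007, 1010]
deltas = delta_encode(timestamps)  # [1000, 2, 1, 4, 3]
# After the first value, every delta fits in 3 bits instead of ~10,
# which is where a bit-packing pass would win.
assert delta_decode(deltas) == timestamps
```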
Lance contributor here. This sounds about right. We haven't really innovated too much in the compression space. Most of our efforts have been around getting rid of row groups and the resulting changes in decoding patterns.
Our current approach is pretty similar to Parquet for scalar types. We allow a mix of general-purpose and lightweight codecs for small types and require lightweight-only codecs for larger types (string, binary).
Nice work on the paper :)
k__ · 1d ago
Pretty cool!
WebAssembly planned?
azimafroozeh · 1d ago
CUDA and GPU support are next on our list — but we’re definitely interested in WebAssembly as well!
Disclaimer: I'm the first author of the paper.
This is referenced in the link above. https://arrow.apache.org/docs/python/ipc.html
Unfortunately I'm stuck with CSV at work for now.
[1]: https://arrow.apache.org/docs/python/feather.html
The CWI research efforts integrated into FastLanes:
The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code (VLDB '23) PDF: https://dl.acm.org/doi/pdf/10.14778/3598581.3598587
ALP (Adaptive Lossless Floating-Point Compression) — SIGMOD '24 https://ir.cwi.nl/pub/33334/33334.pdf
G‑ALP (GPU-parallel variant of ALP) — DaMoN '25 https://azimafroozeh.org/assets/papers/g-alp.pdf
White-box Compression (self-describing, function-based) — CIDR '20 https://www.cidrdb.org/cidr2020/papers/p4-ghita-cidr20.pdf
CCC (Exploiting Column Correlations for Compression) — MSc Thesis '23 https://homepages.cwi.nl/~boncz/msc/2023-ThomasGlas.pdf