How well does the expression detection scale with the number of columns? If I'm reading Table 4 correctly, FastLanes is ~10x slower at encoding than Parquet+Snappy (which seems a reasonable tradeoff for the better compression and scan times), but how is that affected for very wide tables (e.g. 2k columns or something like that)?
azimafroozeh · 1d ago
That’s a very valid question. We’ve done zero optimization on the encoding side so far, and improving that is definitely on our roadmap. Technically, once we learn the best expressions, they can be reused — data is often very similar across row groups — which opens the door to caching and amortizing the cost.
For very wide tables, expression detection only needs to happen once. Beyond that, we’re also exploring techniques like grouping columns into smaller sets or applying more aggressive heuristics to prune irrelevant columns. These are areas we’re actively investigating, and we plan to support them in future versions of FastLanes.
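To make the caching idea concrete, here is a minimal sketch of reusing a learned expression across row groups. All names and the toy cost model are hypothetical, not FastLanes internals: the point is only that the expensive search runs once per column, and later row groups hit the cache.

```python
# Hypothetical sketch: remember the best expression found for a column in
# the first row group and reuse it for later, similar row groups, so the
# expensive search is paid once per column rather than once per row group.

def detect_best_expression(values):
    # Stand-in for the real (expensive) expression search: pick between
    # two toy encodings using crude cost proxies.
    costs = {
        "rle": len(set(values)),             # few distinct values -> RLE wins
        "delta": max(values) - min(values),  # small value range -> delta wins
    }
    return min(costs, key=costs.get)

class ExpressionCache:
    def __init__(self):
        self.best = {}  # column name -> cached expression

    def expression_for(self, column, values):
        if column not in self.best:  # only search on a cache miss
            self.best[column] = detect_best_expression(values)
        return self.best[column]

cache = ExpressionCache()
first = cache.expression_for("price", [10, 10, 10, 11])   # pays for detection
second = cache.expression_for("price", [12, 12, 13, 13])  # reuses cached result
assert first == second
```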
falcor84 · 1d ago
How does it compare to Arrow?
azimafroozeh · 1d ago
Arrow is primarily an in-memory format, while Parquet is commonly used for on-disk storage. Typically, data is stored in Parquet and then read into memory as Arrow.
FastLanes is a new on-disk file format, comparable to Parquet, that offers around 40% better compression and faster decoding thanks to its data-parallel encoding design.
Feather appears to just be block compressed Arrow IPC [1]. Lightweight compression techniques generally achieve two orders of magnitude faster random access compared to block compression. That’s one of the benefits of formats like FastLanes, Vortex, DuckDB native, etc. DuckDB has a good blog post about it here: https://duckdb.org/2022/10/28/lightweight-compression.html
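A toy illustration of why lightweight encodings allow fast random access: with fixed-width bit-packing, value i sits at a known bit offset, so one value can be decoded in O(1) without touching its neighbors, whereas a gzip/zstd block has to be decompressed from the start to reach the same value. This is simplified Python, not the actual FastLanes layout:

```python
# Fixed-width bit-packing: every value occupies exactly `width` bits, so
# the location of value i is computable and random access is shift+mask.

def bitpack(values, width):
    # Pack each value into `width` bits of one big integer.
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed

def get(packed, i, width):
    # O(1) random access: no block decompression, just shift and mask.
    return (packed >> (i * width)) & ((1 << width) - 1)

data = [3, 7, 0, 5, 1]
packed = bitpack(data, width=3)      # 3 bits suffice for values 0..7
assert get(packed, 3, width=3) == 5  # fetch a single value directly
```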
My experience with Parquet is that the (C++) libraries are pretty awful and the resulting performance lackluster
azimafroozeh · 1d ago
I totally agree — the C++ implementations of Parquet aren’t the best they could be. To be fair, though, it’s a tough problem: Parquet supports a broad range of encodings and compression schemes, and to do that, it pulls in a lot of dependencies and requires complex dependency management.
That’s one of the main reasons we built FastLanes from scratch instead of trying to integrate with Parquet.
With FastLanes, we’ve taken a different approach: zero dependencies, no SIMD intrinsics, and a design that’s fully auto-vectorizable. The result is simpler code that still delivers high performance.
timhigins · 1d ago
How well does this work with partial downloading of a file based on column statistics via HTTP Range requests?
Any plans to integrate FastLanes into DuckDB? You would instantly get more usage from its SDKs in many languages and the CLI.
azimafroozeh · 1d ago
We’d love to bring FastLanes into DuckDB! We're currently working on a DuckDB extension to read and write the FastLanes file format directly from DuckDB.
FastLanes is columnar by design, so partial downloading via HTTP Range requests is definitely possible — though not yet implemented. It’s on the roadmap.
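For readers unfamiliar with the technique: a reader typically fetches the footer metadata first (in Parquet-style formats it lives at the end of the file), then issues one Range request per column chunk the query touches. A sketch with the Python standard library; the URL, layout, and byte offsets here are invented for illustration, not the FastLanes spec:

```python
# Sketch of partial download from a columnar file over HTTP. Only the
# header construction is real; the layout assumptions are hypothetical.
import urllib.request

def fetch_range(url, start, end):
    # HTTP Range is inclusive on both ends: "bytes=start-end".
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# 1. Read the last N bytes to locate column metadata (hypothetical layout):
# footer = fetch_range("https://example.com/data.fls", file_size - 64, file_size - 1)
# 2. Fetch just the byte range of the one column the query needs:
# chunk = fetch_range("https://example.com/data.fls", col_offset, col_offset + col_len - 1)
```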
Vortex borrows a few ideas from the FastLanes project, such as bit-packing and ALP. However, it’s unclear how well these are implemented — their performance on ClickBench appears worse than Parquet in both storage size and decompression speed, which is counterintuitive.
Technically, Vortex is documented more like a BtrBlocks-style format, which we’ve benchmarked and compared against in depth.
_willmanning · 1d ago
I'll chime in (as a Vortex maintainer), that we are greatly indebted to Azim's work on FastLanes & ALP. Vortex heavily utilizes his work to get state-of-the-art performance.
I would add that Vortex doesn't have standalone Clickbench results. Azim is presumably referring to the duckdb-vortex results, which were run on an older version of duckdb (1.2) than the duckdb-parquet ones (1.3). We'll get those updated shortly; we just released a new version of Vortex & the duckdb extension. Meanwhile, I believe the DataFusion-Vortex vs DataFusion-Parquet speedups show substantial improvements across the board.
The folks over at TUM (who originally authored BtrBlocks) did a reasonable amount of micro-benchmarking of Vortex vs Parquet in their recent "Anyblox" paper for VLDB 2025: https://gienieczko.com/anyblox-paper
They essentially say in the paper that Vortex is much faster than the original BtrBlocks because it uses better encodings (specifically citing FastLanes & ALP).
I'm looking forward to seeing the FastLanes Clickbench results when they're ready, and Azim, we should work together to benchmark FastLanes against Vortex!
azimafroozeh · 23h ago
As we discuss in the FastLanes paper, the way BtrBlocks implements cascaded encodings (Vortex now) is essentially a return to block-based compression such as Zstd — which we're trying to avoid as much as possible. This design doesn't work well with modern vectorized execution engines or GPUs: the decompression granularity is too large to fit in CPU caches or GPU shared memory. So Vortex ends up being yet another Parquet-like file format, repeating the same mistakes. And if it still underperforms compared to Parquet... what’s the point?
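Rough numbers behind the granularity argument, using typical (illustrative, not measured) sizes: a 1024-value vector of 32-bit integers is 4 KiB and fits in a per-core L1 data cache, while a general-purpose compression block is orders of magnitude larger.

```python
# Back-of-envelope on decompression granularity. All sizes are typical,
# illustrative figures, not benchmarks of any specific system.
vector_bytes = 1024 * 4    # one 1024-value vector of 32-bit ints: 4 KiB
block_bytes = 256 * 1024   # a common general-purpose compression block size
l1_cache = 32 * 1024       # typical per-core L1 data cache

# The vector-at-a-time unit stays cache-resident; the block does not.
assert vector_bytes < l1_cache < block_bytes
```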
We just released FastLanes v0.1, and more results — including ClickBench — are coming soon.
Please do benchmark FastLanes — and keep us posted!
Mindjolt · 1d ago
Is there a place where we can contribute?
azimafroozeh · 23h ago
We’re building a community around FastLanes, and we’d love any contributions or feedback!
Please join our Discord so we can discuss things in more detail.
timhigins · 1d ago
Any comparisons with Lance/LanceDB?
azimafroozeh · 1d ago
We haven’t benchmarked FastLanes directly against LanceDB yet, but here’s a quick look at the compression side:
LanceDB supports:
FSST
Bit-packing
Delta encoding
Opaque block codecs: GZIP, LZ4, Snappy, ZLIB
So in that regard, it’s quite similar to Parquet — a mix of lightweight codecs and general-purpose block compression.
FastLanes, on the other hand, introduces Expression Encoding — a unified compression model that combines lightweight encodings to achieve better compression ratios. It also integrates multiple research efforts from CWI into a single file format, including the FastLanes compression layout, ALP, G-ALP, white-box compression, and CCC.
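As a toy example of the kind of cascade of lightweight encodings that Expression Encoding searches over: delta-encode a roughly increasing column, after which the deltas fit in far fewer bits and a bit-packer can store them compactly. Simplified Python, not the actual FastLanes implementation:

```python
# Cascading two lightweight encodings: delta first, then (conceptually)
# bit-packing the now-small deltas. Both steps are exactly reversible.

def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

timestamps = [1000, 1002, 1003, 1007, 1010]
deltas = delta_encode(timestamps)  # [1000, 2, 1, 4, 3]
# After the first value, every delta fits in 3 bits instead of ~10,
# which is where a bit-packing pass would win.
assert delta_decode(deltas) == timestamps
```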
Lance contributor here. This sounds about right. We haven't really innovated too much in the compression space. Most of our efforts have been around getting rid of row groups and the resulting changes in decoding patterns.
Our current approach is pretty similar to Parquet for scalar types. We allow a mix of general-purpose and lightweight codecs for small types and require lightweight-only codecs for larger types (string, binary).
Nice work on the paper :)
k__ · 1d ago
Pretty cool!
WebAssembly planned?
azimafroozeh · 1d ago
CUDA and GPU support are next on our list — but we’re definitely interested in WebAssembly as well!
Disclaimer: I'm the first author of the paper.
This is referenced in the link above. https://arrow.apache.org/docs/python/ipc.html
Unfortunately I'm stuck with CSV at work for now.
[1]: https://arrow.apache.org/docs/python/feather.html
The CWI research efforts integrated into FastLanes:
The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code (VLDB '23) PDF: https://dl.acm.org/doi/pdf/10.14778/3598581.3598587
ALP (Adaptive Lossless Floating-Point Compression) — SIGMOD '24 https://ir.cwi.nl/pub/33334/33334.pdf
G‑ALP (GPU-parallel variant of ALP) — DaMoN '25 https://azimafroozeh.org/assets/papers/g-alp.pdf
White-box Compression (self-describing, function-based) — CIDR '20 https://www.cidrdb.org/cidr2020/papers/p4-ghita-cidr20.pdf
CCC (Exploiting Column Correlations for Compression) — MSc Thesis '23 https://homepages.cwi.nl/~boncz/msc/2023-ThomasGlas.pdf