Sail: a Rust-Based Spark Replacement

2 chenxi9649 1 7/8/2025, 7:16:12 PM lakesail.com ↗

Comments (1)

chenxi9649 · 3m ago
Hey HN! We're excited to share Sail 0.3, our open-source distributed computing framework that serves as a drop-in replacement for Apache Spark.

Sail is a Rust-native execution engine that speaks the Spark Connect protocol. Your existing Spark SQL and DataFrame code runs unchanged, and in our internal benchmarks it executes 4x faster on average at 94% lower infrastructure cost.
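To make "runs unchanged" concrete, here is a minimal client-side sketch. It assumes a Spark Connect compatible server is already listening on `localhost:50051`; the host, port, and the helper names `spark_connect_url` and `run_existing_job` are illustrative assumptions. Only `SparkSession.builder.remote(...)` and the `sc://` URL scheme come from PySpark's Spark Connect client.

```python
def spark_connect_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL. The host and port defaults are assumptions."""
    return f"sc://{host}:{port}"


def run_existing_job(url: str):
    """Ordinary PySpark code; only the remote URL decides which server runs it."""
    # Imported lazily so this file loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.createOrReplaceTempView("t")
    # Plain Spark SQL, with nothing server-specific in the query itself.
    return spark.sql("SELECT label FROM t WHERE id = 2").collect()
```

With a server running, `run_existing_job(spark_connect_url())` is the whole migration in this sketch: the job code stays the same and only the URL changes.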

Here's how we are achieving that performance:

1. No JVM overhead: Rust's zero-cost abstractions and deterministic memory management eliminate GC pauses

2. Columnar processing: Apache Arrow format enables SIMD instructions to process multiple values per instruction

3. Zero-copy data transfer: Python UDFs run in-process with shared memory buffers (no serialization)

4. Lightweight workers: Processes start in seconds
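For readers less familiar with points 2 and 3, here is a toy illustration of the general ideas in plain Python. It is not Sail's or Arrow's implementation: it uses the standard library's `array` and `memoryview` as stand-ins for a columnar buffer and a zero-copy view, and all names are made up for the example.

```python
from array import array

# Row-oriented layout: one Python object per record.
rows = [{"id": i, "price": float(i)} for i in range(8)]

# Column-oriented layout: each field lives in one contiguous typed buffer.
# Contiguity is the property native engines rely on to apply SIMD.
ids = array("q", (r["id"] for r in rows))
prices = array("d", (r["price"] for r in rows))


def total_price(column: array) -> float:
    # Scans a single contiguous buffer; the other columns are never touched.
    return sum(column)


# Zero-copy: a memoryview exposes the same bytes without duplicating them.
view = memoryview(prices)
window = view[2:5]  # slicing a memoryview does not copy the data

# A write through the owning array is visible through the view,
# because both refer to the same underlying memory.
prices[3] = 100.0
shared = window[1] == 100.0
```

The same principle is what lets a UDF read engine-owned buffers directly instead of receiving a serialized copy.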

What's new in v0.3:

- This release is a major milestone: we now support both Spark 3.5 and 4.0, including the new lightweight pyspark-client. The framework automatically detects your Spark version and adjusts its behavior accordingly, so one Sail binary works across versions.

Why this matters:

- Spark revolutionized big data 15 years ago, but its JVM foundation struggles with modern workloads. As teams process more real-time data and AI workloads, they're hitting walls with latency, cloud costs, and operational complexity. Sail aims to solve these problems without requiring you to rewrite what you have already built on Spark.

- We're working toward unifying batch, streaming, and AI workloads in a single framework. Imagine running your ETL, real-time analytics, and model training on the same infrastructure with predictable performance.

The project is open source (Apache 2.0) and we'd love your feedback! We have a growing community on Slack where early adopters are already running production workloads.

GitHub: https://github.com/lakehq/sail

Docs: https://lakesail.com

Our internal benchmarks (for the 4x and 94% numbers): https://docs.lakesail.com/sail/latest/introduction/benchmark...

Slack: https://lakesail.com/slack

Happy to answer any questions about the architecture, benchmarks, or migration path!