Show HN: Read, Sort, and Write Parquet Faster Than DuckDB in Java
1 pyckle 0 8/6/2025, 10:40:14 AM github.com ↗
I built a tool in Java that reads, sorts, and writes flat Parquet files faster than DuckDB, using a new Java Parquet driver called ParquetForge.
Performance comes from column-wise reordering and parallel algorithms. The sort order is computed with fastutil’s parallel radix sort or quicksort, and the columns are then reordered in parallel. On a 59M-row Parquet file generated by TPC-H, Parquet-Sort is about 25% faster than DuckDB when compiled with GraalVM native-image and about 12% faster when run on the latest Corretto 24 JVM.
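To illustrate the idea, here is a minimal sketch of column-wise reordering: compute a row permutation from the sort key once, then apply it to every column independently and in parallel. This is not the Parquet-Sort code. It assumes a single long sort key and in-memory long[] columns, and it swaps in the JDK's Arrays.parallelSort for fastutil's parallel radix sort and quicksort. The class and method names (ColumnwiseSortSketch, sortPermutation, reorder, reorderAll) are hypothetical, and nothing here touches ParquetForge.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: sort one key column, then reorder all columns by the same permutation.
final class ColumnwiseSortSketch {

    /** Returns the row order that sorts keys ascending; ties keep their original order. */
    static int[] sortPermutation(long[] keys) {
        Integer[] rows = new Integer[keys.length];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i;
        }
        // Stand-in for a parallel radix sort: the JDK's parallel sort on row indices,
        // compared by the key value each index points at.
        Arrays.parallelSort(rows, (a, b) -> Long.compare(keys[a], keys[b]));
        int[] perm = new int[rows.length];
        for (int i = 0; i < perm.length; i++) {
            perm[i] = rows[i];
        }
        return perm;
    }

    /** Builds a new column whose i-th value is column[perm[i]]. */
    static long[] reorder(long[] column, int[] perm) {
        long[] out = new long[perm.length];
        // Each output slot is written exactly once, so the parallel loop needs no locking.
        IntStream.range(0, perm.length).parallel().forEach(i -> out[i] = column[perm[i]]);
        return out;
    }

    /** Reorders every column with the same permutation, one parallel task per column. */
    static long[][] reorderAll(long[][] columns, int[] perm) {
        return Arrays.stream(columns)
                .parallel()
                .map(column -> reorder(column, perm))
                .toArray(long[][]::new);
    }

    public static void main(String[] args) {
        long[] orderKey = {30L, 10L, 20L};
        long[] quantity = {3L, 1L, 2L};

        int[] perm = sortPermutation(orderKey);
        long[][] sorted = reorderAll(new long[][] {orderKey, quantity}, perm);

        System.out.println(Arrays.toString(perm));      // [1, 2, 0]
        System.out.println(Arrays.toString(sorted[0])); // [10, 20, 30]
        System.out.println(Arrays.toString(sorted[1])); // [1, 2, 3]
    }
}
```

The point of the sketch is that only the key column is compared during the sort. Every other column is moved by a simple gather pass, and those passes do not depend on each other, which is why they parallelize well.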
More details, benchmarks, and source code (all under the Apache 2.0 license) are available here: https://github.com/Earnix/parquet-sort
Would love feedback, ideas, or contributions.