Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format
I would like to share a Rust implementation of the Zstandard seekable format I've been working on.
Regular zstd compressed files consist of a single frame, meaning you have to start decompression at the beginning. The seekable format splits compressed data into a series of independent frames, each compressed individually, so that decompression of a section in the middle of an archive only requires zstd to decompress at most a frame's worth of extra data, instead of the entire archive.
I started working with the seekable format because I wanted to resume downloads of big zstd compressed files that are decompressed and written to disk on the fly. At first I created and used bindings to the C functions that are available upstream[1], however, I stumbled over the first segfault rather quickly (it's now fixed) and found out that the functions only allow basic things. After looking closer at the upstream implementation, I noticed that is uses functions of the core API that are now deprecated and it doesn't allow access to low-level (de)compression contexts. To me it looks like a PoC/demo implementation that isn't maintained the same way as the zstd core API, probably that's also the reason it's in the contrib directory.
My use-case seemed to require a complete rewrite of the seekable format, so I decided to implement it from scratch in Rust using bindings to the advanced zstd compression API, available from zstd 1.4.0.
The result is a single dependency library crate[2], and a CLI crate[3] for the seekable format that feels similar to the regular zstd tool.
Any feedback is highly appreciated!
[1]: https://github.com/facebook/zstd/tree/dev/contrib/seekable_f... [2]: https://crates.io/crates/zeekstd [3]: https://github.com/rorosen/zeekstd/tree/main/cli
Has zstd actually standardized the seekable version? Last I checked (which was quite a while ago) it had not been declared a standard, so I was reluctant to write a filter for nbdkit, even though it's very much a requested feature.
Given existing libraries, it should be really simple to create an SQLite VFS for my Go driver that reads (not writes) compressed databases transparently, but tool support was kinda lacking.
Will the zstd CLI ever support it? https://github.com/facebook/zstd/issues/2121
Why zeek, BTW? Is it a play on "zstd" and "seek"? My employer is also the custodian of the zeek project (https://zeek.org), so I was confused for a second.
[1] https://github.com/SaveTheRbtz/zstd-seekable-format-go
Yes, the name is a combination of zstd and seek. Funnily enough, I wanted to name it just zeek first before I knew that it already exists, so I switched to zeekstd. You're not the first person asking me if there is any relation to zeek and I understand how that is misleading. In hindsight the name is a little unfortunate.
[1] https://www.htslib.org/doc/bgzip.html
The spec does mention:
> While only Checksum_Flag currently exists, there are 7 other bits in this field that can be used for future changes to the format, for example the addition of inline dictionaries.
so I don't think seekable zstd supports these dictionaries just yet.
With multiple inline dictionaries, one could detect when new chunks compress badly with the previous dictionary and train new ones on the fly. Could be useful for compressing formats with headers and mixed data (i.e. game files, which can contain a mix of text + audio + video, or just regular old .tar files I suppose).
Explanation here https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/
This is basically the only function I use from zstd_seekable, so it would be nice to have that in zeekstd as well.