Bzip2 crate switches from C to 100% Rust

142 Bogdanp 55 6/17/2025, 8:06:54 PM trifectatech.org

Comments (55)

dralley · 3h ago
How realistic is it for the Trifecta Tech implementation to start displacing the "official" implementation used by linux distros, which hasn't seen an upstream release since 2019?

Fedora recently swapped the original Adler zlib implementation with zlib-ng, so that sort of thing isn't impossible. You just need to provide a C ABI compatible with the original one.

wmf · 2h ago
Ubuntu is using Rust sudo so it's definitely possible.
masfuerte · 3h ago
They do provide a compatible C ABI. Someone "just" needs to do the work to make it happen.
tiffanyh · 1h ago
I think that is the goal of uutils.

https://uutils.github.io/

rlpb · 3h ago
> You just need to provide a C ABI compatible with the original one.

How does this interact with dynamic linking? Doesn't the current Rust toolchain mandate static linking?

alxhill · 8m ago
The commenters below are confusing two things - Rust binaries can be dynamically linked, but because Rust doesn’t have a stable ABI you can’t do this across compiler versions the way you would with C. So in practice, everything is statically linked.
bluGill · 43m ago
Rust cannot dynamically link to Rust. It can dynamically link to C and be dynamically linked by C. If you combine the two you can cheat, but it is still C that you are dealing with, not Rust, even if Rust is on both sides.

conradev · 34m ago
Rust importing Rust must be statically linked, yes. You can statically link Rust into a dynamic library that other libraries link to, though!
arcticbull · 2h ago
Rust lets you generate dynamic C-linkage libraries.

Use crate-type=["cdylib"]
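
A minimal sketch of what that looks like, assuming a library crate whose Cargo.toml sets the crate type as above. The function name `demo_checksum` is made up for illustration and is not part of bzip2's API. In the 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.

```rust
// Assumed Cargo.toml fragment:
//   [lib]
//   crate-type = ["cdylib"]

use std::slice;

/// Hypothetical C-callable function: sums `len` bytes starting at `data`
/// with wrapping arithmetic. Returns 0 for a null pointer.
///
/// # Safety
/// `data` must point to at least `len` readable bytes.
#[no_mangle] // keep the symbol name unmangled so C code can link to it
pub unsafe extern "C" fn demo_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `data` is valid for `len` bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}
```

A C caller would declare it as `uint32_t demo_checksum(const uint8_t *data, size_t len);` and link against the resulting shared library.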

nicoburns · 2h ago
Dynamic linking works fine if you target the C ABI.
timeon · 1h ago
You can use dynamic linking in Rust with the C ABI, which means going through the `unsafe` keyword, also known as 'trust me bro'. Linking statically against Rust source means the compiler checks everything, so there is no need for unsafe.
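
As a rough sketch of that boundary, here is a call into C's `strlen` wrapped in a safe function. The wrapper name `c_string_len` is hypothetical. In the 2024 edition the block is written `unsafe extern "C"`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" {
    // Declared by hand: the compiler cannot verify this signature
    // against the C library, which is why calling it is `unsafe`.
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: `CStr` already guarantees a valid, NUL-terminated
/// pointer, so the one `unsafe` call can be justified locally.
pub fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated string that
    // outlives this call.
    unsafe { strlen(s.as_ptr()) }
}
```

The `unsafe` block stays small and everything above it is ordinary checked Rust.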
rwaksmunski · 2h ago
I use this crate to process 100s of TB of Common Crawl data, I appreciate the speedups.
viraptor · 1h ago
What's the reason for using bz2 here? Wouldn't it be faster to do a one off conversion to zstd? It beats bzip2 in every metric at higher compression levels as far as I know.
rwaksmunski · 1h ago
Common Crawl delivers the data as bz2. Indeed I store intermediate data in zstd with ZFS.
declan_roberts · 1h ago
That assumes you're processing the data more than once.

malux85 · 1h ago
Yeah came here to say a 14% speed up in compression is pretty good!
koakuma-chan · 50m ago
It's blazingly fast
a-dub · 40m ago
i'd be curious if they're using the same llvm codegen backend (with the same optimization settings) for the c and rust versions. if so, where are the speedups coming from?

(ie, is it some kind of rust auto-simd thing, did they use the opportunity to hand optimize other parts or is it making use of newer optimized libraries, or... other)

firesteelrain · 2h ago
Anyone know if this will by default resolve the 11 outstanding CVEs?

Ironically there is one CVE reported in the bzip2 crate

[1] https://app.opencve.io/cve/?product=bzip2&vendor=bzip2_proje...

tialaramex · 2h ago
There's certainly a contrast between the "Oops a huge file causes a runtime failure" reported for that crate and a bunch of "Oops we have bounds misses" in C. I wonder how hard anybody worked on trying to exploit the bounds misses to get code execution. It may or may not be impossible to achieve that escalation.
Philpax · 2h ago
> The bzip2 crate before 0.4.4

They're releasing 0.6.0 today :>

solarized · 1h ago
Did they use an LLM to transpile the C to Rust?
Twirrim · 39m ago
If you're going to use tools to transpile, don't use something that hallucinates. You want it to be precise.

https://github.com/immunant/c2rust reportedly works pretty well. Blog post from a few years ago of them transpiling Quake 3 to Rust: https://immunant.com/blog/2020/01/quake3/. The Rust produced ain't pretty, but you can then start cleaning it up and making it more "rusty".

dataking · 59s ago
They indeed used c2rust for the initial transpile according to https://trifectatech.org/blog/translating-bzip2-with-c2rust/
nightfly · 1h ago
Task that requires precision and potentially hard to audit? Exactly where I'd use an LLM /s
CGamesPlay · 29m ago
Without commenting on whether an LLM is the right approach, I don't think this task is particularly hard to audit. There is almost assuredly a huge test suite for bzip2 archives; fuzzing file formats is very easy; and you can restrict / audit the use of unsafe by the translator.
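
A rough sketch of the kind of differential check I mean, using a toy run-length codec as a stand-in for the real compressor. Every name here is made up for illustration, and this is not how the bzip2 crate is actually tested.

```rust
/// Stand-in for the original implementation: emits (count, byte) pairs.
fn rle_reference(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Stand-in for the translated implementation, written in a different style.
fn rle_translated(input: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    for &byte in input {
        let n = out.len();
        if n >= 2 && out[n - 1] == byte && out[n - 2] < 255 {
            out[n - 2] += 1; // extend the current run
        } else {
            out.push(1); // start a new run
            out.push(byte);
        }
    }
    out
}

/// Inverse of both encoders, used for the round-trip check.
fn rle_decode(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in encoded.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Tiny xorshift generator so the sketch needs no external crates.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Feeds pseudo-random inputs to both encoders. Returns the number of
/// cases checked, or the first input on which they disagree or fail to
/// round-trip.
pub fn differential_check(cases: usize, seed: u64) -> Result<usize, Vec<u8>> {
    let mut state = seed.max(1); // xorshift must not start at zero
    for _ in 0..cases {
        let len = (xorshift(&mut state) % 600) as usize;
        // Small alphabet so that runs actually occur.
        let input: Vec<u8> = (0..len)
            .map(|_| (xorshift(&mut state) % 3) as u8)
            .collect();
        let translated = rle_translated(&input);
        if rle_reference(&input) != translated || rle_decode(&translated) != input {
            return Err(input);
        }
    }
    Ok(cases)
}
```

A real setup would swap in the C library and the Rust port and use a coverage-guided fuzzer instead of a fixed generator, but the shape of the check is the same.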
anonnon · 2h ago
> Improved performance

After the uutils debacle, does anyone still trust these "rewrote in Rust" promotional benchmarks without independent verification?

vlovich123 · 2h ago
> After the uutils debacle

Which debacle?

anonnon · 2h ago
See https://desuarchive.org/g/thread/104831348/#q104831479 https://desuarchive.org/g/thread/104831348/#104831809

Also uutils is a corporate-sponsored, corporate-friendly MIT licensed rewrite that's hostile to user (and developer) freedom.

EDIT: for those unaware of the context, that thread was not long after a uutils dev gave a talk at FOSDEM where he presented benchmarks purporting to show uutils sort's superior performance, which /g/ exposed as being only due to its inadequate locale support.

vlovich123 · 1h ago
So what I’m getting is

1. The uutils project didn’t also make sort faster for every locale, even though the majority of people will be using UTF-8, C or POSIX, where it is indeed faster

2. There’s a lot of debating about different test cases which is a never ending quibble with sorting routines (go look at some of the cutting edge sort algorithm development).

This complaint is hyperfocusing on 1 of the many utilities they claim they’re faster on and quibbling about what to me are important but ultimately minor critiques. I really don’t see the debacle.

As for the license, that’s more your opinion. The Rust project generally dual-licenses its code as MIT and Apache 2.0, and most open source projects in the ecosystem follow this tradition. I don’t see the conspiracy that you do. And just so I’m clear, the corporation you’re criticizing here as the amorphous evil entity funding this is Ubuntu, right?

0cf8612b2e1e · 2h ago
So what was I supposed to get from that 4chan wannabe site? That the project is not currently as fast as GNU? Where is the lying?
anonnon · 1h ago
> 4chan wannabe

It's a 4chan archive (and one of its most robust), and the archived thread was on /g/ last March.

> That the project is not currently as fast as GNU? Where is the lying?

Watch the FOSDEM presentation at 15 minutes in: https://fosdem.org/2025/schedule/event/fosdem-2025-6196-rewr...

The presenter uses uutils sort (on Shakespeare's corpus) to show how much faster it is than coreutils, and /g/ found out it was only faster because it had no locale awareness, which is especially dishonest because the presenter claims drop-in, 1-to-1 compatibility as an explicit goal of the project, so this discrepancy between the two at least should have been acknowledged by him.
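
To illustrate why locale handling matters for a benchmark like this, here is a minimal sketch. It uses lowercase folding as a crude stand-in for real locale collation, which is an assumption for illustration and not what GNU sort or uutils actually does.

```rust
/// Plain byte-order sort, roughly what a C/POSIX-locale sort does.
pub fn sort_bytewise(mut words: Vec<&str>) -> Vec<&str> {
    words.sort_unstable();
    words
}

/// Stand-in for a locale-aware sort: every comparison goes through a
/// derived key, which costs extra allocation and work per element.
pub fn sort_folded(mut words: Vec<&str>) -> Vec<&str> {
    words.sort_by_cached_key(|w| w.to_lowercase());
    words
}
```

With the input `["cherry", "apple", "Banana"]`, the bytewise sort puts "Banana" first because uppercase letters have smaller byte values, while the folded sort returns apple, Banana, cherry. The two do different amounts of work and produce different output, so timing one against the other is not a like-for-like comparison.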

jeffbee · 2h ago
You should of course verify these results in your scenario. However, I somewhat doubt that the person exists who cares greatly about performance, and is still willing to consider bzip2. There isn't a point anywhere in the design space where bzip2 beats zstd. You can get smaller outputs from zstd in 1/20th the time for many common inputs, or you can spend the same amount of time and get a significantly smaller output, and zstd decompression is again 20-50x faster depending. So the speed of your bzip2 implementation hardly seems worth arguing over.
dale_huevo · 2h ago
A lot of this "rewrite X in Rust" stuff feels like burning your own house down so you can rebuild and paint it a different color.

Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

cornstalks · 1h ago
> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

That's the kind of attitude that leads to 50% of modern CPU resources being allocated toward UI eye candy.

0cf8612b2e1e · 2h ago
Every cycle saved is longer battery life. Someone paid the one time cost of porting it, and now we can enjoy better performance forever.
dale_huevo · 2h ago
They kicked off the article saying that no one uses bzip2 anymore. A million cycles saved for something no one uses (according to them) is still 0% battery life saved.

If modern CPUs are so power efficient and have so many spare cycles to allocate to e.g. eye candy no one asked for, then no one is counting and the comparison is irrelevant.

yuriks · 2h ago
It sounds like the main motivation for the conversion was to simplify builds and reduce the chance of security issues. Old parts of protocols that no one pays much attention to anymore do seem to be a common place where those pop up. The performance gain looks more like a nice side effect of the rewrite; I imagine they were at most targeting performance parity.
spartanatreyu · 1h ago
Exactly, even if we can't remove "that one dependency" (https://xkcd.com/2347/), we can reinforce everything that uses it.
jimktrains2 · 2h ago
Isn't bzip used quite a bit, especially for tar files?
Philpax · 2h ago
The Wikipedia data dumps [0] are multistream bz2. This makes them relatively easy to partially ingest, and I'm happy to be able to remove the C dependency from the Rust code I have that deals with said dumps.

[0]: https://meta.wikimedia.org/wiki/Data_dump_torrents#English_W...
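
A rough sketch of the partial-ingest idea, assuming the companion index file lists one `offset:page_id:title` entry per line with offsets in ascending order. The function names are hypothetical, and actual decompression of the byte range is left out.

```rust
/// Parses one index line of the assumed form "offset:page_id:title".
/// Titles may themselves contain ':', so split at most three ways.
pub fn parse_index_line(line: &str) -> Option<(u64, u64, &str)> {
    let mut parts = line.splitn(3, ':');
    let offset: u64 = parts.next()?.parse().ok()?;
    let page_id: u64 = parts.next()?.parse().ok()?;
    let title = parts.next()?;
    Some((offset, page_id, title))
}

/// Returns the byte range (start, end) of the bz2 stream holding `title`.
/// `end` is `None` when the page lives in the last indexed stream.
pub fn stream_range(index: &str, title: &str) -> Option<(u64, Option<u64>)> {
    let entries: Vec<_> = index.lines().filter_map(parse_index_line).collect();
    let start = entries.iter().find(|e| e.2 == title)?.0;
    // The next larger offset marks where the following stream begins.
    let end = entries.iter().map(|e| e.0).find(|&o| o > start);
    Some((start, end))
}
```

With the range in hand you only need to read and decompress that slice of the dump rather than the whole file.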

jeffbee · 2h ago
If so, only by misguided users. Why would anyone choose bz2 in 2025?
0x457 · 2h ago
To unpack an archive made from the time when bz2 was used?
ben-schaaf · 1h ago
Of course no one uses systems, tools and files created before 2025!
jeffbee · 1h ago
bzip2 hasn't been the best at anything in at least 20 years.
appreciatorBus · 23m ago
The same could be said of many things that, nonetheless, are still used by many, and will continue to be used by many for decades to come. A thing does not need to be best to justify someone wanting to make it a bit better.
Twirrim · 35m ago
So? If I need to consume a resource compressed using bz2, I'm not just going to sit around and wait for them to use zstd. I'm going to break out bz2. If I can use a modern rewrite that's faster, I'll take every advantage I can get.
tcfhgj · 1h ago
> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

That's the attitude that leads to Electron apps replacing native ones, and I hate it. I am not buying better CPUs and more RAM just to have them wasted like this.

Rucadi · 2h ago
Personally, I find the part about "Enabling cross-compilation" a lot more relevant; in my opinion it is important and a win.

The same about exported symbols and being able to compile to wasm easily.

Terr_ · 2h ago
It seems to me like binary file format parsing (and construction) is probably a good place for using languages that aren't as prone to buffer-overflows and the like. Especially if it's for a common format and the code might be used in all sorts of security-contexts.
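
As a small illustration, here is a sketch of parsing the 4-byte bzip2 stream header ("BZh" followed by a level digit '1' to '9') using only checked slice access. The type and function names are made up and this is not the crate's actual code.

```rust
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
    BadLevel(u8),
}

/// Returns the block size in bytes encoded in a bzip2 stream header.
pub fn parse_block_size(input: &[u8]) -> Result<u32, HeaderError> {
    // `get` returns None instead of reading past the end of the buffer.
    let header = input.get(..4).ok_or(HeaderError::TooShort)?;
    if !header.starts_with(b"BZh") {
        return Err(HeaderError::BadMagic);
    }
    match header[3] {
        // The level digit selects the block size in units of 100,000 bytes.
        level @ b'1'..=b'9' => Ok(u32::from(level - b'0') * 100_000),
        other => Err(HeaderError::BadLevel(other)),
    }
}
```

A truncated or malformed input ends up as an `Err` value; even a mistaken index would panic rather than read out of bounds.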
viraptor · 1h ago
Those cycles translate directly to $ saved in a few places. Mostly in places far away from having any UI at all.
anonnon · 2h ago
> Counting CPU cycles

And that's assuming they aren't lying about the counting: https://desuarchive.org/g/thread/104831348/#q104831479

DaSHacka · 1h ago
Rust devs continuing to use misleading benchmarks? I, for one, am absolutely shocked. Flabbergasted, even.
jxjnskkzxxhx · 2h ago
> lot of this "rewrite X in Rust" stuff feels like

Indeed. You know the React/Angular/Vue churn? It appears that the trend of people pushing stuff because it benefits their careers is coming to the low-level world.

I for one still find it mystifying that Linus Torvalds let these people into the kernel. Linus, who famously banned C++ from the kernel not because of C++ in itself, but to ban C++ programmer culture.