Hyperpb: Faster dynamic Protobuf parsing

71 points · bhollis · 17 comments · 7/23/2025, 5:32:37 PM · buf.build

Comments (17)

jsnell · 6h ago
See also the discussion on the technical description last week: https://news.ycombinator.com/item?id=44591605

(IMO much more interesting article than this announcement, and that probably should have gotten more attention than it did.)

dang · 4h ago
Thanks! That one was recent enough that I think we can re-up it. I'll put a link to this thread in there, so people can read both.
ManBeardPc · 6h ago
Interesting approach using a JIT compiler. It says compilation is slow; is there a way to persist the compiled code and load it later (for example for CLIs or faster redeployments)?
paulddraper · 5h ago
It's called AoT....
jayd16 · 4h ago
No, I think they want Profile-Guided Optimization. I think the C# AoT mode uses the results of a JIT first run.
the_duke · 5h ago
The delta to the performance of C++/Rust Protobuf implementations would be interesting.
nateb2022 · 5h ago
Even before Hyperpb, Go was already very competitive, e.g. this article from last year: https://www.greptime.com/blogs/2024-04-09-rust-protobuf-perf...
jeffbee · 5h ago
My experience is that the practical performance achievable with Go is higher because the C++ lifetime issues are too difficult to reason about and therefore the developer is forced to copy for safety. In Go you can fairly easily alias everything from the physical buffer into your parsed object. In the official C++ library, protobuf refuses to acknowledge even the possibility of aliasing. Even if you say that your string types are "view", there is an owned buffer inside the generated class into which your data is copied. This is exasperating because inside Google they have several different ways to not copy a string into a protobuf, and they're all patched out of the open-source edition, and you can read them and cry about it by looking at their git logs for "internal change" commits with baffling whitespace-only changes that are symptomatic of where they are patching out the good stuff.
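
For readers who want the aliasing point made concrete, here is a minimal sketch in Go. It is not hyperpb's or protobuf-go's code: `readBytesField` is a hypothetical helper that decodes a single varint-length-prefixed payload (the same framing Protobuf uses for `bytes` and `string` fields) and either returns a view into the input buffer or a fresh copy.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readBytesField is a hypothetical helper, not part of any Protobuf library.
// It decodes one field laid out as: varint length, then that many payload bytes.
// With alias=true the result shares memory with buf (zero-copy);
// with alias=false the payload is copied into a new allocation.
func readBytesField(buf []byte, alias bool) ([]byte, error) {
	n, size := binary.Uvarint(buf)
	if size <= 0 {
		return nil, errors.New("bad length prefix")
	}
	rest := buf[size:]
	if n > uint64(len(rest)) {
		return nil, errors.New("truncated payload")
	}
	end := int(n)
	// Cap the capacity so an append on the view cannot overwrite the input.
	payload := rest[:end:end]
	if alias {
		return payload, nil
	}
	return append([]byte(nil), payload...), nil
}

func main() {
	buf := append([]byte{5}, "hello"...)
	view, _ := readBytesField(buf, true)
	owned, _ := readBytesField(buf, false)

	buf[1] = 'J' // mutate the input buffer after parsing

	// The view observes the mutation; the copy does not.
	fmt.Println(string(view), string(owned))
}
```

The trade-off is the one described in the comment: the view costs nothing to produce, but it keeps the whole input buffer alive and is only valid while nobody rewrites that buffer. Go's garbage collector handles the first part automatically, which is what makes this pattern comparatively easy there.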
reactordev · 4h ago
Oh it’s worse, it’s a full-on marshal of the whole data. What we need is a no-allocation protobuf that binds to existing memory, knows about aliases, and can deal with a pointer. I love protobuf but I’ve moved to other messaging implementations that provide a faster marshal/unmarshal. Maybe I’ll give this a try.
beagle3 · 4h ago
Flatbuffers from Google is 11 years old and does that. (Protobufs is over 20 at this point).

https://stackoverflow.com/questions/25356551/whats-the-diffe...

reactordev · 4h ago
MessagePack is what I’m currently using, I needed a small binary format.
benreesman · 4h ago
It's not out-of-the-box compatible with everything in the way that `proto3` is, but if dealing with the really atrocious performance and ergonomics of protobuf in C++ (among other targets) is bad enough to warrant going slightly off the beaten path, flatbuffers is still pretty mainstream. It's got bindings for the big languages and it's used IIRC in a bunch of the FAANG mobile clients, stuff like that.

Going a little further afield, `capnp` is cool. It's got a much nicer IDL and object model, but you start to get into where non-C++ bindings are "community maintained" in a pretty loose sense. I'm not sure how much sense it makes unless it really lands on your polyglot stack perfectly, because if you only need C++, zpp_bits is really ergonomic and approaches theoretical limits on performance along a number of dimensions.

I don't love any of the answers here.

reactordev · 4h ago
I’m currently using MessagePack. It does the job of making small binary messages but I still suffer from marshal/unmarshal copying.

For certain messages with a fixed size (no strings or arrays) I can pin a message and reuse its memory address within the queue but there’s still data in memory that needs to be copied. At the very least from the TCP/IP stack.
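
The reuse pattern for fixed-size messages can be sketched roughly as follows, in Go. This is an illustration under stated assumptions, not MessagePack: `Tick`, `tickSize`, `encodeTo`, and `decodeInto` are hypothetical names, and the 12-byte little-endian layout is invented for the example. One message value and one frame buffer are allocated up front and overwritten for every message, so the only remaining copy is the one from the frame into the struct fields.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Tick is a hypothetical fixed-size message: no strings, no arrays.
type Tick struct {
	ID    uint32
	Price uint64
}

// tickSize is the assumed wire size: 4 bytes ID + 8 bytes Price, little-endian.
const tickSize = 12

// encodeTo writes t into the first tickSize bytes of dst without allocating.
func encodeTo(dst []byte, t Tick) error {
	if len(dst) < tickSize {
		return errors.New("short buffer")
	}
	binary.LittleEndian.PutUint32(dst[0:4], t.ID)
	binary.LittleEndian.PutUint64(dst[4:12], t.Price)
	return nil
}

// decodeInto overwrites msg in place, so a single Tick can be reused
// for every incoming frame instead of allocating a new one each time.
func decodeInto(msg *Tick, frame []byte) error {
	if len(frame) < tickSize {
		return errors.New("short frame")
	}
	msg.ID = binary.LittleEndian.Uint32(frame[0:4])
	msg.Price = binary.LittleEndian.Uint64(frame[4:12])
	return nil
}

func main() {
	var frame [tickSize]byte // stands in for a reused network read buffer
	var msg Tick             // reused for every message

	for i := uint32(1); i <= 3; i++ {
		if err := encodeTo(frame[:], Tick{ID: i, Price: uint64(i) * 100}); err != nil {
			panic(err)
		}
		if err := decodeInto(&msg, frame[:]); err != nil {
			panic(err)
		}
		fmt.Println(msg.ID, msg.Price)
	}
}
```

As the comment notes, this removes per-message allocation but not the copy out of the network stack into `frame`.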

haberman · 1h ago
I think you can alias the input data using Cord fields? As long as the input is Cord.
jeffbee · 41m ago
Almost, but there aren't repeated cords yet. At my company we maintain a patch that adds repeated cords, but it's a real chore because the project changes a lot of little internal details as needed.
mwigdahl · 6h ago
Really missed a great naming opportunity with "superpb" (pronounced as "superb").
JoshTriplett · 4h ago
I'd expect the current name to be pronounced like the first part of "hyperbole", which doesn't have nearly the same positive connotations, yeah.