Probably Faster Than You Can Count: Scalable Search with Probabilistic Technique

Comments (2)

jqpabc123 · 9h ago

The basic idea is simple, there is a trade-off between accuracy and performance.

This is not really new or that uncommon.

What the fastest way to sort a list of names?

The way I have been doing it for the past 25 years --- build a hash list of the first 3 alphabetical characters and sort this using integer comparison --- way faster than comparing strings.

The results are pretty good --- but obviously not perfect. But good enough so that with thousands of users every day for 25 years, I have never received a single complaint that the sorting is flawed.

lidorbt · 9h ago

Nice write up dude! Bloom filters are a classic for pruning shards, but it’s great to see concrete numbers on bit-array size and hash counts — those "back of the envelope" estimates really help ground the theory. I’m curious how you tune the false positive rate in environments where cardinality spikes (e.g. DDoS bursts) — do you resize filters on the fly or accept more “maybe” shards? The HyperLogLog++ section was crisp too, but I’d love to see a comparison to Count-Min Sketch for heavy-hitters queries (e.g. top IPs) and real-world benchmarks against Redis PFCOUNT or ClickHouse’s built-in HLL. Anyone here tried combining these probabilistic structures in the same pipeline?