Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights (jameshard.ing)

I recently used a bloom filter to implement anti-spam for log messages. In the logger I hashed each message and inserted it into the filter; if the entry was already present, I didn’t output the message. Then every few seconds I would iterate over the filter and clear all the bits. It worked out nicely that I didn’t have to worry about clearing the bits atomically: if a message came in while the filter was being cleared, having any one of its bits cleared was enough to get it logged again. This was much more efficient than the previous implementation, which kept a count of each message seen and saturated at N. It also meant that a message being logged repeatedly would still be seen, just at most once per clearing interval.
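
A minimal Python sketch of the scheme described above, assuming a single-process logger. `LogSuppressor`, the bit count, the three hash positions, and the use of BLAKE2b are all hypothetical choices, not the commenter's code; `clear()` is meant to be called every few seconds, e.g. from a timer.

```python
import hashlib


class LogSuppressor:
    """Hypothetical sketch: drop log messages already recorded in a Bloom filter."""

    def __init__(self, num_bits=8192, num_hashes=3):
        # num_bits should be a power of two so the odd stride below yields distinct positions.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, message):
        # Derive num_hashes bit positions from one 128-bit BLAKE2b digest (double hashing).
        digest = hashlib.blake2b(message.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd stride, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def should_log(self, message):
        """Insert the message; return True unless all of its bits were already set."""
        seen = True
        for pos in self._positions(message):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return not seen

    def clear(self):
        # Wipe byte by byte, with no lock. A message whose bits are only partly
        # cleared at that moment is simply logged again, which is harmless here.
        for i in range(len(self.bits)):
            self.bits[i] = 0
```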

After being aware of bloom filters for a while it was quite satisfying to organically find a real use for one that was such an improvement.

alienbaby · 7h ago

This article is aimed squarely at people like me. I'd heard of them. I kept meaning to look them up every time I saw them mentioned. I finally did when I saw your article and it was the perfect intro that I was looking for :)

vayup · 25m ago

Same with me

kridsdale3 · 4h ago

I wrote a Bloom Filter for college in CUDA in 2009. My advisor was a former Nvidia guy. I then went on to not do any GPU programming at all in my career.

I probably could have made $100,000,000 if I had made a different choice there.

ricardo81 · 48m ago

Improbable considering it was a CS idea in 1970. Surely every idea for GPGPU was fair game.

I wrote a hashcash implementation on a GPU 10 years ago. Pretty sure it's valueless now.

Kranar · 1h ago

Could have also bought Bitcoin and made a lot more... just saying.

deryilz · 41m ago

It's always great to see interactive posts. I also appreciated the list of where Bloom filters were used in popular programs.

256bit · 6h ago

Another visualisation of Bloom filters can be found at the end of this page: https://www.chrislaux.com/hashtable.html

marginalia_nu · 7h ago

I have a trick I like:

For sets that are plausibly sometimes going to be small where you're going to do a lot of membership checks, you can speculatively add a 64-bit bloom filter with a trivial hash function.

This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.
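
A minimal sketch of the trick, assuming a small set backed by a plain list so that membership is a linear scan. `FilteredSet` is a hypothetical name, and the "trivial hash function" is assumed here to be the low 6 bits of Python's built-in `hash()`.

```python
class FilteredSet:
    """Hypothetical sketch: a small list-backed set with a 64-bit Bloom filter in front."""

    def __init__(self):
        self._items = []  # linear scan stands in for the "expensive" membership check
        self._filter = 0  # 64-bit filter kept in a Python int

    @staticmethod
    def _bit(item):
        # Trivial hash: the low 6 bits of hash() pick one of 64 bits.
        return 1 << (hash(item) & 63)

    def add(self, item):
        if item not in self:
            self._filter |= self._bit(item)
            self._items.append(item)

    def __contains__(self, item):
        # Bit clear: the item was never added, so skip the scan entirely.
        if not self._filter & self._bit(item):
            return False
        # Bit set: maybe present (or a false positive), so fall back to the scan.
        return item in self._items
```

In Python the bit test is not much cheaper than scanning a tiny list; the payoff the comment describes shows up when the backing check is genuinely expensive.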

Sesse__ · 7h ago

Chromium does this in a bunch of places; the article only links to Safe Browsing using murmur, but the renderer (Blink) generally uses rapidhash and has some of these micro-filters which it uses for e.g.:

  - querySelector() in certain cases
  - Prefiltering hash lookups in CSS buckets
  - Rapid reject of elements when looking for certain ARIA attributes (for accessibility)

It's surprising that such tiny filters (32 or 64 bits) work at all, but they often do. There are also some larger Bloom filters around.

(I added some of these)

marginalia_nu · 7h ago

They just have a really unintuitive economy where they basically only need to work once or twice to make up for the cost of all the times they don't contribute any benefit.

Sesse__ · 2h ago

For extra fun, you can sometimes make ideal filters with no false positives, if you know your possible elements ahead of time and you don't insert too many of them. (E.g., for 20 elements, you can construct a 12-bit code where there are guaranteed to be no false positives as long as you insert at most two elements.)
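
The comment doesn't say how such a code is built. One construction that happens to give exactly these numbers (an assumption, not necessarily Sesse__'s): give each of the 20 elements a 3-bit codeword in 12 bits such that any two codewords share at most one bit. Two inserted elements can then cover at most two of a third element's three bits, so it never tests positive. The 20 triples below come from the cyclic Steiner triple system on 13 points with one point dropped; the helper names are hypothetical, and `no_false_positives` checks the claim exhaustively.

```python
from itertools import product


def steiner_code():
    """20 weight-3 codewords on 12 bits: cyclic STS(13) with point 12 removed."""
    blocks = set()
    # Base blocks {0,1,4} and {0,2,7} cover every nonzero difference mod 13 once,
    # so any two of their 26 translates share at most one point.
    for base in ((0, 1, 4), (0, 2, 7)):
        for shift in range(13):
            blocks.add(frozenset((x + shift) % 13 for x in base))
    # Keep the 20 triples that avoid point 12, leaving bits 0..11.
    triples = sorted(sorted(b) for b in blocks if 12 not in b)
    return [sum(1 << i for i in t) for t in triples]


def no_false_positives(codes):
    """True if, after inserting any one or two elements, no other element tests positive."""
    for a, b in product(range(len(codes)), repeat=2):  # a == b covers a single insert
        filt = codes[a] | codes[b]
        for c, code in enumerate(codes):
            if c not in (a, b) and (filt & code) == code:
                return False
    return True


CODES = steiner_code()
```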

konsalexee · 6h ago

Another bloom filter post I really appreciated, by Eli Bendersky, if anyone wants to read more: https://eli.thegreenplace.net/2025/bloom-filters/

verytrivial · 5h ago

The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.

cortesoft · 28m ago

I think the main bit your explanation is missing is how a bloom filter uses multiple hash functions to reduce false positives. For example, a bloom filter might have 3 hashes, and all of them have to hit for a key to be reported as possibly in the set. This reduces the likelihood of false-positive collisions while keeping the no-false-negatives guarantee.
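
For reference, the standard approximation (assuming k independent, uniformly distributed hash functions) for the false-positive rate p after inserting n keys into m bits, the k that minimizes it, and the resulting bits per key:

$$
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\text{opt}} = \frac{m}{n}\ln 2, \qquad
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^{2}}
$$

So adding hash functions helps only up to k_opt; past that point each insertion sets too many bits and the false-positive rate rises again.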

cherrycherry98 · 5h ago

Glad to know I'm not alone in my mental modeling of Bloom filters as just hash tables that only track the buckets which have data but not the actual data itself.

marginalia_nu · 5h ago

If you've grokked bloom filters, you're very close to also understanding both random projection and certain implementations of locality-sensitive hashes.

costco · 4h ago

I had used bloom filters in the past without really understanding how they worked. Then one day I decided to implement one just going off the Wikipedia article, with the 32-bit MurmurHash function, and was surprised at how simple it was. If you're using C++, you can use std::vector<bool> (or std::bitset, if the size is fixed at compile time) to make it even easier to store the bits in a space-efficient way.
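
A minimal Python sketch of a Wikipedia-style filter, under stated assumptions: Python's standard library has no MurmurHash, so BLAKE2b from `hashlib` is swapped in, and the k bit positions are derived from one digest by Kirsch-Mitzenmacher double hashing (h1 + i*h2) rather than from k separate hash functions. A packed `bytearray` plays the role of `std::vector<bool>`; `BloomFilter` and its parameters are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, fp_rate):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(8, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # packed bits, like std::vector<bool>

    def _positions(self, item: bytes):
        # BLAKE2b stands in for MurmurHash3; split one digest into two 64-bit hashes.
        d = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1  # force odd so the stride is never zero
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes):
        # All k bits set means "possibly present"; any clear bit means "definitely absent".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))
```

For 1,000 expected items at a 1% target this gives m = 9586 bits (about 1.2 KB) and k = 7.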

b0a04gl · 5h ago

I got into bloom filters while debugging Cassandra read spikes. There were lots of SSTable lookups even when the key didn't exist, which didn't make sense at first. Then I realised the bloom filter on each SSTable is meant to skip the disk read, but the default fp rate was high, like 0.1 or so, which was too much for our case. Most reads were cache misses anyway, so those false positives were killing us. I changed it to 0.01; it consumed a bit more memory, but there were way fewer useless reads, and it brought p99 read latency down by a good 16-18%.
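
The memory cost of that change can be estimated with the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an approximation, not Cassandra's exact allocation (the per-table setting is `bloom_filter_fp_chance`); `bits_per_key` is a hypothetical helper.

```python
import math


def bits_per_key(fp_rate):
    # Optimal Bloom filter size per key: -ln(p) / (ln 2)^2
    return -math.log(fp_rate) / math.log(2) ** 2


for p in (0.1, 0.01):
    print(f"fp={p}: {bits_per_key(p):.2f} bits/key")
# prints:
# fp=0.1: 4.79 bits/key
# fp=0.01: 9.59 bits/key
```

Each tenfold cut in p costs about 4.8 more bits per key, so going from 0.1 to 0.01 roughly doubles the filter size.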

anon-3988 · 5h ago

I have a specific use case where I know at startup the list of words that I want to find, and it will not change for the duration of the program. Can anyone think of a low-latency solution to this? I have tried a lot of variations: bloom filter, perfect hash, linear lookup, binary search, set search, etc.

It appears that perfect hash is the one that works the best for my use case.
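
For a fixed word list known at startup, one way to build a minimal perfect hash is hash-and-displace (the idea behind CHD): bucket the words with one hash, then search, bucket by bucket, for a per-bucket seed that sends every word in it to a free slot. This is a hypothetical sketch, not the commenter's implementation; `build_mph`, `contains`, and the BLAKE2b-based `_h` are illustrative. As the reply below points out, words outside the set still land in some slot, so the lookup compares the stored word.

```python
import hashlib


def _h(word: str, seed: int) -> int:
    # Deterministic 64-bit hash of (seed, word); Python's built-in str hash is salted per process.
    data = seed.to_bytes(8, "little") + word.encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def build_mph(words):
    """Hash-and-displace minimal perfect hash over a fixed word list."""
    words = list(dict.fromkeys(words))  # duplicates could never get distinct slots
    n = len(words)
    buckets = [[] for _ in range(n)]
    for w in words:
        buckets[_h(w, 0) % n].append(w)
    seeds = [0] * n
    slots = [None] * n
    # Place the largest buckets first; they are the hardest to fit.
    for b in sorted(range(n), key=lambda i: len(buckets[i]), reverse=True):
        bucket = buckets[b]
        if not bucket:
            break  # sorted by size, so every remaining bucket is empty
        seed = 1
        while True:
            pos = [_h(w, seed) % n for w in bucket]
            if len(set(pos)) == len(pos) and all(slots[p] is None for p in pos):
                break
            seed += 1
        for w, p in zip(bucket, pos):
            slots[p] = w
        seeds[b] = seed
    return seeds, slots


def contains(seeds, slots, word):
    n = len(slots)
    slot = _h(word, seeds[_h(word, 0) % n]) % n
    # Unknown words also map to some slot, so compare the stored key.
    return slots[slot] == word
```

A lookup costs two hashes and one string comparison regardless of list size; tools such as GNU gperf generate this kind of table ahead of time for C and C++.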

jerf · 4h ago

Does being able to use a perfect hash also imply that you know you will only look up those values? If so, then yes, the name is accurate and it is probably a very good choice.

But if you feed the perfect hash function keys it is not expecting, some fraction of them will collide with the expected ones.

If you're searching for a fixed set, look at Ragel, a state machine compiler. It generates the search at compile time, in a way that is very hard to beat.


Bloom Filters by Example
