Lossless video compression using Bloom filters

341 points by rh3939 · 5/26/2025, 6:32:02 PM · github.com

Comments (114)

antirez · 1d ago
I don't believe the document does a great job of explaining what is otherwise a very simple idea (assuming I understood it well):

1. It creates a bitmap where each bit is a pixel in the image, if from frame 0 to frame 1 a given pixel changed, the corresponding bit is 1, otherwise it is 0.

2. All the 1s are added to the bloom filter, hashing their offsets. Now the bloom filter will be positive for all such indexes plus a percentage of false positive indexes.

3. We query the bloom filter to see all the indexes that are positive, and for all such pixels we store the raw pixel data of what changed. So we can reconstruct the next frame easily.

You can think of this as storing the delta between two frames as the x,y,r,g,b of all the pixels that changed, but compressing the x,y part a lot at the cost of storing a bit more r,g,b than needed.

I have the feeling that since the pixels that change from frame 0 to frame 1 are often similar (in their location) to what will change from frame 1 to frame 2, there is the possibility of further compressing that as well, by setting the right flags in the next frame and storing verbatim only the offsets that changed in addition to the previous frame's, or something alike.
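
To make the three steps concrete, here is a minimal Python sketch of the encode side (hypothetical function names, a plain integer-k Bloom filter rather than the repo's rational variant, frames as flat lists of (r, g, b) tuples):

```python
import hashlib
import math

def bloom_params(n_items, fp_rate=0.01):
    """Classic sizing: m bits and k hash functions for a target false-positive rate."""
    m = max(8, int(-n_items * math.log(fp_rate) / (math.log(2) ** 2)))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k

def bloom_positions(item, m, k):
    """k bit positions derived from one digest (double hashing)."""
    d = hashlib.sha256(str(item).encode()).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]

def encode_delta(prev, curr):
    """prev/curr: equal-length lists of (r, g, b) pixels in raster order.
    Returns the filter bits plus the raw pixel values of every index that
    queries positive (false positives included), in raster order."""
    changed = [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]
    m, k = bloom_params(max(1, len(changed)))
    bits = bytearray(m)
    for i in changed:
        for p in bloom_positions(i, m, k):
            bits[p] = 1
    payload = [curr[i] for i in range(len(curr))
               if all(bits[p] for p in bloom_positions(i, m, k))]
    return bits, k, payload
```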

90s_dev · 1d ago
This comment is why I go to the comments first.

Oh hey you're the guy who made kilo. Good job.

[edit] lol he edited it... they always edit it

antirez · 1d ago
I always love when people recognize me for kilo or dump1090 or hping and not for Redis :D Side projects for the win. Thanks for your comment!
90s_dev · 1d ago
I've literally never even used Redis, let alone know what it is or does. I dunno how I was able to make money in software since 2008 without figuring that out... or learning SQL, or C++. There's far more that I don't know than I do know. But anyway if you wrote Redis or something then congrats, I've definitely heard of it.
antirez · 1d ago
I have a theory, which I call "the equivalence of modern software systems", that says a lot about how unimportant Redis and other technologies are. That is: modern computing is so developed that you can pick any random language, kernel, and database, any of the top ones available, and I can create every project without too much trouble. PHP / Win32 / SQLite? Ok, I can make it work. Ruby / Linux / Redis? Well, fine as well.
90s_dev · 1d ago
I've noticed that too, LAMP stack vs MEAN stack etc.

Part of it seems to be that software languages have "flavors" like natural spoken language does, so that one seems natural to one person and foreign to another, much like how DHH took TypeScript out of Rails, and I can't read Rails code or live without static types.

Also, college kids are always learning what the last generation already knew (like Lisp) and reinventing everything in it with the vigor of their youth, and it gets popular, which I imagine is how Rails and Redis both started and caught on.

But sometimes there are genuine innovations, and we can't build on them until someone comes up with them, much like how we couldn't invent machines until someone figured out that we could just use the lever to make gears to transport energy, which Archimedes didn't think of. And the more we learn collectively, the more these things spread.

I wonder if this means it'll all stabilize into a JavaScript-like toolchain one day. Maybe Gary Bernhardt was right all along.

taneq · 1d ago
All modern stacks are CRUD-complete as well as Turing-complete? ;)
tehjoker · 1d ago
redis is designed for scaling so if you don’t have a large project you don’t need it
tie_ · 1d ago
Ain't it cute 'splaining what redis is designed for to the person who designed it
butterfi · 22h ago
One of my CS professors told the story about when Microsoft demonstrated their implementation of the Korn shell at a USENIX conference. Gasbag on stage told the guy in the question line that he (the questioner) had misinterpreted something in the software's original intention. Gasbag figured out something was up when half the room cracked up. The guy at the mic was David Korn, author of the Korn shell.
90s_dev · 1d ago
You win the "someone on HN made me laugh out loud" award. The last person to win it was like a month ago so good job.
pmarreck · 20h ago
Honestly, the relatively high probability of this happening is part of what makes HN great :)
qingcharles · 1d ago
I read it in Ralph Wiggum's voice
tehjoker · 1d ago
I thought I was replying to a subcommenter... oopsie
joaohaas · 1d ago
Redis is not designed for scaling. The 'default' version is a single-core app without any focus on availability.

Yes there's Redis Cluster, but that's a separate thing, and most Redis installations don't use it.

secondcoming · 1d ago
Redis is absolutely not designed for scaling. Maybe valkey is but I've yet to use it
simondotau · 1d ago
I still use CFML / MySQL (more specifically Lucee / MariaDB). Aside from not being one of the cool kids, I haven’t seen anything sufficiently compelling to justify changing my stack.
catmacey · 20h ago
Heh. Me too! Throw in Coldbox and its ecosystem and you have most of what you need.
cryptonector · 1d ago
A.k.a. all you need is PG and something to serve your app over HTTPS. :joy:
messe · 1d ago
I think PG might insist on using a lisp too.
metadat · 1d ago
Does Postgres support Lisp?
cryptonector · 1d ago
I think u/messe was punning. I used "PG" to mean PostgreSQL, and u/messe probably knew that and probably meant PG as in Paul Graham, who is a huge fan of Lisp. I thought it was funny.
nix-zarathustra · 1d ago
No, but it has a stutter.
braaaahp · 1d ago
Yeh. Data driven model syncing machines will be what kills languages, software stacks.

All the chip world talk of single function machines and how tech is now energy constrained industry, means pruning the state we store; most software is software to deliver software.

Single function pipeline hardware platforms will act as little more than sync engines from models.

I mean, it is 2025. We still write software like it's the 1970s. So high tech.

Edit: https://arxiv.org/abs/2309.10668

From LLMs to energy models that transform electromagnetic geometry. Saving what’s needed to recreate cat pics and emails to mom. Pruning the tail as interest dips with generational churn.

90s_dev · 1d ago
Also think about it this way:

Redis will eventually become obsolete. It may take 10 or 50 years, but it will happen.

But kilo taught many developers about C, editors, terminals, and how simple it is to do syntax highlighting. This epiphany inspired me and many others, and the things we make because of that will last much longer than Redis.

mattbis · 1d ago
Redis does a lot of things most people don't know about, and it's all super optimised... I am not so sure. I would not want that to happen, simply because I would be really bored using one way to do everything (probably SQL).

Most people use it as a cache... but you can build a lot of things with it that are far outside this narrow view (say a bidding system or something like that).

And for the other comment: scaling is possible via the commercial offering... so far as I know.

I didn't get to persuade anyone to let me use it for that... I really wanted to.

rurban · 23h ago
Super optimized? Veto. It could use much better modern thread-safe hashmaps. With at least twice the performance.

But it has a fantastic ease of use

cozzyd · 1d ago
dump1090 (but not Redis) user here! Hi! (We're about to publish a paper that depends on data taken with dump1090, funny enough...).
antirez · 1d ago
Awesome! Thanks! I have in mind doing a V2 of dump1090 soon (time permitting); I have a few ideas that may improve the performance very significantly.
echoangle · 1d ago
Is there any plan to have a JSON/JSONlines TCP endpoint in the V2? That’s not currently there, right? I want to process data in real time but then I have to decode the raw messages on port 30002 myself.
antirez · 1d ago
That's a good idea. Adding to the TODO list.
echoangle · 1d ago
Awesome, thanks!
Findecanor · 1d ago
I wonder how good a compression ratio this really achieves...

It reminds me of experiments I did with wavelets for image compression some.. 22 years ago.

The inverse transform starts with a small pixel image and then uses just as many coefficients to transform it to an image twice as large (in width or height). This is done over and over again.

The point is that most of your data consists of these coefficients, and most of them are close to zero: so you can flush them to zero. Then the problem is mostly a matter of how to encode where you don't have zeroes: i.e. you have like a bitmap and an array of the non-zeroes. There were different algorithms that encoded non-zeroes, more or less conservatively, but they did often exploit the property that these non-zeroes typically were quite clustered — which is the opposite of a typical hash function used for a bloom filter.

Image compression this way obviously had very poor locality, first just for the transform and then for the coefficient compression, so it was quite slow. Therefore it felt like a dead end.

bilsbie · 1d ago
So how does the bloom filter help vs something like a hash table?
hxtk · 1d ago
It sounds like your instinct here is that we need to store (x,y,r,g,b) tuples for everything that changed, and leave other pixels unchanged from the previous frame.

You could store all of that in a single map from (x,y) to (r,g,b), which would cost you, say, 4 bytes per pixel for the (x,y) data and then, say, 30 bits or (rounding up) 4 bytes per pixel for the (r,g,b) with 10-bit color. A hash map would optimize lookup, but for efficient storage you really just want an array of 8-byte chunks where the first two bytes are x, next 2 are y, next 10 bits are r, then g, then b. To decode, you just start with the previous frame and then iterate down the list and apply each diff in the list.

If you iterate over (x,y) in a consistent order, you avoid the need to store it because you can simply say that the next (r,g,b) tuple in your list is the pixel data for the next (x,y) value that passes the bloom filter. This effectively cuts the amount of data you have to store per pixel in half, but increases the number of pixels you have to store by a probabilistically small amount because bloom filters have false-positives.
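
To make the "consistent iteration order" point concrete, here is a sketch of the decode side (hypothetical names; `passes_filter` stands in for the Bloom filter query):

```python
def decode_delta(prev, passes_filter, payload):
    """prev: previous frame as a list of (r, g, b) pixels in raster order.
    passes_filter(i): Bloom filter membership query for flat pixel index i.
    payload: stored pixel values, in the raster order the encoder used.
    Because encoder and decoder walk the indices in the same order, no (x, y)
    needs to be stored: the next payload entry belongs to the next index that
    passes the filter (a false positive just re-stores an unchanged pixel)."""
    out = list(prev)
    values = iter(payload)
    for i in range(len(prev)):
        if passes_filter(i):
            out[i] = next(values)
    return out
```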

johanvts · 1d ago
You start out with the position of all the 1s (positions of changed pixels). A hashtable would be great for speeding up the query time, but does not compress anything. The bloom filter takes up much less space. The price is that you can get false positives.
returningfory2 · 1d ago
You don't need to store the diffs for the coordinates that are not in the bloom filter. If the number of coordinates that changed is small, the size of the bloom filter will be small, and this is a significant optimization.
raincole · 1d ago
So it's just compressing consecutive 0s? Like most of compression algorithms do...?
grumbelbart2 · 1d ago
It's more consistent.

As a rule of thumb, Bloom filters require 10 bits per element for a 1% false positive rate. Say 100 pixels changed between the frames; that's 1000 bits or ~125 bytes to store the which-pixel-changed map.

Run-length encoding of the (in)active bits can use anything from ~10 to 200 bytes for 100 pixels (say 1 byte per run, 200 runs of active vs. inactive in the worst case).
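
The 10-bits-per-element rule of thumb comes from the standard Bloom sizing formula m = -n·ln(p)/(ln 2)² (the exact optimum is closer to 9.6 bits per element); a quick check in Python:

```python
import math

def bloom_bits(n_changed, fp_rate):
    """Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2, with k = (m/n) * ln 2 hashes."""
    m = -n_changed * math.log(fp_rate) / (math.log(2) ** 2)
    k = (m / n_changed) * math.log(2)
    return m, k

m, k = bloom_bits(100, 0.01)
print(f"{m:.0f} bits (~{m / 8:.0f} bytes), k = {k:.1f}")  # 959 bits (~120 bytes), k = 6.6
```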

hinkley · 1d ago
A lot of video compression is about motion. How do you handle the same pixels sliding two pixels to the left due to a pan?
djmips · 1d ago
It doesn't as far as I can tell.
01HNNWZ0MV43FF · 1d ago
You could definitely layer block-level motion prediction in before this step, though
3cats-in-a-coat · 1d ago
Thing is, if you store the delta change from one frame to another, then pixels which aren't changed are just zeroes. Compressing zero sequences is the most trivial exercise for lossless compression, and unlike the bloom filter, it has no false positives.

I can see bloom filters as part of a complicated hybrid strategy compressor. The more tools on the belt of such a compressor, the better, but I don't think it'll improve things much on average.

wrsh07 · 1d ago
The comparison is: for your use case, which is better? Decompress and matrix sum or grab keys and increment individual values?

There's some flexibility for how one would do that second increment, but of course naively it might look like constructing a sparse matrix and summing so that feels pretty similar. But the flexibility might be nice in some cases, and bloom filters are an extremely space efficient representation with arbitrarily low false positive rate

macleginn · 1d ago
How do you go from frame n+1 to frame n+2?
Salgat · 1d ago
Similar to other compression methods, it uses keyframes (which are saved frames that exist throughout the video) as the starting point, after which the next frame is a diff of that until you reach the next keyframe. It seems to generate keyframes when the compression drops below a certain level (at which point you might as well just store the compressed frame instead of the bloom bitmap).

https://github.com/ross39/new_bloom_filter_repo/blob/4798d90...
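
A hedged sketch of the shape of that decision (hypothetical names and threshold, not the repo's actual logic):

```python
def encode_frame(prev, curr, compress_diff, raw_size, ratio_threshold=0.5):
    """Hypothetical keyframe decision: if the Bloom-coded diff stops paying off,
    store the frame itself. `compress_diff(prev, curr)` is assumed to return the
    encoded delta as bytes; `raw_size` is the size of an uncompressed frame in
    bytes; the 0.5 threshold is illustrative only."""
    delta = compress_diff(prev, curr)
    if len(delta) >= ratio_threshold * raw_size:
        return ("keyframe", curr)   # diff too dense: store the frame itself
    return ("delta", delta)         # sparse diff: keep the Bloom-coded version
```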

meatmanek · 1d ago
I suspect this works better because the input videos (YouTube videos) have already been compressed and decompressed.

With raw video input, I think the assumption "most pixels change little (or not at all) between consecutive frames, creating a sparse difference matrix ideal for this approach." would break down. For a very clean signal (low-noise sensor, brightly lit scene), maybe it'd work, but most real-world signals will have noise > 1 LSB, so I'd expect the lower bits to be changing at least half the time.

Sending the video through a compression and decompression cycle first will tend to remove that noise, creating an artificially static video where that assumption holds up.
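
One way to check that would be to measure diff sparsity directly on raw vs. already-compressed frames; a small sketch, assuming frames arrive as HxWx3 uint8 numpy arrays:

```python
import numpy as np

def changed_fraction(frame_a, frame_b, tolerance=0):
    """Fraction of pixels that differ by more than `tolerance` in any channel,
    for two HxWx3 uint8 frames. With tolerance=0, sensor noise in raw footage
    tends to flip most pixels; an already compressed (denoised) source looks
    far sparser, which is what the Bloom approach relies on."""
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return float((diff > tolerance).any(axis=-1).mean())
```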

hxtk · 1d ago
From what I can tell, it also isn't lossless: https://github.com/ross39/new_bloom_filter_repo/blob/main/vi...

This looks like it is not storing the diff for any pixel that had the mean of its r,g,b values change by less than 10, which means that if a pixel goes from pure blue (#0000ff) to pure red (#ff0000) in consecutive frames, it would decode as pure blue in both frames.
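
A toy check of the blue-to-red case, assuming the diff really is a thresholded mean of channels as described:

```python
import numpy as np

# Mean-of-channels diff with a threshold of 10, as described above (toy check).
blue = np.array([0, 0, 255], dtype=np.float64)   # pure blue, assuming (R, G, B) order
red = np.array([255, 0, 0], dtype=np.float64)    # pure red

diff = abs(red.mean() - blue.mean())             # |85.0 - 85.0| = 0.0
print(diff > 10)  # False -> the pixel is treated as unchanged and the change is dropped
```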

arcastroe · 1d ago
It'd be better to convert to HSL, and compute the distance from there.

HSL-distance preserves color similarity with higher fidelity than converting both pixels to greyscale as done in the shared link

oivey · 1d ago
That doesn’t address the issue at all. The issue is that discarding small changes in any color space is fundamentally lossy, so the algorithm isn’t lossless.
arcastroe · 8h ago
Well yes.. I agree.

But my comment was addressing parent's point that pure blue and pure red are currently considered close together since they share the same greyscale values.

Sesse__ · 1d ago
You mean something like L*ab, right, not HSL? HSL is a pretty terrible approximation to human vision.
nasso_dev · 1d ago
Just like you wouldn't use PNG for photography, I don't think you'd use a lossless video codec for real-world footage.

Lossless video would make much more sense for digital content like screen recordings, where the assumption that few pixels change between consecutive frames makes much more sense.

shrinks99 · 1d ago
Lossless formats (specifically DPX & EXR sequences, usually with ZIP compression) are used frequently for real world footage in post-production workflows! Granted, most consumers don't run into this stuff. Photographers don't use PNG because it doesn't offer anything for them that camera raw or EXR files don't already cover.
einsteinx2 · 1d ago
As far as video types go maybe it would work well for animation since that tends to compress well for the same reason a screen recording does.
kookamamie · 1d ago
Sure you do. E.g. ffv1 and huffyuv are used for archiving video losslessly.
nulld3v · 8h ago
HEVC/H.265 and VP9 can both also operate in lossless mode. In my testing they achieve compression ratios that ffv1 and huffyuv cannot come close to touching (not all that surprising given how old ffv1 and huffyuv are).

Ffv1 is much easier to work with compared to HEVC and VP9 though, probably thanks to its deep integration with ffmpeg. By comparison, working with HEVC was hell on earth!

I spent like a week figuring out how to stuff JPEG sequences into HEVC files without losing colorspace and color profile information. Even JPEGs alone were horrible to work with; my understanding is they encode image data in the YUV444P colorspace, but every program works with them in the RGB colorspace. The conversion between YUV444P <-> RGB is lossy and every program does the conversion slightly differently. So if I ever accidentally triggered this conversion at any point, my archival was no longer bit-perfect.

And then to check your results, you obviously can't convert back to JPEGs, I think I ended up encoding to TIFF after having failed with JXL.

Code here if anybody else wants to try replicating it (good luck, you'll need it): https://gist.github.com/null-dev/ebd2f8b23c3e5066a48976c7308...

Converting lossy JPEGs to lossless HEVC might seem wasteful, but the space saved by intra-frame compression dwarfs the space saved by lossy encoding.

themerone · 20h ago
Lossy compression is standard on high-end cinema cameras.

In most cases the compression ratio will be low enough that the video will be perceptually lossless.

MBCook · 1d ago
Normal people never use raw, so that might not be a big issue. Phones and cameras store files in MP4 or AV1 or whatever anyway.

Unless you know to turn it on and deal with the file sizes and processing, people may not realize the concept of raw/unprocessed even exists anymore.

I’d never thought about that before.

abeppu · 21h ago
> I'd expect the lower bits to be changing at least half the time.

I didn't read the code, but the readme focuses on whether less than 32.4% of bits are 1. So even if a lot of the pixels are changing at the lower bits between each frame, so long as enough of the upper bits aren't changing, it should still in principle work?

sionisrecur · 1d ago
So as it is, it would work great for animation.
jiggawatts · 1d ago
The lazy approach is to download an 8K video and downsample it to something like 720p.

Or just buy a camera and run around capturing some raw 8K footage of everyday scenes.

meindnoch · 1d ago
So, according to your graph [1] this new compression is always strictly worse than just using GZIP?

[1] https://github.com/ross39/new_bloom_filter_repo/blob/main/co...

Retr0id · 1d ago
It's absent from this graph, but I'd imagine the bloom filter approach could at least be faster than gzip. But I don't see perf metrics anywhere else...
croemer · 1d ago
Why would it be faster? You have to do all that hashing and lookup of witness data etc.

Also, if you want fast, good compression, use zstandard not gzip.

Retr0id · 1d ago
gzip (and zstandard) decompression is inherently serial, whereas many bloom filter lookups can be as parallel as you like.
hxtk · 1d ago
The bloom filter lookups here are fundamentally not parallel because they are coupled to a serial iteration.

Ultimately the data they are storing is tuples of (x,y,r,g,b), which means that it is not sufficient to ask "does (x1,y1) pass the bloom filter?", which would be embarrassingly parallel; you have to be able to match it to an index in your list of (r,g,b) data.

The way this is done is to iterate over all (x,y) in a consistent order for encoding and decoding, and for each (x1,y1), if it passes the bloom filter, then the next (r1,g1,b1) tuple is the pixel data for that pixel.

thesz · 1d ago
You may perform n parallel lookups into the Bloom filter, then use a state machine table to perform k (most often k < n) actions, oftentimes in parallel (SIMD) too, because it is just a byte shuffle.
hxtk · 1d ago
Hmm, that's a good point, since the bloom filter check is CPU-harder than the action taken on lookup, you could just do all the lookups in parallel and then do all of the copies separately.
alexjurkiewicz · 1d ago
You can de/compress multiple streams in parallel, at minimal cost of overall efficiency. For gzip, look at pigz. For video, look at how x264/x265 slice up the video frame.
croemer · 1d ago
Parallelism isn't speed. Zstandard decompression is insanely fast, no need for occupying multiple cores.
Retr0id · 1d ago
Bloom filter lookups are embarrassingly parallel to the point that you could occupy 0 CPU cores and do it in a GPU fragment shader.

There are probably other reasons why this is a bad idea but I'm curious to try it.

RedNifre · 1d ago
Maybe you could check all pixels in parallel?
bob1029 · 1d ago
> The key insight: when a binary string has a low density of 1s (specifically below p* ≈ 0.32453), we can encode just the positions of those 1s more efficiently than storing the raw string.

Much of what JPEG/MPEG are doing is rearranging the problem such that it is possible to create long runs of zeroes. The way in which a DCT block is scanned relative to the location of its AC/DC components is potentially one of the most innovative aspects of many video & image compression techniques.
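
For illustration, here is the standard JPEG-style zig-zag scan order (well known, not specific to this repo) and how it bunches the zeroed high-frequency coefficients into one long run:

```python
def zigzag_indices(n=8):
    """(row, col) visiting order of an n x n DCT block along anti-diagonals,
    as in JPEG: low frequencies first, so the (mostly zero) high-frequency
    coefficients end up together at the end of the scan."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

# Example: a quantized block with only a few low-frequency coefficients left.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 51, -7, 3
scan = [block[r][c] for r, c in zigzag_indices()]
print(scan[:6], "... then", scan[6:].count(0), "zeros")  # [51, -7, 3, 0, 0, 0] ... then 58 zeros
```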

cogman10 · 1d ago
I don't believe this is correct.

What the DCT does, along with the color representation transformation, is turn fine details into higher frequencies and coarse details into low frequencies. From there, the quality of the image and thus the compression ratio is as simple as dropping high-frequency representations.

And besides that, jpegs use a Huffman table to further reduce the size of the image.

AFAIK, it doesn't do anything special to reduce runs. So lining up zeros really doesn't help much.

IshKebab · 1d ago
This is true, but OP was also correct. The DCT components are quantised and encoded in an order such that you get a long string of 0s at the end (the high frequencies).
Retr0id · 1d ago
Dropping (or rather, heavily quantizing) the high frequency components does create runs of zeroes with high probability. The order the components are stored (a diagonal zig-zag) pushes the likely-to-be-zero elements together. At higher quality settings you might not have actual runs of zeroes, but you'll at the least have runs of low-entropy values.
brigade · 1d ago
Dropping the high frequencies does create zero runs, and even JPEG encodes zero runs as a run-length (RRRR in the spec)

But DCT isn't very useful for lossless since any lossless frequency domain representation requires more range than the source, and you can't quantize to counter it.

Sesse__ · 1d ago
JPEG even has a special code for “the rest from here is all zeros, so we stop this block now”. The entire format is pretty much organized around trying to get as many zero coefficients as possible.
akoboldfrying · 1d ago
Agree.

OP's approach is actually terrible for video compression, because it actively throws away the locality of pixel changes present in typical video.

A nicer way of putting it would be that nothing about OP's technique is specific to video frames -- the same idea could be used to compress the diff between any two equal-length bit sequences. Might this nevertheless turn out to be better than existing forms of compression for this problem, like gzipping the concatenated pair of blocks? No, because the only way you get compression at all is if the distribution of inputs (here, sets of different bit positions) is highly predictable, i.e., nonrandom -- and pushing data through a hash function actively destroys that (especially if it's a cryptographically strong hash, the whole point of which is to produce output that is indistinguishable from random).

hxtk · 1d ago
I'm confused by this line here: https://github.com/ross39/new_bloom_filter_repo/blob/4798d90...

It seems like this would make the compression lossy and discard, e.g., transitions from #ffffff to #fffffa, and the line above it where the mean of pixel data is taken would discard, e.g., transitions from #ff0000 to #00ff00 regardless of the threshold used.

Am I misunderstanding the role of that line of code? It doesn't seem like anything that is a 0 in the resulting mask would be encoded into the bloom filter.

clayhacks · 1d ago
I see you put how to calculate the compression ratio, but do you have some examples of worst case, average, and best case compression ratios?

Edit: ok I see the photos in the repo. Putting them in the README would be helpful

rh3939 · 1d ago
Author here. The repo is a complete mess, but I do have some code in there to generate graphs and whatnot if you're willing to dig through the code. I will make this much more concrete with lots of proper testing. It's very much still a messy work in progress.
codetrotter · 1d ago
I applaud you for uploading even if it's a bit disorganised still, and I do the same. It's better to have something than nothing. Sometimes I see people talking about something but they don't want to upload their code yet because they have to clean it up first. And then either they don't get around to ever cleaning it up the way they wanted, or by the time they do it will have fallen off everyone's radar and be forgotten. At least with a messy repo it's possible to poke around, and not least to star it and maybe check back later to see if they cleaned it up.
rh3939 · 17h ago
Hi HN, author here:

I received some great feedback, so I have decided to focus on raw video for now and more rigorous testing around noisy videos. I will continue to update the repo frequently. It's only very early days, but the raw video testing has yielded some decent results (with some caveats).

- Compression ratio of 4.8% (95.2% size reduction)
- Compression speed: 8.29 frames per second
- Decompression speed: 9.16 frames per second
- Only 4% of frames needed as keyframes
- Perceptually lossless output (PSNR: 31.10 dB)

Comparison with standard codecs:
- Rational Bloom Filter: 4.8%
- JPEG2000 (Lossless): 3.7%
- FFV1 (Lossless): 36.5%
- H.265/HEVC: 9.2% (lossy)
- H.264: 0.3% (lossy)

### Current Limitations and Future Work

While the compression results are promising, the system is not yet truly lossless in its handling of color channels. The current implementation faces challenges with the YUV to BGR color space conversion process:

1. *Color Space Conversion Precision*: Small rounding errors occur during conversion between YUV and BGR color spaces, leading to minor differences in pixel values (average difference of ~4.7).

2. *Channel Processing*: The current implementation processes color channels in BGR format after conversion, which introduces additional precision loss.

The plan moving forward is to:
- Implement direct YUV processing without conversion to BGR
- Develop a true bit-exact handling of color data
- Refine the Bloom filter parameters specifically for chroma subsampling patterns
- Create a dedicated verification system for each color channel independently

I want to be able to prove it's mathematically lossless, but I am quite a ways from this. I plan to pursue this lossless compression idea, and I have a few other ideas in different domains around using the rational bloom filter.

Dwedit · 1d ago
It's also possible to run codecs like H.264 in a true lossless mode, it's just almost never done.
perching_aix · 1d ago
Yup, even got it to work with hardware acceleration via NVENC. Playback was tough though; ffplay would play it, but nothing else would.
vintermann · 1d ago
This is a cute concept, but if you have a sparse binary string, you can probably do better with traditional methods!
croemer · 1d ago
Indeed, as this comparison with gzip shows: https://github.com/ross39/new_bloom_filter_repo/blob/main/co...
oivey · 1d ago
It’s difficult to trace through the repo, but it looks like the compression ratios are computed by seeing how many pixel diffs were able to be dropped. That’s interesting, but the more interesting comparison would be to the average byte size of each frame in the compressed YouTube video. Without this, it’s hard to tell if the algorithm is an improvement relative to current methods.

If the algorithm is lossy (small diffs are squeezed to zero) it probably should be compared to other lossy algorithms rather than lossless.

less_less · 1d ago
If you want to use Bloom filters for compression, you might want to consider binary fuse filters, ribbon filters or similar which avoid the 1/ln(2) leading factor in space usage.
chungy · 1d ago
I'm confused by the README. It makes references to YouTube video, but also "lossless video".

Is this about recompressing existing H.264[1] videos (like downloaded from YouTube) losslessly, or is it about making new videos from a source losslessly? The former reminds me of JPEG XL's ability to re-compress the DCT on old JPEGs, but even as you get better compression, you still don't have a lossless image.

[1] To be fair, H.264 can be lossless in the first place, but YouTube does not serve up such files.

rh3939 · 1d ago
Author here. Totally agree that H.264 can be lossless; generally it's lossy. My idea (which I am still working out) is to compress the difference of frames using a rational bloom filter. I previously posted here about using conditional bloom filters that rely on a rational k. The idea was to use different values of k based on whether the URL was more likely to be malicious than not. This results in a lower false-positive rate for the same filter size when compared to an integer k. I then saw this paper [https://arxiv.org/html/2502.02193v2] was posted recently, which describes an almost identical approach (theirs is much nicer). I will do much more rigorous testing, as my current setup is a bit sloppy, but I hope it illustrates the idea.
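
For readers wondering what a rational k looks like in practice, here is one hedged reading of the idea (a sketch based on my understanding of rational-k filters, not the repo's implementation): always apply floor(k) hash functions, and apply one extra hash with probability k - floor(k), gated deterministically per element so that inserts and queries agree.

```python
import hashlib
import math

def _hashes(item, m, count, seed=0):
    """Derive `count` bit positions in [0, m) from the item (double hashing)."""
    d = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(count)]

class RationalBloom:
    """Sketch of a rational-k Bloom filter: always apply floor(k) hashes and
    apply one extra hash with probability frac(k), decided deterministically
    from the element itself so add() and __contains__() behave identically."""
    def __init__(self, m, k):
        self.m = m
        self.k_floor = math.floor(k)
        self.k_frac = k - self.k_floor
        self.bits = bytearray(m)

    def _positions(self, item):
        pos = _hashes(item, self.m, self.k_floor)
        gate = int.from_bytes(hashlib.sha256(f"gate:{item}".encode()).digest()[:8], "big")
        if gate / 2**64 < self.k_frac:        # reproducible pseudo-random coin flip
            pos += _hashes(item, self.m, 1, seed=1)
        return pos

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))
```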
wging · 1d ago
So it sounds like the use of rational Bloom filters here is just to get a better compression ratio, but the basic technique could be used with classic Bloom filters—is that right? Do you know how much you gain in space savings from using rational Bloom filters? It’s not obvious to me how much of a gain it would be.
pipo234 · 1d ago
I think you might be able to compress I and P frames somewhat decently this way. But you only seem to address the spatial domain, except for deltas(?) Or do you have some way to apply bloom filters to motion estimation as well?
magicalhippo · 1d ago
The introduction seems quite clear on it being an alternative to H.264 and friends:

> Traditional video codecs like H.264 and H.265 achieve impressive compression by discarding "imperceptible" visual information. But what if we could guarantee perfect reconstruction while still achieving meaningful compression? This project explores an unconventional approach: repurposing Bloom filters—typically used for membership testing—as a lossless video compression mechanism.

Further down comes the explanation for why this scheme might possibly work:

> Rather than compressing whole video frames, this system applies Bloom filter compression to frame differences. This capitalizes on temporal coherence—most pixels change little (or not at all) between consecutive frames, creating a sparse difference matrix ideal for this approach.

Of course, using delta-frame compression has been a common theme of many video codecs for ages now, and many like H.264 and H.265 use additional techniques like motion estimation[1] to reduce the information in the delta frame further before the final entropy coding step[2][3].

As such, the best is probably to view this as an alternative to the entropy encoding in H.264 or similar.

[1]: https://en.wikipedia.org/wiki/Motion_estimation#Video_coding

[2]: https://en.wikipedia.org/wiki/Context-adaptive_variable-leng...

[3]: https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding#E...

perching_aix · 1d ago
I think it's safe to assume that the author is reencoding those YouTube samples there, so vp9/avc/av1 -> uncompressed -> compressed with this, and that the compression ratio is with respect to the uncompressed stream. Otherwise I think the README would sound quite a bit more enthusiastic :)
runeblaze · 1d ago
Yeah probably if they were writing a paper, encoding `.raw` files would have served them better (for being more convincing)
mxfh · 1d ago
I have trouble following the motivation here.

Isn't the idea of lossless pretty meaningless for all consumer grade purposes, especially if the input example was already mangled through YouTube's transcoders?

Besides possibly going from some MPEG1 or older source to H.264-like lossless transcoding, I see no benefit in lossless methods here.

From my personal experience: I tried archiving some of my old DVDs, where no better sources are available for purchase for the foreseeable future, by transcoding even to H.265 at absurdly high bitrates, but even then they all looked worse than simply re-containerized MPEG2 for media-server streaming.

All you do is transcode the output of an existing compression step, wasting information while conserving artifacts from a prior reductive step.

4:4:4 LOG or something would be a target to benchmark here.

Even "lossy" Apple ProRes start out at 275 Mbit/s for ProRes 4444 at HD25p already.

https://en.wikipedia.org/wiki/Apple_ProRes#Data_rates

perching_aix · 1d ago
I think you're in a misunderstanding indeed.

This is not about lossless being practical now with this, or about re-encoding YouTube videos with this providing any practical utility, or anything. It's just about using bloom filters for compression. The motivation was just the technical interest in bloom filters. They say as much in the readme:

> This project explores an unconventional approach: repurposing Bloom filters—typically used for membership testing—as a lossless video compression mechanism.

The video source is practically irrelevant as long as it's an actual video.

mxfh · 1d ago
I just don't get why it's reinventing the whole wheel here – try using an off-the-shelf codec and tack on a sparse correction. For example, encode frames with a modern lossless/near-lossless codec (AV1 with QP=0) and then append a tiny bitmask+delta residual for perfect reconstruction. These codecs already exploit motion compensation, intra-prediction and DCT-like transforms to minimize frame-to-frame deltas.

In practice you’d likely get better compression (and faster progress) by piggybacking on AV1 strengths – then use Bloom-filter trick just on the leftover sparse differences – rather than building a new codec from scratch.

The residuals can be expected to be quite noisy and will likely not profit from any intra frame predictability anymore.

JPEG XL’s lossless engine already improves on PNG by ~35%; that and other general-purpose compression methods would then be the per-frame benchmark to beat here on the residuals.

In short: use the proven gear (motion-compensated blocks + modern transforms) to do the heavy lifting, then let the bloom filter chase the hopefully comparably small residual.

As a showcase of what bloom filters are this would be still worthwhile, but I don't see any practical benefit here yet.

Not to forget, there is a reason visually lossless is the de facto norm now, even in production-grade environments: storage space is still not free, the average uncompressed display stream easily reaches way north of 5 Gbps now, and there is only so much lossless can reasonably do here.

joaohaas · 23h ago
As mentioned, OP is not expecting people to use the compression algo on production stuff. You can think of it as an experiment to see how well bloom filters would apply to video compression.

Are the results impressive? Probably not, but it's still an interesting idea.

perching_aix · 1d ago
Yes, they could do a lot of other things, but those other things would not be this. I think your expectations are a bit misplaced. Maybe try giving all this a read again a day later?
jitl · 1d ago
Of course every round of lossy encoding further discards data. RemoveData(RemoveData(source)) is always going to look worse than just RemoveData(source). Newer encoders manage to remove less visual data per byte of storage used, but there’s no way re-encoding is ever going to look better.
perching_aix · 1d ago
If I understand it right, some lossy codecs can be implemented in an idempotent (and still standard-compliant) way, so there would be no generational loss (in select specific cases, e.g. matched settings). I'm also aware that e.g. JPEG-XL can reencode JPEGs without generational loss, while still improving compression efficiency a bit. But I never looked too deep into the math.
rowanG077 · 1d ago
Lossless isn't meaningless. Re-encoding introduces a lot of artifacts. Imagine your comment in 2001 when someone stored their movies in MJPEG or whatever. Then they moved to MP4, then to h264, then perhaps to HEVC. You realize how shit that movie would look after all those re-encode cycles?
mxfh · 1d ago
That's exactly what I'm talking about.

Re-containerizing MPEG-TS as-is into something like MKV, vs. transcoding, is exactly what I'm talking about here.

There are currently no meaningful ways known to me to make MPEG2 files significantly smaller with far more modern and advanced codecs without losing perceived quality, even at the same bitrate.

Not even talking about interlacing issues here.

So for anything MPEG2 and newer, lossless re-encoding seems quite a futile exercise to me from my personal experience.

If there is a promising way I'm all here for it, but this sadly doesn't look like it.

ggm · 1d ago
Is it computationally less expensive?

Is it amenable to stream processing e.g. in an FPGA?

Is the compression ratio acceptable in a modern data stream?

Is it actually a sub-class of compression already known, and therefore probably encumbered?

ambicapter · 1d ago
> The Innovation: Rational Bloom Filters

Is this project about the video compression or about an innovation on Bloom filters?

rh3939 · 17h ago
Without the rational bloom filter, no compression can take place. The new idea is rational bloom filters, and this is just an example of how a rational bloom filter can be used to compress video. It's a novel approach to video compression. I posted an update at the top of the thread.
akoboldfrying · 1d ago
I'm struggling to see how this could give decent compression performance.

Compression works by exploiting patterns or tendencies in the input. One of the biggest and simplest patterns in video data is that pixel changes tend to be localised: E.g., a person waving their arm against a static background changes a clump of pixels in a small region only. By hashing positions, you immediately throw away this locality information -- under the proposed scheme, changing 400 pixels confined to a 20x20 rectangle requires ~exactly as much space as changing 400 pixels randomly scattered around the image, while a more efficient representation could record the top-left coords, width and height of the box, and save about 398 copies of pixel coordinates. And yes, this benefit remains even if we don't record the coords explicitly but rather discover them by querying a Bloom filter.

And that's to say nothing of more advanced video compression techniques like motion detection (make the rectangle here look like the rectangle there from the previous frame), which is an absolute staple of every practical codec.

joaohaas · 23h ago
I think you meant video decompression? Compression should theoretically be a lot more efficient, since the algorithm is a lot simpler and you can probably do a lot of SIMD to calculate the filters themselves.
akoboldfrying · 17h ago
In compression, "efficiency" usually means "compression level".

If by "efficiency" you mean "speed", then yes, I think OP's approach can be much, much faster than usual video compression approaches -- but this is true of all compression algorithms that don't compress very well. The fastest such algorithm -- leaving the input unchanged -- is infinitely fast, but achieves no compression at all.

iqandjoke · 1d ago
A license is needed. Something like GPLv3, or the WTFPL (Do What The Fuck You Want To Public License)?