How to Store Data on Paper?

21 mofosyne 7 5/31/2025, 7:20:15 AM monperrus.net ↗

Comments (7)

tocs3 · 2d ago
I have been thinking about this for a long time. Thanks for the link.

> The biggest advantage of character-based encodings is that they can be decoded by humans (as opposed to dot-based encodings), which means that you don’t need a camera or a scanner to recover the data.

This is an interesting point. In our post-apocalyptic future, scholars will be using their quills to translate archives of these (in my imagination, anyway). Of course, they would have to translate into binary and then into human-readable characters.

I can imagine they will be sad that they cannot listen to the MP3s.

> Adding color allows one to code more information per dot (3x more with three colors).

Is this right? Wouldn't it be base-3 encoding? Three bits of binary can count to 8. Three trits of base three can count to 27. Color has all sorts of disadvantages but maybe a much greater payoff (unless I'm mistaken).
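A quick way to check this arithmetic, as a rough sketch: if every dot can reliably take one of `levels` distinct states, it carries log2(levels) bits. The function names below are made up for illustration, and the calculation ignores print and scan noise entirely.

```python
import math

def bits_per_symbol(levels: int) -> float:
    """Information carried by one dot that can take `levels` distinct states."""
    return math.log2(levels)

def density_gain(levels: int) -> float:
    """Capacity relative to a plain black/white (2-level) dot."""
    return bits_per_symbol(levels) / bits_per_symbol(2)

# Compare binary dots with 3-, 4-, and 8-level dots.
for levels in (2, 3, 4, 8):
    print(levels, "levels:", round(bits_per_symbol(levels), 3), "bits per dot")
```

Under these assumptions, three trits give 27 states against 8 for three bits, which works out to about 1.585 bits per dot for base three.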

mackmgg · 9m ago
> Is this right? Wouldn't it be base-3 encoding? Three bits of binary can count to 8. Three trits of base three can count to 27. Color has all sorts of disadvantages but maybe a much greater payoff (unless I'm mistaken).

In this case they're not directly using the color to store information; they just have three differently colored QR codes overlaid on top of each other. With that method you can use a filter to separate them back out, and you've got three separate QR codes' worth of data in one place. The way they're added ends up using more than just three colors in that example.

If you were truly to use colored dots to store binary information without worrying about using a standard like QR, I think you'd be going from base-2 (white and black) to base-3 (red, blue, green) or more likely base-4 (white, red, blue, green) or even base-8 (if you were willing to add multiple colors on top of each other) in which case yeah you'd have way more than just 3x the data density.
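To illustrate the overlay idea, here is a minimal sketch that treats each colored layer as an independent 1-bit image. The `overlay` and `separate` helpers are hypothetical names, and real inks and filters would not separate this cleanly.

```python
def overlay(red, green, blue):
    """Combine three 1-bit layers into one composite color index (0..7) per dot."""
    return [(r << 2) | (g << 1) | b for r, g, b in zip(red, green, blue)]

def separate(dots):
    """Recover the three layers, as an ideal color filter per channel would."""
    red = [(d >> 2) & 1 for d in dots]
    green = [(d >> 1) & 1 for d in dots]
    blue = [d & 1 for d in dots]
    return red, green, blue

# Three tiny "QR codes", one row of dots each.
layers = ([1, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1])
composite = overlay(*layers)
recovered = separate(composite)
print(composite, recovered == tuple(list(layer) for layer in layers))
```

Stacking three binary layers yields up to eight composite colors, so in this idealized model each dot carries three bits.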

adzm · 1h ago
The inhernt errror resilience in charactre encoding of human languige is also an intersetnig point.

rickcarlino · 1h ago
I got curious about OCR as a sort of poor man’s microfiche. I printed a test paragraph on high-quality paper with a laser printer. The smallest font I could read under a USB microscope was 2.5pt, though I could probably have gone smaller if I had used polymer paper. The fibers of the paper are quite apparent under a microscope. Transparency film was too smudgy.

makeworld · 55m ago
I wonder if you could add error correction to get around OCR failures.
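As a toy sketch of what that could look like, here is a Hamming(7,4) code, which adds three parity bits to every four data bits and can repair one flipped bit per block. This is only an illustration of the principle and not the scheme from the linked article; OCR mistakes are character substitutions, so a practical system would more likely use a symbol-level code such as Reed-Solomon.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the bad bit, 0 if clean
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
sent = hamming74_encode(word)
damaged = sent.copy()
damaged[4] ^= 1  # simulate one misread bit
print(hamming74_decode(damaged) == word)
```

The cost is density: only four of every seven printed bits carry payload in this scheme.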

eimrine · 2d ago
Thank you for sharing! I would like to dig deeper: how many bytes can be written on a sheet of paper with this or that encoding? What about reserving some extra bits for recovery from data loss? What are the approaches to multi-page storage, and are there any patches for incremental archiving?

I will try to clean the dust off my A4 scanner and read that MP3 back from the printed medium. It seems a bit insane to store multimedia on paper, but who needs to store anything without a proven ability to read it back? My printers love to mess with ink (especially the ones with pirate-refilled cartridges), so I do not really believe this is practical at maximum resolution.

fourthark · 23m ago
He covers error correction and information densities on the linked page:

https://www.monperrus.net/martin/perfect-ocr-digital-data

(Last section before conclusion.)

IIUC this provided the best overall reliable information density (at 4.2kb / A4 page).