Show HN: Meow – An Image File Format I made because PNGs and JPEGs suck for AI
Instead of storing metadata alongside the image, where it can be lost, MEOW encodes it directly inside the image pixels using LSB steganography: hiding data in the least significant bits, where your eyes can't tell the difference. This also doesn't significantly increase the image size, and as long as you only use lossless compression, the data stays.
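The general LSB idea can be sketched in a few lines (this is just the basic technique, not MEOW's actual bit layout):

```python
# Minimal sketch of LSB steganography: hide one payload bit in the
# lowest bit of each 8-bit channel value. Not MEOW's real layout.
def lsb_hide(channels, payload_bits):
    """Overwrite the least significant bit of each channel with a payload bit."""
    return [(c & ~1) | b for c, b in zip(channels, payload_bits)]

def lsb_reveal(channels):
    """Read the payload back out of the least significant bits."""
    return [c & 1 for c in channels]

pixels = [200, 13, 255, 0, 97, 42, 180, 7]  # flat R,G,B,... values
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_hide(pixels, bits)

assert lsb_reveal(stego) == bits
# Each channel moves by at most 1/255, which is why the eye can't see it.
assert all(abs(a - b) <= 1 for a, b in zip(pixels, stego))
```

Because the payload lives in the pixel values themselves, any lossless re-encode of those pixels carries it along; any lossy step (JPEG, resizing) will scramble it.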
What I noticed is that most "innovative" image file formats died from lack of adoption, but MEOW is completely cross-compatible with PNG: you can quite literally rename a .meow file to .png and open it in a normal image viewer.
Here's what gets baked right into every pixel:
- Edge Detection Maps - pre-computed boundaries so AI doesn't waste time figuring out where objects start and end.
- Texture Analysis Data - surface patterns, roughness, material properties already mapped out.
- Complexity Scores - tells AI models how much processing power different regions need.
- Attention Weight Maps - highlights where models should focus their compute (like faces, text, important objects)
- Object Relationship Data - spatial connections between detected elements.
- Future Proofing Space - reserved bits for whatever AI wants to add (or comments for training LoRAs or labelling)
Of course, all of these are editable and configurable while surviving compression, sharing, even screenshot-and-repost cycles :p
When you convert ANY image format to .meow, it automatically generates most of the AI-specific features and data from what it sees in the image, which makes it work way better.
Would love thoughts, suggestions or ideas you all have for it :)
Your extra data is a big JSON blob. Okay, fine.
File formats dating back to Targa (https://en.wikipedia.org/wiki/Truevision_TGA) support arbitrary text blobs if you're weird enough.
PNG itself has both EXIF data and a more general text chunk mechanism (both compressed and uncompressed, https://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.An... , section 4.2.3, you probably want iTXt chunks).
exiftool will already let you do all of this, by the way. There's no reason to summon a non-standard file format into the world (especially when you're just making a weird version of PNG that won't properly survive resizing or quantization).
> exiftool -config exiftool.config -overwrite_original -z '-_custom1<=meta.json' cat.png
and
> exiftool -config exiftool.config -G1 -Tag_custom1 cat.png
You can (with AI help no less) figure out what `exiftool.config` should look like. `meta.json` is just your JSON from github.
Now go draw the rest of the owl. :)
Yes, it is a big JSON blob atm, haha, and it's definitely still a POC, but the idea is to avoid having a separate JSON file that adds to the complexity. While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI-specific stuff, especially for things like attention maps and saliency regions.
I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.
There's definitely a ton of work left to do, but I see a lot of potential in something like this (also, nice username)
That's why I mentioned that you can put anything, including binary data (which includes images), into the chunks of a PNG. I think Pillow even supports this (there are some PRs, like https://github.com/python-pillow/Pillow/pull/4292 , that suggest this).
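For instance, here's a minimal Pillow sketch of stashing a JSON blob in a PNG iTXt chunk (the chunk key `meow:meta` is made up for illustration):

```python
# Round-trip a JSON blob through a PNG iTXt chunk using Pillow.
import io
import json
from PIL import Image
from PIL.PngImagePlugin import PngInfo

meta = {"boxes": [[10, 20, 30, 40]], "labels": ["cat"]}

img = Image.new("RGB", (8, 8), "white")
info = PngInfo()
info.add_itxt("meow:meta", json.dumps(meta))  # iTXt: UTF-8 text chunk

buf = io.BytesIO()
img.save(buf, format="PNG", pnginfo=info)

# The blob comes back via the .text mapping; the pixels are untouched.
restored = json.loads(Image.open(io.BytesIO(buf.getvalue())).text["meow:meta"])
assert restored == meta
```

No steganography needed, and the pixel data stays bit-identical to the original image.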
Your problem domain is:
* Have something that looks like a PNG...
* ...that doesn't need supporting files outside itself...
* ...that can also store textual data (e.g., that JSON blob of bounding boxes and whatnot)...
* ...and can also store image data (e.g., attention maps and saliency regions).
What I'm telling you is that the PNG file format already supports all of this stuff, you just need to be smart enough to read the spec and apply the affordances it gives you.
> I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.
In the 90s, we'd already spent vast sums of gold and blood and tears solving the "holy shit, how do we encode multiple things in images so that they can survive an image pipeline, be extensible to end users, and be compressed reliably" problem.
None of this has been new for three decades. Nothing you are going to do is going to be a value add over correctly using the file format you already have.
I promise that you aren't going to see anything particularly new or exciting in this AI goldrush that isn't an isomorphism of something much smarter, much better-paid people solved back when image formats were still a novel problem domain (again, in the 1990s).
Why not exactly? ComfyUI encodes an absolute bonker amount of information (all arbitrary JSON) into workflow PNG files without any issues.
It feels like creating a custom format with backwards PNG compatibility and using steganography to cram metadata inside is an inefficient and over-engineered alternative to a .tar.gz with "image.png" and "metadata.json"
I'm working on redundancy and error correction to make it better!
So, "perfect Show HN"? ¯\_(ツ)_/¯
When images get loose in the wild, this can be very helpful.
Moreover, models and techniques get better over time, so these stored precomputed features are guaranteed to become obsolete. Even if the features are there, simple to use in a pipeline, and everybody adopts this file format, pipelines still won't use features that were precomputed years ago when state-of-the-art techniques extract more accurate ones.
- This is currently solved by inference pipelines.
- Models and techniques improve over time.
The ability for different agents with different specialties to add additional content while being able to take advantage of existing context is what makes the pipeline work.
Storing that content in the format could allow us to continue to refine the information we get from the image over time. Each tool that touches the image can add new context or improve existing context and the image becomes more and more useful over time.
I like the idea.
Also, the idea is to integrate the conversion processes/pipelines with other data that'll help with customized workflows.
This is literally the problem solved by chunk-based file formats. "How do we use multiple authoring tools without stepping on each other" is a very old and solved problem.
that's not a bug, that's a (security) feature
If you're a proud generative AI user, and you don't want anyone deceived by your images, you want the this-was-created-by-ai metadata retained.
Step 1: Create .meow images of vegetables, with "per-pixel metadata" instead encoded to represent human faces.
Step 2: Get your images included in the data set of a generative image model.
Step 3: Laugh uproariously as every image of a person has vaguely-to-profoundly vegetal features.
For example, Draw.io can embed original diagrams in .svg and .png files, using the double extensions .drawio.svg and .drawio.png.
That makes the data much more fragile than metadata fields, though? Any kind of image alteration or re-encoding (which almost all sites do to ensure better compression — discord, imgur, et al) is going to trash the metadata or make it utterly useless.
I'll be honest, I don't see the need for synthesizing a "new image format" because "these formats are ancient (1995 and 1992) - it's about time we get an upgrade" and "metadata [...] gets stripped way too easily", when the replacement you are advocating is not only the exact same format as a PNG, but uses a metadata embedding scheme that is far more fragile against being randomly stripped when uploaded somewhere. This seems very bizarre and ill-thought-out to me.
Anyway, if you want a "new image format" because "the old ones were developed 30 years ago", there's a plethora of new image formats to choose from that all support custom metadata, including WebP, JPEG 2000, HEIF, JPEG XL, and farbfeld (the one the suckless guys made).
I'll be honest... this is one of the most irritating parts of the new AI trend. Everyone is an "ideas guy" when they start programming, it's fine and normal to come up with "new ideas" that "nobody else has ever thought of" when you're a green-eared beginner and utterly inexperienced. The irritating part is what happens after the ideas phase.
What used to happen was you'd talk about this cool idea in IRC and people would either help you make it, or they would explain why it wasn't necessarily a great idea, and either way you would learn something in the process. When I was 12 and new to programming, I had the "genius idea" that if we could only "reverse the hash algorithm output to its input data" we would have the ultimate compression format... anyone with an inch of knowledge will smirk at this proposition! And so I learned from experts on why this was impossible, and not believing them, I did my own research, and learned some more :)
Nowadays, an AI will just run with whatever you say — "why yes if it were possible to reverse a hash algorithm to its input we would have the ultimate compression format", and then if you bully it further, it will even write (utterly useless!) code for you to do that, and no real learning is had in the process because there's nobody there to step in and explain why this is a bad idea. The AI will absolutely hype you up, and if it doesn't you learn to go to an AI that does. And now within a day or two you can go from having a useless idea, to advertising that useless idea to other people, and soon I imagine you'll be able to go from advertising that useless idea to other people, to manufacturing it IRL, and at no point are you learning or growing as a person or as a programmer. But you are wasting your own time and everyone else's time in the process (whereas before, no time was wasted because you would learn something before you invested a lot of time and effort, rather than after).
Overall, I think this is a positive ”problem” to have :-)
And lo and behold, in each case I did find that it was either not novel at all or it had some major downside I had initially missed.
Still, fun to think about new ways of doing things, so I still go at it.
I think you just illustrated how difficult it is to propose a new standard. WebP was not supported by much image-related software (including the Adobe suite!) for years and earned a bad reputation, HEIF is also poorly supported, and JPEG XL was removed from Chrome despite being developed by Google and isn't supported by any other browser AFAIK. I'd never heard of farbfeld before.
If the backing from Apple and Google was not enough to drive the adoption of an image format, I fail to see how this thing can go anywhere.
At inference time, you don't control the inputs, so this is moot. At training time, you've already got lots of other metadata that you need to store and preserve that almost certainly won't fit in steganographically encoded format, and you've often got to manipulate the image before feeding it into your training pipeline. Most pipelines don't simply take arbitrary images (nor do you want them: plenty of images need to be modified to, for instance, remove letterboxing).
The other consideration is that steganography is actively introducing artifacts to your assets. If you're training on these images, you'll quickly find that your image generation model, for instance, cannot generate pure black. If you're adding what's effectively visual noise to every image you train on, the model will generate images with noise.
Without an entirely dedicated editor or postprocessing plugin, steganography gets destroyed on modification.
Labeling and metadata are separate concerns. "Edge detection maps" etc. are implementation details of whatever you are doing with image data, and quite likely to be non-portable.
And non-removability / steganography of additional metadata is not a selling point at all?
So my thoughts are, this violates separation of concerns and seems badly thought-out.
It also mangles labeling, metadata and technicalities, and attempts to predict future requirements.
I don't understand potential utility.
Though I have one question: once 2 bits/channel are used for Meow-specific data, leaving 6 bits/channel, I doubt it can still retain perfect image quality, since either (if everything's re-encoded) dynamic range is reduced by 75%, or the LSB changes introduce noise into the original image. Not too much noise, but still.
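A back-of-envelope check on the noise half of that question (assuming the format really does overwrite the low 2 bits of each 8-bit channel):

```python
# Worst-case error from overwriting the two LSBs of an 8-bit channel.
def embed_2lsb(value, payload):
    """Replace the low 2 bits of an 8-bit channel with a 2-bit payload (0-3)."""
    return (value & 0b11111100) | (payload & 0b11)

# Try every channel value against every payload and find the largest shift.
worst = max(abs(v - embed_2lsb(v, p)) for v in range(256) for p in range(4))
print(worst)  # 3
```

So each channel moves by at most 3/255 (about 1.2% of full scale): the dynamic range itself is preserved, but the 4 finest intensity levels within each group of 4 get merged, which is exactly the low-level noise being asked about.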
Edited for clarity
But it requires a ton of redundancy and error correction, perhaps enough to survive a few rounds of not-too-lossy reencoding. I dunno how much bandwidth is available before it damages the image.
The real limitation would be bandwidth.
This is one of the first lines of the readme. But this is PNG with some metadata encoded using the most naive steganographic technique (just thrown into the LSB of pixels -- no redundancy, no error correction, no compensation for downsampling, etc). Even ignoring everything else, this is just... nonsensical.
I am very very pro-AI. But this is slop.
Maybe your format could have some use, but I don't find your motivation convincing.
> Maybe your format could have some use, but I don't find your motivation convincing.
That's respectable and it very much still is a POC, I hope to keep working on it to be actually great :)
Look how many confused comments there are due to the page claiming features you don't have, don't understand, and don't make sense on their own terms (what's an "attention map"? with maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model? how big would the image be? is it possible to fit that in the 2 bits we claim to fit in every 4 bytes)
I don't want for you to take it personally, at all, but I never, ever, want to see something like this on the front page again.
You've reinvented EXIF and JPEG metadata, in the voice of a diligent teenager desiring to create something meaningful, but with 0 understanding of the computing layers, 4 hours with Wikipedia, and 0 intellectual humility - though, with youth, born not from obstinance, but naiveté.
Some warning signs you should have taken heed of:
- Metadata is universally desirable, yet, somehow unexplored until now?
- Your setup instructions use UNIX commands up until they require running a Windows batch file
- The example of hiding data hides it in 2 bits of one channel, then "demonstrates" that this is visually lossless using 1 bit across 2 channels (it isn't equivalent - if it were, how would we determine which 2 channels?). ("Visually lossless" also confuses "lossless", a technical term meaning no information was lost, with the weaker claim of being lossy-but-not-detectably-so.)
I'll leave it here, I think you have the idea and there's a difference between being firm and honest, and being cruel, and length will dictate a lot of that to a casual observer.
>- Your setup instructions use UNIX commands up until they require running a Windows batch file
Is your comment AI generated? The only setup instructions prior to the windows commands are "git", "cd", and "pip". "cd" exists on both windows and unix. The other commands might not be available by default on windows, but they're not exactly "UNIX" commands either. The other code blocks mostly seem to be assuming windows (eg. "start" or "copy" command), so I don't see any contradictions here.
Are you asking this earnestly, or, is it meant to communicate something else? If so, what? :)
Genuinely, the most interesting part of the comment to me, in that it does not have zero meaning, and rings of some form of frustration; yet the rest of your comment stays focused on technical knowledge, and AFAIK you are not the author (who I'd expect would be at least temporarily angry at my contribution).
If you're going to accuse someone else of technical inconsistencies, maybe you should make sure your critiques are free of technical inconsistencies as well. You know, "people who live in glass houses shouldn't throw stones" and all that.
Note we both agree on that, you seem to assume I claimed something else, like, cd doesn't exist on windows.
Let's say I instead said "this doesn't work on Windows"
I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...
Let's steelman this to the max: any possible prerequisite that could block it, doesn't mean its actually blocked. Dev mode on, WSL, prerequisites wrestled with and installed, can download source and edit then compile, but can only patch build errors, not add new functionality.
Are you 100% sure those commands will work?
(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)
>Note we both agree on that, you seem to assume I claimed something else, like, cd doesn't exist on windows.
No, you made a specific claim of "Your setup instructions use UNIX commands up until they require running a Windows batch file", when those "UNIX commands" were "pip" and "python". That statement is incorrect because those commands are readily available on windows.
Your remark about "you seem to assume I claimed something else, like, cd doesn't exist on windows" is absurd at best and verges so far into bad faith that I'm not even going to engage with it.
>I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...
Which commands are those? The only non-native Windows commands I see are git, pip, and python, and the latter two both come with the Python installer. You're making it sound like you need to jump through a bunch of hoops to get those commands working, when really all you have to do is run the installers for git and python.
>Are you 100% sure those commands will work?
Again, my claim isn't that the project works 100%, or even that it's not AI generated, it's that your critique makes little sense either.
>(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)
No, the reason why I accused you of AI generated comments and made the remark about glass houses is that claiming "pip" and "python" are "UNIX commands" is so absurdly wrong that it's on the level of the OP. I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.
You seem very upset, at least, I'm not used to people being this aggressive on HN, and I've been here for 15 years. I apologize for my contribution to that, if not my sole responsibility for it.
I remain fascinated by your process, I never have heard bad faith invoked when someone points at their actual words.
Generally, it is rare someone invokes "bad faith" when someone else's thoughts don't match their expectations.
I just...can't lie to you. I can't claim I thought it wouldn't work on Windows. I thought the opposite! That the sequence had 0% chance of working on not-Windows, and a 70% chance of working on Windows.
>> Are you 100% sure those commands will work?

> Again, my claim isn't that the project works 100%, or even that it's not AI generated,
Oh! I'm referring to the commands, not the project :) The project can output "APRIL FOOLS!", as far as I care for this exercise.
> it's that your critique makes little sense either.
Oh, interesting - happy to hear more beyond that I must have meant pip/Python aren't available on Windows. If that's your sole issue, well, more power to you :) I do want to avoid lying to you just to avoid an aggressive conversation, you may not be even meaning to be aggressive. With the principle of "don't lie", I can't say I had something else in my head that matches your understanding so far, I presume something like "They are UNIX commands follows by Windows commands" [and thus this won't work on Windows]
> claiming "pip" and "python" are "UNIX commands"
Do you think I thought pip/Python wasn't on Windows? Sorry, no - in fact that's what I was using on Windows this week! (well, porting Python code to Dart) I just was 70% sure the commands as written would not work on Windows, and I suppose there's an implication I'm 100% sure they wouldn't work on not-Windows given the .bat file. Beyond that, nada.
>> separately, you misunderstand the quote re: glass houses
> No, I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.
Intriguing, as always: "Did you write this with AI?" followed by a kind inquiry into the meaning of that, followed by "people in glass houses shouldn't throw stones" meaning "you said something wrong when you said something else is wrong, but it's cool, that's fine" - "shouldn't" seems to belie that interpretation, but I'm sure I have it wrong.
P.s. all the best, my friend. :)
> what's an "attention map"? with maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model?

By "attention map" I meant a visual representation of where a model focuses its "attention" when analyzing an image - basically, a heatmap highlighting important regions that influence the model's output. It isn't something that is very useful now, but it might be.
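To make the heatmap idea concrete, here's a toy, model-agnostic stand-in: gradient magnitude as a crude saliency proxy (purely illustrative; real model attention is model-specific and wouldn't look like this):

```python
# Toy "attention map": gradient magnitude highlights edges and detail,
# which is one cheap, model-independent notion of "important regions".
def toy_attention_map(gray):
    """gray: 2-D list of 0-255 ints; returns a same-sized heatmap (borders 0)."""
    h, w = len(gray), len(gray[0])
    heat = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal gradient
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical gradient
            heat[y][x] = min(255, abs(gx) + abs(gy))
    return heat

# A flat image has no detail, so its heatmap is all zeros.
flat = [[128] * 5 for _ in range(5)]
assert toy_attention_map(flat) == [[0] * 5 for _ in range(5)]
```

The portability question above still stands, though: a heatmap like this is only as useful as the pipeline's agreement on how it was computed.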
> You reinvented EXIF/JPEG metadata with naivete

Partly true (at least for now); the core idea was to experiment with alternative metadata or feature embedding, not to replace well-established standards. It's not where I NEED it to be yet, but as far as metadata use cases go, it's pretty cool.
> Your setup instructions use UNIX commands up until they require running a Windows batch file

It's easier to set Windows up to directly open other file formats; it's just a thing (and I'm on Windows, so).
In my AI assisted coding I’ve started experimenting with embedding hyper-relevant context in comments; for example I embed test related context directly into test files, so it’s immediately available and fresh whenever the file is read.
Extrapolating, I’ve been thinking recently about whether it might be useful to design a programming language optimized for LLM use. Not a language to create LLMs, but a language optimized for LLMs to write in and to debug. The big obstacle would seem to be bootstrapping since LLMs are trained by analyzing large amounts of human created code.
Embedding metadata into the pixels by using the least significant bits of RGB won't cut it, that stuff is gone when the file becomes a JPEG.
But there do exist methods of embedding data in pixels that can survive JPEG compression.
Given that, how are you handling tradeoffs between spatial fidelity (like masks or edges) and scalar data (like complexity scores)? Is there a priority system, or does it just truncate when it runs out?
640×480 at 3 bits -> 900 kB.
That's a lot. Too much. This is why we can't have nice things: because people keep inventing things to destroy privacy.
Wouldn't the kind of metadata that's most useful be things that can't be synthesized, like labels or (for ai-generated images) the prompt used to generate the image?
It’s definitely not finished, more like a poc right now for storing richer, AI-relevant metadata in a portable way. Appreciate you for taking the time to check it out.
You mean PNG on steganoroids.
I also remember when I discovered steganography and tried putting it in everything. I was 13. Seriously, what's the point of that?
If you couldn't be bothered to write it, why should anyone read it? And what does that say about your view of the potential users you're trying to attract?
Why not use PNG’s built-in zTXt chunks to store metadata? That seems like a more standard and less fragile approach.
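For what it's worth, Pillow will emit a compressed zTXt chunk if you pass `zip=True` to `PngInfo.add_text` with latin-1 text, so the sketch is about three lines:

```python
# Store metadata in a compressed zTXt chunk instead of pixel LSBs.
import io
import json
from PIL import Image
from PIL.PngImagePlugin import PngInfo

info = PngInfo()
# zip=True asks Pillow to write a compressed (zTXt) textual chunk.
info.add_text("meta", json.dumps({"attention": "regions go here"}), zip=True)

buf = io.BytesIO()
Image.new("RGB", (4, 4)).save(buf, format="PNG", pnginfo=info)

assert b"zTXt" in buf.getvalue()  # the compressed text chunk is in the file
assert "attention" in Image.open(io.BytesIO(buf.getvalue())).text["meta"]
```

Standard viewers ignore the chunk, tools that care can read it, and nothing touches the pixel data.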
I can see the case for using LSB steganography to watermark or cryptographically sign an image, but using it to embed all the metadata you're describing is likely to introduce a lot of visual noise.
Also worth considering: this approach could be used to poison models by embedding deliberately misleading metadata. Depending on your perspective, that might be a feature or a bug.
Instead of the vitriol and downvotes, maybe next time just point out “you can put arbitrary data in exif”.
But you missed half the point of the article, which was the EXTRA DATA to make the image more LLM-useful.
Thanks for the comment! Means a lot :)