Show HN: Meow – An Image File Format I made because PNGs and JPEGs suck for AI

83 points | by kuberwastaken | 71 comments | 6/15/2025, 12:26:55 PM | github.com ↗
One of the biggest sources of context AI LLMs can get from images is their metadata, but it's extremely underutilized. While PNG and JPEG both offer metadata, it gets stripped way too easily when sharing, is extremely limited for AI-based workflows, and offers very few entries for things that are actually useful. Plus, these formats are ancient (1995 and 1992) - it's about time we got an upgrade for our AI era. Meet MEOW (Metadata-Encoded Optimized Webfile) - an open-source image file format which is basically PNG on steroids, and what I also like to call the purr-fect file format.

Instead of storing metadata alongside the image where it can be lost, MEOW ENCODES it directly inside the image pixels using LSB steganography - hiding data in the least significant bits, where your eyes can't tell the difference. This also doesn't increase the image size significantly, and because the data lives in the pixels themselves, it stays through any form of lossless processing.
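(For the curious, the general LSB trick looks roughly like this - a simplified sketch, not MEOW's exact layout; a real codec also needs to store at least a payload length/header, which I gloss over here:)

    import json
    import numpy as np
    from PIL import Image

    def embed_lsb(pixels: np.ndarray, payload: bytes) -> np.ndarray:
        # Spread the payload, bit by bit, into the least significant bit
        # of each 8-bit channel value.
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        flat = pixels.reshape(-1).copy()
        if bits.size > flat.size:
            raise ValueError("payload too large for this image")
        flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
        return flat.reshape(pixels.shape)

    def extract_lsb(pixels: np.ndarray, n_bytes: int) -> bytes:
        # A real format would store the payload length; here we just pass it in.
        bits = pixels.reshape(-1)[:n_bytes * 8] & 1
        return np.packbits(bits).tobytes()

    img = np.array(Image.open("cat.png").convert("RGB"))
    meta = json.dumps({"caption": "a cat"}).encode("utf-8")
    stego = embed_lsb(img, meta)
    Image.fromarray(stego).save("cat.meow", format="PNG")   # still a valid PNG inside
    print(extract_lsb(np.array(Image.open("cat.meow")), len(meta)).decode("utf-8"))

Because PNG is lossless, those low bits come back out bit-for-bit; any lossy re-encode scrambles them.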

What I noticed is that most "innovative" image file formats died from lack of adoption, but MEOW is completely CROSS-COMPATIBLE WITH PNG: you can quite literally rename a .MEOW file to .PNG and open it in a normal image viewer.

Here's what gets baked right into every pixel:

- Edge Detection Maps - pre-computed boundaries so AI doesn't waste time figuring out where objects start and end.

- Texture Analysis Data - surface patterns, roughness, material properties already mapped out.

- Complexity Scores - tells AI models how much processing power different regions need.

- Attention Weight Maps - highlights where models should focus their compute (like faces, text, important objects).

- Object Relationship Data - spatial connections between detected elements.

- Future-Proofing Space - reserved bits for whatever AI wants to add (or comments for training LoRAs or labelling).

Of course, all of these are editable and configurable while surviving compression, sharing, even screenshot-and-repost cycles :p

When you convert ANY image format to .meow, it automatically generates most of the AI-specific features and data from what it sees in the image, so downstream tools don't have to compute them from scratch.

Would love thoughts, suggestions or ideas you all have for it :)

Comments (71)

ai_critic · 7h ago
Reality check:

Your extra data is a big JSON blob. Okay, fine.

File formats dating back to Targa (https://en.wikipedia.org/wiki/Truevision_TGA) support arbitrary text blobs if you're weird enough.

PNG itself has both EXIF data and a more general text chunk mechanism (both compressed and uncompressed, https://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.An... , section 4.2.3, you probably want iTXt chunks).

exiftool will already let you do all of this, by the way. There's no reason to summon a non-standard file format into the world (especially when you're just making a weird version of PNG that won't survive resizing or quantization properly).

ai_critic · 7h ago
Here, two incantations:

> exiftool -config exiftool.config -overwrite_original -z '-_custom1<=meta.json' cat.png

and

> exiftool -config exiftool.config -G1 -Tag_custom1 cat.png

You can (with AI help, no less) figure out what `exiftool.config` should look like. `meta.json` is just your JSON from GitHub.

Now go draw the rest of the owl. :)

kuberwastaken · 5h ago
Hi! Thanks for checking it out, means a lot :)

Yes, it is a big JSON blob atm, haha, and it's definitely still a POC, but the idea is to avoid having a separate JSON file that adds to the complexity. While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI-specific work, especially things like attention maps and saliency regions.

I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.

There's definitely a ton of work left to do, but I see a lot of potential in something like this (also, nice username)

ai_critic · 4h ago
> While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

That's why I mentioned that you can put anything, including binary data--which includes images--into the chunks of a PNG. I think Pillow even supports this (there are some PRs, like https://github.com/python-pillow/Pillow/pull/4292 , that suggest this).

Your problem domain is:

* Have something that looks like a PNG...

* ...that doesn't need supporting files outside itself...

* ...that can also store textual data (e.g., that JSON blob of bounding boxes and whatnot)...

* ...and can also store image data (e.g., attention maps and saliency regions).

What I'm telling you is that the PNG file format already supports all of this stuff, you just need to be smart enough to read the spec and apply the affordances it gives you.
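Concretely, something like this already works with Pillow's text-chunk API (a rough sketch; the chunk key is made up, and binary blobs like attention maps could go into private chunks or be base64'd into text):

    import json
    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    meta = {"boxes": [[10, 20, 100, 200]], "caption": "a cat"}   # your JSON blob

    im = Image.open("cat.png")
    info = PngInfo()
    info.add_itxt("ai.metadata", json.dumps(meta))   # standard iTXt chunk; key name is illustrative
    im.save("cat_tagged.png", pnginfo=info)

    # Any chunk-preserving tool keeps it; reading it back:
    print(json.loads(Image.open("cat_tagged.png").text["ai.metadata"]))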

> I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.

In the 90s, we'd already spent vast sums of gold and blood and tears solving the "holy shit, how do we encode multiple things in images so that they can survive an image pipeline, be extensible to end users, and be compressed reliably" problem.

None of this has been new for three decades. Nothing you are going to do is going to be a value add over correctly using the file format you already have.

I promise that you aren't going to see anything particularly new or exciting in this AI goldrush that isn't an isomorphism of something much smarter, much better-paid people solved back when image formats were still a novel problem domain (again, in the 1990s).

vunderba · 4h ago
> it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

Why not exactly? ComfyUI encodes an absolutely bonkers amount of information (all arbitrary JSON) into workflow PNG files without any issues.

ai_critic · 4h ago
Indeed. And character cards for chatbots (like in SillyTavern) have supported this for years.
gavinray · 7h ago
Maybe I'm jaded, but I fail to see how a bespoke file format is a better solution than bundling a normal image and a JSON/XML document containing metadata that adheres to a defined specification.

It feels like creating a custom format with backwards PNG compatibility and using steganography to cram metadata inside is an inefficient and over-engineered alternative to a .tar.gz with "image.png" and "metadata.json"
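i.e., roughly (a quick sketch):

    import io, json, tarfile

    def bundle(image_path: str, metadata: dict, out_path: str) -> None:
        # One self-contained file: the untouched PNG plus its metadata sidecar.
        with tarfile.open(out_path, "w:gz") as tar:
            tar.add(image_path, arcname="image.png")
            blob = json.dumps(metadata, indent=2).encode("utf-8")
            info = tarfile.TarInfo("metadata.json")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))

    bundle("cat.png", {"caption": "a cat"}, "cat.bundle.tar.gz")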

kuberwastaken · 5h ago
That's fair, and it's how it's traditionally done, but the entire idea of this was to have everything you need on the image itself and reduce the complexity and extra files: no risk of losing the JSON, mismatched versions, or extra packaging steps.

I'm working on redundancy and error correction to make it better!

CharlesW · 6h ago
> …creating a custom format with backwards PNG compatibility and using steganography to cram metadata inside is an inefficient and over-engineered alternative to a .tar.gz with "image.png" and "metadata.json"

So, "perfect Show HN"? ¯\_(ツ)_/¯

xhkkffbf · 6h ago
Yes, separate metadata has great advantages, but it can get separated from the main file pretty easily. Many social media platforms and email sites will let you embed PNG files. But they won't let you embed an image with a separate metadata file that's always kept along with it.

When images get loose in the wild, this can be very helpful.

a2128 · 7h ago
You're adding metadata, but what problems does this added metadata solve exactly? If your converter can automatically compute these image features, then AI training and inference pipelines can trivially do the same, so I don't see the point in needing a new file format that contains these.

Moreover, models and techniques get better over time, so these stored precomputed features are guaranteed to become obsolete. Even if they're present, simple to use in a pipeline, and everybody is using this file format, pipelines still won't use them once they were precomputed years ago and state-of-the-art techniques give more accurate features.

jtsylve · 6h ago
The answer may be in your question.

- This is currently solved by inference pipelines.

- Models and techniques improve over time.

The ability for different agents with different specialties to add additional content while being able to take advantage of existing context is what makes the pipeline work.

Storing that content in the format could allow us to continue to refine the information we get from the image over time. Each tool that touches the image can add new context or improve existing context and the image becomes more and more useful over time.

I like the idea.

kuberwastaken · 6h ago
Said it better than I could have

Also, the idea is to integrate the conversion processes/pipelines with other data that'll help with customized workflows.

ai_critic · 4h ago
> Each tool that touches the image can add new context or improve existing context and the image becomes more and more useful over time.

This is literally the problem solved by chunk-based file formats. "How do we use multiple authoring tools without stepping on each other" is a very old and solved problem.

RamblingCTO · 8h ago
> it gets stripped way too easily when sharing

that's not a bug, that's a (security) feature

TeMPOraL · 7h ago
It's a perfect illustration that security and usefulness are a tradeoff.
kuberwastaken · 6h ago
Sure is, but does keeping that security feature that's common on the internet have to mean worse context when working with AI?
michaelt · 7h ago
If you're a journalist taking photos of eccentric semi-fugitive John McAfee - you want the location metadata removed when posting online, in case you forgot to remove it yourself.

If you're a proud generative AI user, and you don't want anyone deceived by your images, you want the this-was-created-by-ai metadata retained.

godelski · 5h ago
That's called a watermark
jbverschoor · 8h ago
Why not simply JXL? It has multiple channels, can store any metadata, is lossy/lossless.
spookie · 7h ago
Or even DDS.
DanHulton · 6h ago
You have invented essentially an _incredible way_ to poison AI image datasets.

Step 1: Create .meow images of vegetables, with "per-pixel metadata" instead encoded to represent human faces.

Step 2: Get your images included in the data set of a generative image model.

Step 3: Laugh uproariously as every image of a person has vaguely-to-profoundly vegetal features.

whoisyc · 4h ago
This assumes people training AI are going to put in the effort to extract metadata from a poorly specified "format" with a barely coherent, buzzword-ridden README file. Realistically, they will just treat any .meow file as an opaque binary blob and any PNG as a regular PNG file.
zdw · 7h ago
It would be better to use this as an additional extension before the normal extension like other tools that embed additional metadata do.

For example, Draw.io can embed the original diagram in .svg and .png files, and the pre-suffix makes them .drawio.svg or .drawio.png.

kuberwastaken · 6h ago
Hmm that's a great idea as well, I'll look into it, thank you :)
fao_ · 8h ago
> Instead of storing metadata alongside the image where it can be lost, MEOW ENCODES it directly inside the image pixels using LSB steganography

That makes the data much more fragile than metadata fields, though? Any kind of image alteration or re-encoding (which almost all sites do to ensure better compression — discord, imgur, et al) is going to trash the metadata or make it utterly useless.

I'll be honest, I don't see the need for synthesizing a "new image format" because "these formats are ancient (1995 and 1992) - it's about time we get an upgrade" and "metadata [...] gets stripped way too easily" when the replacement you are advocating is not only the exact same format as a PNG, but its metadata-embedding scheme is much more fragile in terms of the metadata being stripped randomly when uploaded somewhere. This seems very bizarre and ill-thought-out to me.

Anyway, if you want a "new image format" because "the old ones were developed 30 years ago", there's a plethora of new image formats to choose from that all support custom metadata, including: webp, jpeg 2000, HEIF, jpeg xl, and farbfeld (the one the suckless guys made).

I'll be honest... this is one of the most irritating parts of the new AI trend. Everyone is an "ideas guy" when they start programming, it's fine and normal to come up with "new ideas" that "nobody else has ever thought of" when you're a green-eared beginner and utterly inexperienced. The irritating part is what happens after the ideas phase.

What used to happen was you'd talk about this cool idea in IRC and people would either help you make it, or they would explain why it wasn't necessarily a great idea, and either way you would learn something in the process. When I was 12 and new to programming, I had the "genius idea" that if we could only "reverse the hash algorithm output to its input data" we would have the ultimate compression format... anyone with an inch of knowledge will smirk at this proposition! And so I learned from experts why this was impossible, and, not believing them, I did my own research, and learned some more :)

Nowadays, an AI will just run with whatever you say — "why yes if it were possible to reverse a hash algorithm to its input we would have the ultimate compression format", and then if you bully it further, it will even write (utterly useless!) code for you to do that, and no real learning is had in the process because there's nobody there to step in and explain why this is a bad idea. The AI will absolutely hype you up, and if it doesn't you learn to go to an AI that does. And now within a day or two you can go from having a useless idea, to advertising that useless idea to other people, and soon I imagine you'll be able to go from advertising that useless idea to other people, to manufacturing it IRL, and at no point are you learning or growing as a person or as a programmer. But you are wasting your own time and everyone else's time in the process (whereas before, no time was wasted because you would learn something before you invested a lot of time and effort, rather than after).

thinkingQueen · 7h ago
Exactly. Not long ago, someone showed up on Hacker News who had, on his own, begun to rediscover the benefits of arithmetic coding. Naturally, he was convinced he'd come up with a brand-new entropy coding method. Well, no harm done, and it's nice that people study compression, but I was surprised by how easily he got himself convinced of a discovery. Clearly he knew very little.

Overall, I think this is a positive ”problem” to have :-)

magicalhippo · 6h ago
I've had several revolutionary discoveries during my time programming. In each case, after the euphoria had settled a bit, I asked myself: Why aren't we already doing this? Why isn't this already a thing? What am I missing?

And lo and behold, in each case I did find that it was either not novel at all or it had some major downside I had initially missed.

Still, fun to think about new ways of doing things, so I still go at it.

whoisyc · 4h ago
> webp, jpeg 2000, HEIF, jpeg xl, farbfeld

I think you just illustrated how difficult it is to propose a new standard. WebP was not supported by a lot of image-related software (including the Adobe suite!) for years and earned a bad reputation, HEIF is also poorly supported, and JPEG XL was removed from Chrome despite being developed by Google and isn't supported by any other browser AFAIK. Never heard of farbfeld before.

If the backing from Apple and Google was not enough to drive the adoption of an image format, I fail to see how this thing can go anywhere.

ahofmann · 8h ago
So converting the file to a lossy format, or resizing the image as a PNG, will destroy the encoded information? I see why one would want to use it, but I think it can only be useful in a controlled environment. As soon as someone else has access to the file, the information can easily get lost. Just like metadata.
bastawhiz · 5h ago
Modifying the image in any way (cropping, resizing, etc) destroys the metadata. This is necessary in basically every application that interacts with any kind of model that uses images, either for token count reasons, file size reasons, model limits, etc. (Source: I work at a genai startup)

At inference time, you don't control the inputs, so this is moot. At training time, you've already got lots of other metadata that you need to store and preserve that almost certainly won't fit in steganographically encoded format, and you've often got to manipulate the image before feeding it into your training pipeline. Most pipelines don't simply take arbitrary images (nor do you want them: plenty of images need to be modified to, for instance, remove letterboxing).

The other consideration is that steganography is actively introducing artifacts to your assets. If you're training on these images, you'll quickly find that your image generation model, for instance, cannot generate pure black. If you're adding what's effectively visual noise to every image you train on, the model will generate images with noise.

vunderba · 4h ago
Was just coming here to say this. Most graphic editors can easily preserve EXIF/IPTC data across edits.

Without a dedicated editor or postprocessing plugin, steganography gets destroyed on modification.

moritzwarhier · 3h ago
Why not store metadata, along with a checksum of the png, in myPublicPhoto.png.meow?
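Something as simple as (rough sketch, made-up field names):

    import hashlib, json

    def write_sidecar(png_path: str, metadata: dict) -> None:
        # Hash the exact PNG bytes so a stale or mismatched sidecar is detectable.
        digest = hashlib.sha256(open(png_path, "rb").read()).hexdigest()
        with open(png_path + ".meow", "w") as f:
            json.dump({"png_sha256": digest, "metadata": metadata}, f, indent=2)

    write_sidecar("myPublicPhoto.png", {"labels": ["cat"]})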

Labeling and metadata are separate concerns. "Edge detection maps" etc. are implementation details of whatever you are doing with the image data, and quite likely to be non-portable.

And non-removability / steganography of additional metadata is not a selling point at all?

So my thoughts are, this violates separation of concerns and seems badly thought-out.

It also mangles labeling, metadata and technicalities, and attempts to predict future requirements.

I don't understand the potential utility.

can16358p · 6h ago
Nice work!

Though I have one question: once 2 bits/channel are used for MEOW-specific data, leaving 6 bits/channel, I don't see how it can still retain perfect image quality: either (if everything's re-encoded) the dynamic range is reduced by 75%, or the LSB changes introduce noise into the original image. Not too much noise, but still.

voxleone · 5h ago
Great idea and insight. If I understand correctly, it will allow you to embed metadata such as bounding box coordinates and class names, something I have also been working on[0] -- embedding computer vision annotation data directly into an image's EXIF tags, rather than storing it in separate sidecar text files. The idea is to simplify the dataset's file structure. It could offer unexpected advantages — especially for smaller or proprietary datasets, or for fine-tuning tasks where managing separate annotation files adds unnecessary overhead.

[0] https://github.com/VoxleOne/XLabel

Edited for clarity

jfengel · 6h ago
I do like the idea of storing it steganographically, which also serves as a watermark.

But it requires a ton of redundancy and error correction, perhaps enough to survive a few rounds of not-too-lossy reencoding. I dunno how much bandwidth is available before it damages the image.
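The crudest form of that redundancy is a repetition code with a majority vote - a sketch (anything serious would reach for something like Reed-Solomon and spread the bits across the image):

    import numpy as np

    def encode_repetition(bits: np.ndarray, k: int = 9) -> np.ndarray:
        # Repeat each payload bit k times before embedding so a few flipped
        # LSBs can be voted away on extraction.
        return np.repeat(bits, k)

    def decode_repetition(noisy: np.ndarray, k: int = 9) -> np.ndarray:
        groups = noisy[:noisy.size // k * k].reshape(-1, k)
        return (groups.sum(axis=1) > k // 2).astype(np.uint8)

A 9x repetition already eats 9x the LSB budget, which is where the bandwidth question bites.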

techjamie · 5h ago
I wonder how practical it'd be to steganographically hide a QR code in an image and have it retained through rounds of JPEG compression. It could be represented with a single bit per pixel and would inherently have some resistance to corruption.

The real limitation would be bandwidth.

kuberwastaken · 6h ago
Great point, I like that idea too: I’ll definitely look into adding redundancy and testing how much re-encoding it can realistically handle without noticeable image damage. Thanks for taking the time to check it out :)
vunderba · 4h ago
The amount of information you can encode using EXIF/IPTC doesn't have an upper bound, whereas steganography is inherently capped by the resolution of the image. What happens when you want to encode more information in the MEOW format than you have pixels for (which seems like a very real possibility with thumbnails or smaller pictures)?
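Back-of-the-envelope, assuming the 2 bits per channel discussed elsewhere in the thread:

    def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=2):
        # Hard ceiling on payload size for a given image.
        return width * height * channels * bits_per_channel // 8

    print(lsb_capacity_bytes(128, 128))     # 12288 bytes for a small thumbnail
    print(lsb_capacity_bytes(4000, 3000))   # 9000000 bytes for a 12 MP photo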
daeken · 6h ago
> Python-based image file format

This is one of the first lines of the readme. But this is PNG with some metadata encoded using the most naive steganographic technique (just thrown into the LSB of pixels -- no redundancy, no error correction, no compensation for downsampling, etc). Even ignoring everything else, this is just... nonsensical.

I am very very pro-AI. But this is slop.

kuberwastaken · 6h ago
That's fair. The idea is to have something like this in practice, and I think it's something that can be iterated upon to become actually useful.
jpollock · 8h ago
Cool idea. I can see it being useful in a pipeline, where you mutate the image as you go. Losing referenced data can be a pain. Are you able to extract the original image?
im3w1l · 8h ago
No one runs edge detection first, then sends the image as a screenshot, and then trains AI on it. That's an absurd workflow.

Maybe your format could have some use, but I don't find your motivation convincing.

kuberwastaken · 6h ago
True, it's probably the wording, but I meant that you could screenshot it and still have the data itself.

> Maybe your format could have some use, but I don't find your motivation convincing.

That's respectable, and it very much is still a POC; I hope to keep working on it to make it actually great :)

refulgentis · 7h ago
You generated pretty much ~all of this with Claude (c.f. ASCII diagrams with emojis on each line to "prove" various not-even-wrong claims it was told to justify), and the work is mediocre enough that it's worth full-throatedly criticizing both the work quality and that you inflicted this upon the world.

Look how many confused comments there are due to the page claiming features you don't have, don't understand, and that don't make sense on their own terms. (What's an "attention map"? With maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model? How big would the image be? Is it possible to fit that in the 2 bits we claim to fit in every 4 bytes?)

I don't want for you to take it personally, at all, but I never, ever, want to see something like this on the front page again.

You've reinvented EXIF and JPEG metadata, in the voice of a diligent teenager desiring to create something meaningful, but with 0 understanding of the computing layers, 4 hours with Wikipedia, and 0 intellectual humility - though, with youth, born not from obstinance, but naiveté.

Some warning signs you should have taken heed of:

- Metadata is universally desirable, yet, somehow unexplored until now?

- Your setup instructions use UNIX commands up until they require running a Windows batch file

- The example of hiding data hides it in 2 bits of a channel, then "demonstrates" this is visually lossless because it's hidden in 1 bit across 2 channels (it isn't, because if it were, how would we determine which 2 of the channels?). ("Visually lossless" also confuses "lossless", a technical term meaning no information was lost, with the weaker claim of being lossy-but-not-detectably-so.)

I'll leave it here, I think you have the idea and there's a difference between being firm and honest, and being cruel, and length will dictate a lot of that to a casual observer.

gruez · 6h ago
>You generated pretty much ~all of this with Claude

>- Your setup instructions use UNIX commands up until they require running a Windows batch file

Is your comment AI generated? The only setup instructions prior to the Windows commands are "git", "cd", and "pip". "cd" exists on both Windows and UNIX. The other commands might not be available by default on Windows, but they're not exactly "UNIX" commands either. The other code blocks mostly seem to assume Windows (e.g. the "start" or "copy" commands), so I don't see any contradiction here.

refulgentis · 6h ago
> Is your comment AI generated?

Are you asking this earnestly, or, is it meant to communicate something else? If so, what? :)

Genuinely, it's the most interesting part of the comment to me, in that it does not have 0 meaning and rings of some form of frustration, yet the rest of your comment stays focused on technical knowledge, and AFAIK you are not the author (who I'd expect would be at least temporarily angry at my contribution).

gruez · 6h ago
>Are you asking this earnestly, or, is it meant to communicate something else? If so, what? :)

If you're going to accuse some else of technical inconsistencies, maybe you should make sure your critiques are free of technical inconsistencies as well. You know, "people who live in glass houses shouldn't throw stones" and all that.

refulgentis · 6h ago
There's a false equivalence there, between being not-even-wrong and "you have a bunch of UNIX commands followed by a Windows batch file execution."

Note we both agree on that; you seem to assume I claimed something else, like that cd doesn't exist on Windows.

Let's say I instead said "this doesn't work on Windows"

I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...

Let's steelman this to the max: any possible prerequisite that could block it doesn't mean it's actually blocked. Dev mode on, WSL, prerequisites wrestled with and installed, can download source and edit then compile, but can only patch build errors, not add new functionality.

Are you 100% sure those commands will work?

(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)

gruez · 5h ago
>There's a false equivalence there, between being not-even-wrong and "you have a bunch of UNIX commands followed by a Windows batch file execution."

>Note we both agree on that, you seem to assume I claimed something else, like, cd doesn't exist on windows.

No, you made a specific claim of "Your setup instructions use UNIX commands up until they require running a Windows batch file", when those "UNIX commands" were "pip" and "python". That statement is incorrect because those commands are readily available on windows.

Your remark about "you seem to assume I claimed something else, like, cd doesn't exist on windows" is absurd at best and verges on bad faith that I'm not even going to engage with it.

>I spent probably...8 hours? on Windows this week doing dev, and I'm about 70% sure all of those commands will work on Windows, with dev mode switched on, with WSL on, prereqs installed...

Which commands are those? The only non-native Windows commands I see are git, pip, and python, the latter two of which come with the Python installer. You're making it sound like you need to jump through a bunch of hoops to get those commands working, when really all you have to do is run the installers for git and python.

>Are you 100% sure those commands will work?

Again, my claim isn't that the project works 100%, or even that it's not AI generated, it's that your critique makes little sense either.

>(separately, you misunderstand the quote re: glass houses. It would apply if I had used AI to write not-even-wrong claims and then submitted to HN. This misunderstanding leads to a conclusion that it is impermissible to comment on the correctness of anything if you may be incorrect, which we can both recognize leads to absurdities that would lead to 0 communication ever.)

No, the reason why I accused you of AI generated comments and made the remark about glass houses is that claiming "pip" and "python" are "UNIX commands" is so absurdly wrong that it's on the level of the OP. I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.

refulgentis · 4h ago
> Your remark about "you seem to assume I claimed something else, like, cd doesn't exist on windows" is absurd at best and verges on bad faith that I'm not even going to engage with it.

You seem very upset, at least, I'm not used to people being this aggressive on HN, and I've been here for 15 years. I apologize for my contribution to that, if not my sole responsibility for it.

I remain fascinated by your process, I never have heard bad faith invoked when someone points at their actual words.

Generally, it is rare someone invokes "bad faith" when someone else's thoughts don't match their expectations.

I just...can't lie to you. I can't claim I thought it wouldn't work on Windows. I thought the opposite! That the sequence had 0% chance of working on not-Windows, and a 70% chance of working on Windows.

>> Are you 100% sure those commands will work?

> Again, my claim isn't that the project works 100%, or even that it's not AI generated,

Oh! I'm referring to the commands, not the project :) The project can output "APRIL FOOLS!", as far as I care for this exercise.

> it's that your critique makes little sense either.

Oh, interesting - happy to hear more, beyond that I must have meant pip/Python aren't available on Windows. If that's your sole issue, well, more power to you :) I do want to avoid lying to you just to avoid an aggressive conversation; you may not even mean to be aggressive. With the principle of "don't lie", I can't say I had something else in my head that matches your understanding so far. I presume something like "they are UNIX commands followed by Windows commands" [and thus this won't work on Windows].

> claiming "pip" and "python" are "UNIX commands"

Do you think I thought pip/Python weren't on Windows? Sorry, no - in fact that's what I was using on Windows this week! (Well, porting Python code to Dart.) I was just 70% sure the commands as written would work on Windows, and I suppose there's an implication I'm 100% sure they wouldn't work on not-Windows, given the .bat file. Beyond that, nada.

>> separately, you misunderstand the quote re: glass houses

> No, I agree that you don't have to be 100% correct to accuse people of posting dumb stuff, but you shouldn't be posting dumb stuff either.

Intriguing, as always: "Did you write this with AI?" followed by a kind inquiry into the meaning of that, followed by "people in glass houses shouldn't throw stones" meaning "you said something wrong when you said something else is wrong, but it's cool, that's fine" - "shouldn't" seems to belie that interpretation, but I'm sure I have it wrong.

P.s. all the best, my friend. :)

kuberwastaken · 6h ago
> You generated pretty much ~all of this with Claude

Haha no, it was a reworked version of an older image format I found and modified to fit this. Yes, there was AI-assisted coding involved in the process, but it wasn't a "make me an image format that does x".

> what's an "attention map"? with maximum charity, if we had some sort of attention-as-in-LLM-like structure precached, how would it apply beyond one model?

By "attention map" I meant a visual representation of where a model focuses its "attention" when analyzing an image — basically, a heatmap highlighting important regions that influence the model's output. It isn't something that is very useful now, but it might be.

> You reinvented EXIF/JPEG metadata with naivete

Partly true (at least for now), but the core idea was to experiment with alternative metadata or feature embedding, not to replace well-established standards. It's not where I NEED it to be yet, but as far as metadata use cases go, it's pretty cool.

> Your setup instructions use UNIX commands up until they require running a Windows batch file

It's easier to set Windows up to directly open other file formats, it's just a thing (and I'm on Windows - so)

efitz · 5h ago
I think that this is interesting research. As LLMs are becoming an important part of building stuff, I suspect that we will find that embedding context close to where it’s needed will yield better results in longer or more complex workflows.

In my AI-assisted coding I've started experimenting with embedding hyper-relevant context in comments; for example, I embed test-related context directly into test files, so it's immediately available and fresh whenever the file is read.

Extrapolating, I’ve been thinking recently about whether it might be useful to design a programming language optimized for LLM use. Not a language to create LLMs, but a language optimized for LLMs to write in and to debug. The big obstacle would seem to be bootstrapping since LLMs are trained by analyzing large amounts of human created code.

kookamamie · 6h ago
This is not a good idea in practice. Why not bundle the metadata as JSON or Protobuf via an aux file?
kuberwastaken · 6h ago
That's how it's usually done. The main reason I tried embedding it directly was to make the file self-contained, so that context always travels with the image itself and it's less tedious.
Dwedit · 6h ago
Metadata gets stripped by most websites.

Embedding metadata into the pixels by using the least significant bits of RGB won't cut it; that stuff is gone when the file becomes a JPEG.

But there do exist methods of embedding data in pixels that can survive JPEG compression.

b0a04gl · 6h ago
Using LSB to store structured metadata inside a PNG is clever: it survives format conversion, stays invisible to standard viewers, and doesn't break compression. But the space is tight. Even at 1 bit per channel, that's just 3 bits per pixel on RGB.

Given that, how are you handling tradeoffs between spatial fidelity (like masks or edges) versus scalar data (like complexity scores)? Is there a priority system, or does it just truncate when it runs out of space?

mousethatroared · 5h ago
3 bits per pixel is on the order of hundreds of kilobytes to megabytes of hidden information, depending on resolution.

640x480 at 3 bits/pixel -> roughly 920 kilobits, or about 115 kB.

That's a lot. Too much. This is why we can't have nice things. Because people keep inventing things to destroy privacy.

hiccuphippo · 6h ago
Does this survive resizing images or converting from PNG to JPG (or worse, taking JPG screenshots of resized PNGs)? Because that also happens a lot when sharing images.
kevingadd · 8h ago
I don't understand the purpose of effectively hard-coding things like edge detection and attention weight maps into the image. There are various ways to do edge detection and various ways to focus attention, so having that fixed and encoded into the image instead of synthesizing it on demand to suit your particular ends seems suboptimal.

Wouldn't the kind of metadata that's most useful be things that can't be synthesized, like labels or (for ai-generated images) the prompt used to generate the image?

kuberwastaken · 6h ago
Totally fair points, but the idea isn’t to stop at edge maps or simple overlays. This was meant as an early step toward expanding what an image can carry with it for AI workflows.

It's definitely not finished, more like a POC right now for storing richer, AI-relevant metadata in a portable way. Appreciate you taking the time to check it out.

nottorp · 4h ago
How about a format that will break AI instead?
layer8 · 6h ago
> PNG on steroids

You mean PNG on steganoroids.

vrighter · 7h ago
So you do a bunch of the network's job for it?

Also, I remember when I discovered steganography and tried putting it in everything. I was 13. Seriously, what's the point of that?

evertedsphere · 5h ago
please. no more chatgpt-generated readmes upvoted to the front page

if you couldn't be bothered to write it, why should anyone read it, and what does that say about your view of the potential users you're trying to attract

nnunley · 6h ago
Is the main goal here just to have a cool-looking file extension?

Why not use PNG’s built-in zTXt chunks to store metadata? That seems like a more standard and less fragile approach.

I can see the case for using LSB steganography to watermark or cryptographically sign an image—but using it to embed all the metadata you're describing is likely to introduce a lot of visual noise.

Also worth considering: this approach could be used to poison models by embedding deliberately misleading metadata. Depending on your perspective, that might be a feature or a bug.

efitz · 4h ago
Wow so much hate on this article.

Instead of the vitriol and downvotes, maybe next time just point out “you can put arbitrary data in exif”.

But you missed half the point of the article, which was the EXTRA DATA to make the image more LLM-useful.

kuberwastaken · 4h ago
Haha, that's totally alright honestly. I found a couple of suggestions with actual constructive criticism I can use to improve it, and ignored the ones that hated without reason - I semi-expected this reaction :p

Thanks for the comment! Means a lot :)

vrighter · 3h ago
You could add anything to a PNG, and encoders should, unless otherwise instructed, preserve any chunks whose format they don't understand.
seebeen · 6h ago
You do know that EXIF exists?
drivingmenuts · 6h ago
You now have X+1 problems …