Show HN: Open-source AI image/deepfake detection that actually works
YC W24 company here. We just open-sourced an AI image detection model that beats the SOTA commercial detectors. AI-generated images and videos have become incredibly good in the last few months and are flooding the internet; being able to detect them reliably gives some power back to consumers and companies that care about high-quality, genuine content.
Detecting AI-generated images is a very hard problem: there are many different generation techniques; compression, noise, and other distortions destroy generator artifacts; Android phones apply auto-correction to images; and so on. None of the detectors we tried (sightengine.com, decopy.ai, etc.) works reliably, even on basic examples (try them with this pirate Rick Astley made with Flux Kontext: https://imgur.com/a/iL3paE8).
We released two models: the full version (~600M params) and a smaller version (~20M params) that can even run in your browser on mobile (see demo)! We've also put up code for running the models locally or via an API (free but rate-limited), with examples in JavaScript/Node and Python.
The full model was trained on 1M+ images scraped off the internet, and the small model is a distillation of it. We're actively working on extending the dataset and further improving the models.
Classification accuracy: sightengine.com seems to be the best commercial solution out there, as confirmed by this paper (https://arxiv.org/pdf/2404.14581), which they also cite on their website. Of course, they cherry-picked the results: they claim 98.3% accuracy while achieving a (still impressive) ~82.8% over the full dataset. I downloaded the dataset used in the paper and tested my models against it. The code for running the tests is included in the repo, along with a usable version of the dataset (the original was a big pain to download from OneDrive).
Here are our benchmark results for comparison:

  Total samples:      144,088
  Real images:         17,044
  Synthetic images:   127,044
  Average precision:    0.991

  ================ threshold: 0.5 ================
  Total accuracy: 0.864

  PER-CATEGORY ACCURACY:
  Real               0.875  (17,044 samples)
  DALL-E_T2I         0.982  (16,110 samples)
  DreamStudio_T2I    0.968  (16,278 samples)
  Midjourney_T2I     0.961  (16,148 samples)
  StarryAI_T2I       0.847  (13,515 samples)
  DALL-E_IT2I        0.774  (16,665 samples)
  DreamStudio_IT2I   0.666  (16,139 samples)
  Midjourney_IT2I    0.897  (15,371 samples)
  StarryAI_IT2I      0.805  (16,818 samples)

  ================ threshold: 0.65 ================
  Total accuracy: 0.827

  PER-CATEGORY ACCURACY:
  Real               0.914  (17,044 samples)
  DALL-E_T2I         0.971  (16,110 samples)
  DreamStudio_T2I    0.949  (16,278 samples)
  Midjourney_T2I     0.940  (16,148 samples)
  StarryAI_T2I       0.803  (13,515 samples)
  DALL-E_IT2I        0.698  (16,665 samples)
  DreamStudio_IT2I   0.576  (16,139 samples)
  Midjourney_IT2I    0.845  (15,371 samples)
  StarryAI_IT2I      0.743  (16,818 samples)
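For anyone who wants to sanity-check the numbers: the total accuracy figures are consistent with a sample-weighted average of the per-category accuracies. The sketch below is not the benchmark script from the repo; it only re-derives the totals from the figures printed above, and the names `RESULTS` and `weighted_accuracy` are made up for illustration. It also shows why the total drops at the higher threshold even though accuracy on real images goes up: about 88% of the samples are synthetic, so the synthetic categories dominate the average.

```python
# Per-category figures copied from the benchmark output above:
# name -> (sample count, accuracy at threshold 0.5, accuracy at threshold 0.65)
RESULTS = {
    "Real":             (17_044, 0.875, 0.914),
    "DALL-E_T2I":       (16_110, 0.982, 0.971),
    "DreamStudio_T2I":  (16_278, 0.968, 0.949),
    "Midjourney_T2I":   (16_148, 0.961, 0.940),
    "StarryAI_T2I":     (13_515, 0.847, 0.803),
    "DALL-E_IT2I":      (16_665, 0.774, 0.698),
    "DreamStudio_IT2I": (16_139, 0.666, 0.576),
    "Midjourney_IT2I":  (15_371, 0.897, 0.845),
    "StarryAI_IT2I":    (16_818, 0.805, 0.743),
}

# Column index inside each tuple for a given threshold.
COLUMN = {0.5: 1, 0.65: 2}


def weighted_accuracy(results, threshold, include=lambda name: True):
    """Sample-weighted mean accuracy over the categories selected by `include`."""
    col = COLUMN[threshold]
    rows = [row for name, row in results.items() if include(name)]
    total = sum(row[0] for row in rows)
    correct = sum(row[0] * row[col] for row in rows)
    return correct / total


if __name__ == "__main__":
    for threshold in (0.5, 0.65):
        overall = weighted_accuracy(RESULTS, threshold)
        synthetic = weighted_accuracy(RESULTS, threshold, lambda n: n != "Real")
        real = RESULTS["Real"][COLUMN[threshold]]
        print(f"threshold {threshold}: total={overall:.3f} "
              f"real={real:.3f} synthetic={synthetic:.3f}")
```

Running it gives totals of 0.864 and 0.827, matching the reported values. Because the inputs are already rounded to three decimals, treat this as a consistency check rather than an exact recomputation.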