That's awesome. DINOv2 was the best image embedder until now.
llm_nerd · 19m ago
You have to share your contact information, including DoB, and then be approved access, to obtain the models, and given that it's Meta I assume they're actually validating it against their All Humans database.
They made their own DINOv3 license for this release (whereas DINOv2 use the Apache 2.0 license).
Neat though. Will still check it out.
Qwuke · 7m ago
Yes, it's pretty disappointing for a seemingly big improvement over SOTA to be commercially licensed compared the previous version.. At least in the press release they're not portraying it as open source just because it's on GitHub/HuggingFace.
ranger_danger · 1h ago
I have no idea what this even is.
kaoD · 1h ago
> An extended family of versatile vision foundation models producing high-quality dense features and achieving outstanding performance on various vision tasks including outperforming the specialized state of the art across a broad range of settings, without fine-tuning
kevinventullo · 1h ago
To elaborate, this is a foundation model. This basically means it can take an arbitrary image and map it to a high dimensional space H in which ~arbitrary characteristics become much easier to solve for.
For example (and this might be oversimplifying a bit, computer vision people please correct me if I’m wrong) if you’re interested in knowing whether or not the image contains a cat, then maybe there is some hyperplane P in H for which images on one side of P do not contain a cat, and images on the other side do contain a cat. And so solving for “Does this image contain a cat?”becomes a much easier problem, all you have to do is figure out what P is. Once you do that, you can pass your image into DINO, dot product with the equation for P, and check whether the answer is negative or positive. The point is that finding P is much easier than training your own computer vision model from scratch.
reactordev · 30m ago
If computer vision were semantic search, nailed it. It’s a little more complicated than that but - with this new model, not by much :D
pugworthy · 3m ago
So, a group of AI models that can look at images and understand them, working well for many different tasks without needing extra training?
They made their own DINOv3 license for this release (whereas DINOv2 use the Apache 2.0 license).
Neat though. Will still check it out.
For example (and this might be oversimplifying a bit, computer vision people please correct me if I’m wrong) if you’re interested in knowing whether or not the image contains a cat, then maybe there is some hyperplane P in H for which images on one side of P do not contain a cat, and images on the other side do contain a cat. And so solving for “Does this image contain a cat?”becomes a much easier problem, all you have to do is figure out what P is. Once you do that, you can pass your image into DINO, dot product with the equation for P, and check whether the answer is negative or positive. The point is that finding P is much easier than training your own computer vision model from scratch.