Create and edit images with Gemini 2.0 in preview

218 points by meetpateltech | 5/7/2025, 4:06:44 PM | developers.googleblog.com

Comments (93)

vunderba · 16h ago
I've added and tested this multimodal Gemini 2.0 model in my shoot-out of SOTA image gen models (OpenAI 4o, Midjourney 7, Flux, etc.), which contains a collection of increasingly difficult prompts.

https://genai-showdown.specr.net

I don't know how much of Google's original Imagen 3.0 is incorporated into this new model, but the overall aesthetic quality unfortunately seems significantly worse.

The big "wins" are:

- The multimodal aspect, keeping parity with OpenAI's offerings.

- An order of magnitude faster than OpenAI 4o image gen

ticulatedspline · 10h ago
Excellent site! OpenAI 4o is more than mildly frightening in its capability to understand the prompt. It seems what's mostly holding it back is a tendency away from photo-realism (or even typical digital art styles) and its own safeguards.
troupo · 6h ago
I also find it weird how it defaults/devolves into this overall brown-ish style. Once you see it, you see it everywhere
flir · 14m ago
I've played around with "create an image based on this image" chains quite a lot, and yep, everything goes brown with 4o. You append the images to each other as a filmstrip and it's almost like a gradient.

They also simplify over the generations (eg a basket full of stuff slowly loses the stuff), but I guess that's to be expected.

avereveard · 9h ago
It's a bit expensive/slow, but for styled requests I let it do the base image, and when I'm happy with the composition I ask it to remake it as a picture or in whatever style is needed.
echelon · 9h ago
Multimodal is the only image generation modality that matters going forward. Flux, HiDream, Stable Diffusion, and the like are going to be relegated to the past once multimodal becomes more common. Text-to-image sucks, and image-to-image with all the ControlNets and Comfy nodes is cumbersome in comparison to true multimodal instructiveness.

I hope that we get an open weights multimodal image gen model. I'm slightly concerned that if these things take tens to hundreds of millions of dollars to train, that only Google and OpenAI will provide them.

That said, the one weakness in multimodal models is that they don't let you structure the outputs yet. Multimodal + ControlNets would fix that, and that would be like literally painting with the mind.

The future, when these models are deeply refined and perfected, is going to be wild.

zaptrem · 7h ago
Good chance a future llama will output image tokens
echelon · 6h ago
That's my hope: that Llama or Qwen bring multimodal image generation capabilities to open source so we're not left in the dark.

If that happens, then I'm sure we'll see slimmer multimodal models over the course of the next year or so. And that teams like Black Forest Labs will make more focused and performant multimodal variants.

We need the incredible instructivity of multimodality. That's without question. But we also need to be able to fine tune, use ControlNets to guide diffusion, and to compose these into workflows.

andybak · 2h ago
Any thoughts on how Ideogram would rank? I've not used it recently but I used to get the sense that it is (or was) a "contender".
esperent · 10h ago
Your site is really useful, thanks for sharing. One issue is that the list of examples sticks to the top and covers more than half of the screen on mobile, could you add a way to hide it?

If you're looking for other suggestions a summary table showing which models are ahead would be great.

vunderba · 7h ago
Great point - when I started building it I think I only had about four test cases, but now the nav bar is eating 50% of the vertical display, so I've removed it on mobile!

Wrt the summary table, did you have a different metric in mind? The top of the display should already be showing a "Model Performance" chart with OpenAI 4o and Google Imagen 3 leading the pack.

esperent · 3h ago
That's much easier to read now.

> The top of the display should already be showing a "Model Performance" chart

I guess I missed this earlier!

liuliu · 9h ago
Do you mind sharing which HiDream-I1 model you are using? I am getting better results with these prompts from my implementation inside Draw Things.
vunderba · 7h ago
Sure - I was using "hidream-i1-dev", but if you're seeing better results I might rerun the HiDream tests with the "hidream-i1-full" model.

I've been thinking about possibly rerunning the Flux Dev prompts using Flux 1.1 Pro, but I like having a base reference for images that can be generated on consumer hardware.

pkulak · 12h ago
> That mermaid was quite the saucy tart.

Really now?

belter · 15h ago
Your shoot-out site is very useful. Could I suggest adding prompts that expose common failure modes?

For example, asking the models to show clocks set to a specific time or people drawing with their left hand. I think most, if not all, models will likely display every clock with the same time... and portray subjects drawing with their right hand.

vunderba · 13h ago
@belter / @crooked-v

Thanks for the suggestions. Most of the current prompts are the result of personal images that I wanted to generate, so I'll try to add some "classic GenAI failure modes". Musical instruments such as pianos also used to be a pretty big failure point.

troupo · 6h ago
For personal images I often play with wooly mammoths, and most models are incapable of generating anything but textbook images. Any deviation either becomes an elephant or an abomination (bull- or bear-like monsters)
crooked-v · 15h ago
Another I would suggest is buildings with specific unusual proportions and details (e.g. "the mansion's west wing is twice the height of the right wing and has only very wide windows"). I've yet to find a model that will do that kind of thing reliably; it seems to just fall back on the vibes of whatever painting or book cover is vaguely similar to what's described.
droopyEyelids · 14h ago
generating a simple maze for kids is also not possible yet
vunderba · 13h ago
Love this one so I've added it. The concept is very easy for most GenAI models to grasp, but it requires a strong overall cohesive understanding. Rather unbelievably, OpenAI 4o managed to produce a pass.

I should also add an image that is heavy with "greebles". GenAI usually lacks the fidelity for these kinds of minor details, so although it adds them, they tend to fall apart under more than a cursory examination.

https://en.wikipedia.org/wiki/Greeble

simonw · 13h ago
Be a bit careful playing with this one. I tried this:

  curl -s -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way"}
        ]
      }],
      "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
    }' > /tmp/out.json
And got back 41MB of JSON with 28 base64 images in it: https://gist.github.com/simonw/55894032b2c60b35f320b6a166ded...

At 4c per image that's more than a dollar on that single prompt.

I built this quick tool https://tools.simonwillison.net/gemini-image-json for pasting that JSON into so you can see it rendered.
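
If you'd rather pull the images out without a browser tool, here's a minimal sketch for extracting them from a saved response like /tmp/out.json above, assuming the camelCase candidates/content/parts/inlineData layout the REST API returns (adjust if your JSON differs):

  import base64, json

  with open("/tmp/out.json") as f:
      response = json.load(f)

  count = 0
  for candidate in response.get("candidates", []):
      for part in candidate.get("content", {}).get("parts", []):
          inline = part.get("inlineData")
          if inline:  # image parts; text parts carry a "text" key instead
              ext = inline.get("mimeType", "image/png").split("/")[-1]
              with open(f"image_{count}.{ext}", "wb") as out:
                  out.write(base64.b64decode(inline["data"]))
              count += 1

  print(f"extracted {count} images")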

weird-eye-issue · 11h ago
I mean you did ask for "many illustrations"
eminence32 · 18h ago
This seems neat, I guess. But whenever I try tools like this, I often run into the limits of what I can describe in words. I might try something like "Add some clutter to the desk, including stacks of paper and notebooks" but when it doesn't quite look like what I want, I'm not sure what else to do except try slightly different wordings until the output happens to land on what I want.

I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head. But I guess I have a lot of doubts about using a conversational interface for this kind of stuff

monster_truck · 18h ago
Chucking images at any model that supports image input and asking it to describe specific areas/things 'in extreme detail' is a decent way to get an idea of what it's expecting vs what you want.
thornewolf · 17h ago
+1 to this flow. I use the exact same phrase "in extreme detail" as well haha. Additionally, I ask the model to describe what prompt it might write to produce some edit itself.
crooked-v · 17h ago
I just tried a couple of cases that ChatGPT is bad at (reproducing certain scenes/setpieces from classic tabletop RPG adventures, like the weird pyramid from classic D&D B4 The Lost City), and Gemini fails in just about the same way of getting architectural proportions and scenery details wrong even when given simple, broad rules about them. Adding more detail seems kind of pointless when it can't even get basics like "creature X is about as tall as the building around it" or "the pyramid is surrounded by ruined buildings" right.
BoorishBears · 15h ago
What's an example of a prompt you tried and it failed on?
metalrain · 7h ago
Exactly. With more complex compositions, lighting, and image enhancements/filters, there are so many things where you know how it should look, but describing it such that the LLM gets it and will reproduce it is pretty difficult.

Sometimes sketching it can be helpful, but more abstract technical things like LUTs still feel out of reach.

qoez · 17h ago
Maybe that's how the future will unfold. There will be subtle things AI fails to learn, and there will be differences in how good people are at making AI do things, which will become a new skill in itself and end up being the determining difference in pay.
gowld · 11h ago
This is "Prompt Engineering"
betterThanTexas · 15h ago
> I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head.

This is more related to our ability to articulate than is easy to demonstrate, in my experience. I can certainly produce images in my head I have difficulty reproducing well and consistently via linguistic description.

SketchySeaBeast · 14h ago
It's almost as if being able to create art accurate to our mental vision requires practice and skill, be it the ability to create an image or to write it and evoke an image in others.
betterThanTexas · 14h ago
Absolutely! But this was surprising to me—my intuition says if I can firmly visualize something, I should be able to describe it. I think many people have this assumption and it's responsible for a lot of projection in our social lives.
SketchySeaBeast · 14h ago
Yeah, it's probably a good argument for having people try some form of art, to have them understand that their intent and their outcome is rarely the same.
xbmcuser · 16h ago
Ask Gemini to word your thoughts better, then use those to do the image editing.
Nevermark · 16h ago
Perhaps describe the types and styles of work associated with the desk, to create a coherent character to the clutter
bufferoverflow · 12h ago
In that scenario, if you can't describe what you want with words, a human designer can't read your mind either.
Hasnep · 11h ago
No, but a good designer will be able to help you put what you want into words.
gowld · 11h ago
Ask the AI to help you put what you want into words.
maksimur · 4h ago
I think the issue with AI (in contrast to human interaction) is its lack of real-time responsiveness. This slower back-and-forth can lead to frustration, especially if it takes a dozen or more messages to get the point across. Humans are also helped in helping you by contextual cues like gestures, facial expressions or "shared qualia".
zoogeny · 16h ago
I would politely suggest you work at getting better at this since it would be a pretty important skill in a world where a lot of creative work is done by AI.

As some have mentioned, LLMs are treasure troves of information for learning how to prompt the LLM. One thing to get over is a fear of embarrassment in what you say to the LLM. Just write a stream of consciousness to the LLM about what you want and ask it to generate a prompt based on that. "I have an image that I am trying to get an image LLM to add some clutter to. But when I ask it to do it, like I say add some stack of paper and notebooks, but it doesn't look like I want because they are neat stacks of paper. What I want is a desk that kind of looks like it has been worked at for a while by a typical office worker, like at the end of the day with a half empty coffee cup and .... ". Just ramble away and then ask the LLM to give you the best prompt. And if it doesn't work, literally go back to the same message chain and say "I tried that prompt and it was [better|worse] than before because ...".

This is one of those opportunities where life is giving you an option: give up or learn. Choose wisely.

mkl · 9h ago
> what the lamp from the second image would look like on the desk from the first image

The lamp is put on a different desk in a totally different room, with AI mush in the foreground. Props for not cherry-picking a first example, I guess. The sofa colour one is somehow much better, with a less specific instruction.

cyral · 7h ago
That one is an odd example... especially since image #3 does a similar task with excellent accuracy in keeping the old image intact. I've had the same issues when trying to make it visualize adding decor; it ends up changing the whole room or the furniture materials.
cthulberg · 1h ago
gemini-2.0-flash-*-image-generation models are not currently supported in a number of countries in Europe, Middle East & Africa

source: https://ai.google.dev/gemini-api/docs/models#gemini-2.0-flas... and my Google AI Studio

voidUpdate · 3h ago
Example 1 doesn't really show how the lamp would look in that situation... in the first image it's about the same height as the sofa. I'd expect it to be at least twice the size it is in the second image. Also, what is going on underneath the table?
yots · 43m ago
That table image is... horribly honest? I'm a bit shocked they used it in a blog post. It's really, really bad
mastazi · 3h ago
LLMs are notoriously bad at estimating or comparing the size of physical objects because their training happens away from the physical world. It's a well-known problem that has recently been discussed even in popular media, e.g. https://theconversation.com/why-ai-cant-ever-reach-its-full-...
voidUpdate · 3h ago
Probably not the best idea to try and use that as a demo of how awesome your new image generator is then
minimaxir · 17h ago
Of note is that the per-image pricing for Gemini 2.0 image generation is $0.039 per image, which is more expensive than Imagen 3 ($0.03 per image): https://ai.google.dev/gemini-api/docs/pricing

The main difference is that Gemini allows incorporating a conversation into image generation, as demoed here, while Imagen 3 is strictly text-in/image-out with optional mask-constrained edits, but it likely allows for higher-quality images overall if you're skilled with prompt engineering. This is a nuance that's annoying to differentiate.

vunderba · 16h ago
Anecdotal, but from preliminary side-by-side sandbox testing of Gemini 2.0 Flash and Imagen 3.0, it definitely appears that that is the case: higher overall visual quality from Imagen 3.
ipsum2 · 17h ago
> likely allows for higher-quality images overall

What makes you say that?

Yiling-J · 12h ago
I generated 100 recipes with images using gemini-2.0-flash and gemini-2.0-flash-exp-image-generation as a demo of text+image generation in my open-source project: https://github.com/Yiling-J/tablepilot/tree/main/examples/10...

You can see the full table with images here: https://tabulator-ai.notion.site/1df2066c65b580e9ad76dbd12ae...

I think the results came out quite well. Be aware that I don't generate a text prompt based on row data for image generation. Instead, the raw row data (ingredients, instructions...) and table metadata (column names and descriptions) are sent directly to gemini-2.0-flash-exp-image-generation.
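
A minimal sketch of that approach (not tablepilot's actual code; it assumes the google-genai Python SDK, and the row/metadata values are made up for illustration):

  import json
  from google import genai
  from google.genai import types

  client = genai.Client(api_key="YOUR_API_KEY")

  # Hypothetical row data and table metadata, passed through as-is
  row = {
      "name": "Chickpea Butter Masala",
      "ingredients": ["chickpeas", "tomatoes", "cream", "garam masala"],
      "instructions": "Simmer cooked chickpeas in a spiced tomato-cream sauce.",
  }
  metadata = {"columns": {
      "name": "Recipe title",
      "ingredients": "List of ingredients",
      "instructions": "Preparation steps",
  }}

  response = client.models.generate_content(
      model="gemini-2.0-flash-exp-image-generation",
      contents="Generate a photo of the finished dish for this table row:\n"
               + json.dumps({"row": row, "metadata": metadata}),
      config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
  )

  # The image comes back as an inline-data part alongside any text parts
  for part in response.candidates[0].content.parts:
      if part.inline_data:
          with open("recipe.png", "wb") as f:
              f.write(part.inline_data.data)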

thornewolf · 18h ago
Model outputs look good-ish. I think they are neat. I updated my recent hack project https://lifestyle.photo to the new model. It's middling-to-good.

There are a lot of failure modes still, but what I want is a very large cookbook showing what the known-good workflows are. Since this is so directly downstream of (limited) training data, it might be that I am just prompting in an ever so slightly bad way.

sigmaisaletter · 17h ago
Re your project: I'd expect at least the demo to not have an obvious flaw. The "lifestyle" version of your bag has a handle that is nearly twice as long as the "product" version.
thornewolf · 17h ago
This is a fair critique. While I am merely a "LLM wrapper", I should put the product's best foot forward and pay more attention to my showcase examples.
nico · 17h ago
Love your project, great application of gen AI, very straightforward value proposition, excellent and clear messaging

Very well done!

thornewolf · 17h ago
Thank you for the kind words! I am looking forward to creating a Show HN next week alongside a Product Hunt announcement. I appreciate any and all feedback. You can provide it through the website directly or through the email I have attached in my bio.
mNovak · 17h ago
I'm getting mixed results with the co-drawing demo, in terms of understanding what stick figures are, which seems pretty important for the 99% of us who can't draw a realistic human. I was hoping to sketch a scene, and let the model "inflate" it, but I ended up with 3D rendered stick figures.

Seems to help if you explicitly describe the scene, but then the drawing-along aspect seems relatively pointless.

pentagrama · 16h ago
I want to take a step back and reflect on what this actually shows us. Look at the examples Google provides: it refers to the generated objects as "products", clearly pointing toward shopping or e-commerce use cases.

It seems like the real goal here, for Google and other AI companies, is a world flooded with endless AI-generated variants of objects that don’t even exist yet, crafted to be sold and marketed (probably by AI too) to hyper-targeted audiences. This feels like an incoming wave of "AI slop", mass-produced synthetic content, crashing against the small island of genuine human craftsmanship and real, existing objects.

hapticmonkey · 5h ago
It's sort of sad how these tools went from "godlike new era of human civilization" to "some commodity tools for marketing teams to sell stuff".

I get that they are trying to find some practical use cases for their tools. But there's no enlightenment in the product development here.

If this is already the part of the s-curve where these AI tools get diminishing returns...what a waste of everybody's time.

nly · 12h ago
Recently I've been seeing a lot of holiday lets on sites like Rightmove (UK) and Airbnb with clearly AI generated 'enhancements' to the photos.

It should be illegal in my view.

vunderba · 16h ago
Yeah - and honestly I don't really get this. Using GenAI for real-world products seems like a recipe for a slew of fraudulent advertising lawsuits if the images are even slightly different from the actual physical products yet are presented as if they were real photographs.
nkozyra · 15h ago
The gating factor here is the pool of consumers. Once people have slop exhaustion there's nobody to sell this to.

Maybe this is why all of the future AI fiction has people dressed in the same bland clothing.

ohadron · 18h ago
For one thing, it's way faster than the OpenAI equivalent in a way that might unlock additional use cases.
freedomben · 18h ago
Speed has been the consistent thing I've noticed with Gemini too, even going back to the earlier days when Gemini was a bit of a laughing stock. Gemini is fast
julianeon · 16h ago
I don't know exactly the speed/quality tradeoff but I'll tell you this: Google may be erring too much on the speed side. It's fast but junk. I suspect a lot of people try it then bounce off back to Midjourney, like I did.
Tsarp · 9h ago
There are direct prompt tests and then there are tests with tooling.

If, for example, you use ControlNets, you can get very close to the style/composition you need with an open model like Flux, which will be far better. Flux has a few successors coming up now.

egamirorrim · 18h ago
I don't understand how to use this. I keep trying to edit a photo of myself (change a jacket to a t-shirt) in the Gemini app with 2.0 Flash selected, and it just generates a new image that's nothing like the original.
FergusArgyll · 17h ago
I think this is just in AI Studio. In the Gemini app I think it goes: Flash describes the image to imagen -> imagen generates a new image
thornewolf · 18h ago
It is very sensitive to your input prompts. Minor differences will result in drastic quality differences.
julianeon · 16h ago
Remember you are paying about 4 cents an image if I'm understanding the pricing correctly.
emporas · 9h ago
I use Gemini to create covers for songs/albums I make, with beautiful typography. Something like this [1]. I was dying of curiosity about how Ideogram managed to create such gorgeous images. I figured it out 2 days ago.

I take an image with some desired colors or typography from an already existing music album or from Ideogram's poster section. I pass it to Gemini and give the command:

"describe the texture of the picture, all the element and their position in the picture, left side, center right side, up and down, the color using rgb, the artistic style and the calligraphy or font of the letters"

Then I take the result and pass it through an LLM - a different LLM, because I don't like Gemini that much; I find it much less coherent than other models. I usually use qwen-qwq-32b, and I take the description Gemini outputs and give it to Qwen:

" write a similar description, but this time i want a surreal painting with several imaginative colors. Follow the example of image description, add several new and beautiful shapes of all elements and give all details, every side which brushstrokes it uses, and rgb colors it uses, the color palette of the elements of the page, i want it to be a pastel painting like the example, and don't put bioluminesence. I want it to be old style retro style mystery sci fi. Also i want to have a title of "Song Title" and describe the artistic font it uses and it's position in the painting, it should be designed as a drum n bass album cover "*

Then I take the result and give it back to Gemini with the command: "Create an image with text "Song Title" for an album cover: here is the description of the rest of the album"

If the resulting image is good, then it is time to add the font. I take the new image description and pass it through Qwen again, supposing the image description has fields Title and Typography:

"rewrite the description and add full description of the letters and font of text, clean or distressed, jagged or fluid letters or any other property they might have, where they are overlayed, and make some new patterns about the letter appearance and how big they are and the material they are made of, rewrite the Title and Typography."

I replace the previous description's section Title and Typography with the new description and create images with beautiful fonts.
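
A rough sketch of this describe -> rewrite -> regenerate loop, assuming the google-genai SDK for the Gemini calls (the Qwen rewrite step is left as a stub for whichever endpoint you use, and the prompts are abbreviated):

  from google import genai
  from google.genai import types

  client = genai.Client(api_key="YOUR_API_KEY")

  # 1. Ask Gemini to describe the reference cover in detail
  with open("reference_cover.jpg", "rb") as f:
      reference = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")
  description = client.models.generate_content(
      model="gemini-2.0-flash",
      contents=[reference, "describe the texture of the picture, all the elements "
                "and their positions, the colors using rgb, the artistic style and "
                "the calligraphy or font of the letters"],
  ).text

  # 2. Rewrite the description with a second LLM (e.g. qwen-qwq-32b) - stubbed
  def rewrite_description(prompt: str) -> str:
      raise NotImplementedError("call your Qwen endpoint here")

  new_description = rewrite_description(
      "write a similar description, but this time i want a surreal painting... "
      + description)

  # 3. Feed the rewritten description back to the image-generation model
  result = client.models.generate_content(
      model="gemini-2.0-flash-preview-image-generation",
      contents='Create an image with text "Song Title" for an album cover: '
               + new_description,
      config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
  )
  for part in result.candidates[0].content.parts:
      if part.inline_data:
          with open("cover.png", "wb") as f:
              f.write(part.inline_data.data)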

[1] https://imgur.com/a/8TCUJ75

simonw · 12h ago
Posted some notes from trying this out here, including examples of the images it produced and a tool for rendering the JSON https://simonwillison.net/2025/May/7/gemini-images-preview/
qq99 · 16h ago
Wasn't this already available in AI Studio? It sounds like they also improved the image quality. It's hard to keep up with what's new with all these versions
taylorhughes · 16h ago
Image editing/compositing/remixing is not quite as good as gpt-image-1, but the results are really compelling anyway due to the dramatic increase in speed! Playing with it just now, it's often 5 seconds for a compositing task between multiple images. Feels totally different from waiting 30s+ for gpt-image-1.
refulgentis · 17h ago
Another release from Google!

Now I can use:

- Gemini 2.0 Flash Image Generation Preview (May) instead of Gemini 2.0 Flash Image Generation Preview (March)

- or when I need text, Gemini 2.5 Flash Thinking 04-17 Preview ("natively multimodal" w/o image generation)

- When I need to control thinking budgets, I can do that with Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price increase over a month prior

- And when I need realtime, fallback to Gemini 2.0 Flash 001 Live Preview (announced as In Preview on April 9 2025 after the Multimodal Live API was announced as released on December 11 2024)

- I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO Edition's thinking budgets, but good news follows in the next bullet: they'll swap the model out underneath me with one that thinks ~10x less, so at least it's in the same cost ballpark as their competitors

- and we all got autoupgraded from Gemini 2.5 Pro Preview (03/25 released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday! Yay!

justanotheratom · 17h ago
Yay! do you use your Gemini in Gemini App or AI Studio or Vertex AI?
refulgentis · 17h ago
I am Don Quixote, building an app that abstracts over models (i.e. allows user choice), while providing them a user-controlled set of tools, and allowing users to write their own "scripts", i.e. precanned dialogue / response steps to permit ex. building of search.

Which is probably what makes me so cranky here. It's very hard keeping track of all of it and doing my best to lever up the models that are behind Claude's agentic capabilities, and all the Newspeak of Google PR makes it consume almost as much energy as the rest of the providers combined. (I'm v frustrated that I didn't realize till yesterday that 2.0 Flash had quietly gone from 10 RPM to 'you can actually use it')

I'm a Xoogler and I get why this happens ("preview" is a magic wand that means "you don't have to get everyone in bureaucracy across DeepMind/Cloud/? to agree to get this done and fill out their damn launchcal"), but, man.

xnx · 17h ago
A matrix of models, capabilities, and prices would be really useful.
GaggiX · 18h ago
Not available in the EU - the first version was, and then it was removed.

Btw, still not as good as ChatGPT but much, much faster; it's nice progress compared to the previous model.

adverbly · 17h ago
Google totally crushing it and stock is down 8% today :|

Is it just me or is the market just absolutely terrible at understanding the implications and speed of progress behind what's happening right now in the walls of big G?

abirch · 17h ago
A potential reason that GOOG is down right now is that Apple is looking at AI Search Engines.

https://www.bloomberg.com/news/articles/2025-05-07/apple-wor...

Although AI is fun and great, an AI search engine may have trouble being profitable. It's similar to how 23andMe got many customers by selling a 500 dollar test to people for 100 dollars.

xnx · 17h ago
Would be quite a financial swing for Apple from getting paid billions of dollars by Google for search to having to spend billions of dollars to make their own.
abirch · 17h ago
From the article: Eddy Cue is Apple’s senior vice president of services. "Cue said he believes that AI search providers, including OpenAI, Perplexity AI Inc. and Anthropic PBC, will eventually replace standard search engines like Alphabet’s Google. He said he believes Apple will bring those options to Safari in the future."

So Apple may not be making their own, but they won't be spending billions either. I'm wondering how these providers will be able to monetize the searches so that they make money.

mattlondon · 16h ago
FWIW I searched this story not long after it broke and Google - yes, the traditional "old school search engine" - had an AI-generated summary of the story with a breakdown of the whys and hows right there at the top of the page. This was basically real time, give or take 10 minutes.

I am not sure why people think OpenAI et al are going to eat Google's lunch here. Seems like they're already doing AI-for-search and if there is anyone who can do it cheaply and at scale I bet on Google being the ones to do it (with all their data centers, data integrations/crawlers, and custom hardware and experience etc). I doubt some startup using the Bing-index and renting off-the-shelf Nvidia hardware using investor-funds is going to leapfrog Google-scale infrastructure and expertise.

resource_waste · 16h ago
Why would any of this have an impact on stock prices?

LLMs are insanely competitive and a dime a dozen now. Most professional uses can get away with local models.

This is image generation... Niche cases in another saturated market.

How are any of these supposed to make google billions of dollars?

lenerdenator · 17h ago
The market is absolutely terrible at a lot of things.
mvdtnz · 13h ago
I gave this a crack this morning, trying something very similar to the examples. I tried to get Gemini 2.0 Preview to add a set of bi-fold doors to a picture of a house in a particular place. It failed completely. It put them in the wrong place, they looked absolutely hideous (like I had pasted them in with MS Paint) and the more I tried to correct it with prompts the worse it got. At one point when I re-prompted it, it said

> Okay, I understand. You want me to replace ONLY the four windows located underneath the arched openings on the right side of the house with bifold doors, leaving all other features of the house unchanged. Here is the edited image:

Followed by no image. This is a behaviour I have seen many times from Gemini in the past so it's frustrating that it's still a problem.

I give this a 0/10 for my first use case.

jansan · 18h ago
Some examples are quite impressive, but the one with the ice bear on the white mug is very underwhelming and the co-drawing looks like it was hacked together by a vibe coder.
cyral · 7h ago
It looks like those horribly edited gift mugs I see on Amazon occasionally, where someone just puts the image over the mug without accounting for the 3D shape - too many variants to actually photograph each one. It would have been an excellent example of how much better AI is if they had made it do that.
thornewolf · 18h ago
The co-drawing is definitely not a fully fleshed-out product or anything but I think it is a great tech demo. What don't you like about it?