Create and edit images with Gemini 2.0 in preview
211 points by meetpateltech 5/7/2025, 4:06:44 PM | 92 comments | developers.googleblog.com
https://genai-showdown.specr.net
I don't know how much of Google's original Imagen 3.0 is incorporated into this new model, but the overall aesthetic quality unfortunately seems significantly worse.
The big "wins" are:
- Multimodal aspect in trying to keep parity with OpenAI's offerings.
- An order of magnitude faster than OpenAI 4o image gen
I hope that we get an open weights multimodal image gen model. I'm slightly concerned that if these things take tens to hundreds of millions of dollars to train, that only Google and OpenAI will provide them.
That said, the one weakness in multimodal models is that they don't let you structure the outputs yet. Multimodal + ControlNets would fix that, and that would be like literally painting with the mind.
The future, when these models are deeply refined and perfected, is going to be wild.
If that happens, then I'm sure we'll see slimmer multimodal models over the course of the next year or so. And that teams like Black Forest Labs will make more focused and performant multimodal variants.
We need the incredible instructability of multimodality. That's without question. But we also need to be able to fine-tune, use ControlNets to guide diffusion, and compose these into workflows.
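For anyone who hasn't played with it, this is roughly what "ControlNets to guide diffusion" looks like with open models today, via Hugging Face diffusers and a Canny-edge ControlNet. A minimal sketch only; the model IDs and the local edge-map file are placeholders:

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
    from diffusers.utils import load_image

    # Load a ControlNet trained on Canny edges, then attach it to a Stable Diffusion pipeline.
    controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

    # The edge map pins down the composition; the prompt only controls style and content.
    edges = load_image("my_edge_map.png")  # placeholder: any Canny edge image
    image = pipe("a cozy reading nook, warm evening light", image=edges, num_inference_steps=30).images[0]
    image.save("out.png")

Compose a few of these (pose, depth, edges) and you get exactly the kind of structured control the multimodal models don't expose yet.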
If you're looking for other suggestions, a summary table showing which models are ahead would be great.
Wrt the summary table, did you have a different metric in mind? The top of the display should already be showing a "Model Performance" chart with OpenAI 4 and Google Imagen 3 leading the pack.
> The top of the display should already be showing a "Model Performance" chart
I guess I missed this earlier!
I've been thinking about possibly rerunning the Flux Dev prompts using the 1.1 Pro but I liked having a base reference for images that can be generated on consumer hardware.
Really now?
For example, asking the models to show clocks set to a specific time or people drawing with their left hand. I think most, if not all models, will likely display every clock with the same time...And portray subjects drawing with their right hand.
Thanks for the suggestions. Most of the current prompts are a result of personal images that I wanted to generate, so I'll try to add some "classic GenAI failure modes". Musical instruments such as pianos also used to be a pretty big failure point.
I should also add an image that is heavy with "greebles". GenAI usually lacks the fidelity for these kinds of minor details, so although it adds them, they tend to fall apart under more than a cursory examination.
https://en.wikipedia.org/wiki/Greeble
I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head. But I guess I have a lot of doubts about using a conversational interface for this kind of stuff
Sometimes sketching it could be helpful, but more abstract technical things like LUTs still feel out of reach.
This is more related to our ability to articulate than is easy to demonstrate, in my experience. I can certainly produce images in my head I have difficulty reproducing well and consistently via linguistic description.
As some have mentioned, LLMs are treasure troves of information for learning how to prompt the LLM. One thing to get over is a fear of embarrassment in what you say to the LLM. Just write a stream of consciousness to the LLM about what you want and ask it to generate a prompt based on that. "I have an image that I am trying to get an image LLM to add some clutter to. But when I ask it to do it, like I say add some stack of paper and notebooks, but it doesn't look like I want because they are neat stacks of paper. What I want is a desk that kind of looks like it has been worked at for a while by a typical office worker, like at the end of the day with a half empty coffee cup and .... ". Just ramble away and then ask the LLM to give you the best prompt. And if it doesn't work, literally go back to the same message chain and say "I tried that prompt and it was [better|worse] than before because ...".
This is one of those opportunities where life is giving you an option: give up or learn. Choose wisely.
At 4c per image that's more than a dollar on that single prompt.
I built this quick tool https://tools.simonwillison.net/gemini-image-json for pasting that JSON into, to see it rendered.
source: https://ai.google.dev/gemini-api/docs/models#gemini-2.0-flas... and my Google AI Studio
https://aistudio.google.com/apps/bundled/gemini-co-drawing?s...
The lamp is put on a different desk in a totally different room, with AI mush in the foreground. Props for not cherry-picking a first example, I guess. The sofa colour one is somehow much better, with a less specific instruction.
The main difference is that Gemini allows incorporating a conversation to generate the image, as demoed here, while Imagen 3 is strictly text-in/image-out with optional mask-constrained edits, though Imagen 3 likely allows for higher-quality images overall if you're skilled with prompt engineering. It's an annoying nuance to have to differentiate.
What makes you say that?
You can see the full table with images here: https://tabulator-ai.notion.site/1df2066c65b580e9ad76dbd12ae...
I think the results came out quite well. Be aware I don't generate a text prompt based on row data for image generation. Instead, the raw row data (ingredients, instructions...) and table metadata (column names and descriptions) are sent directly to gemini-2.0-flash-exp-image-generation.
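For anyone curious, the call is roughly this with the google-genai Python SDK. Just a sketch, not the actual Tabulator code; the row/metadata fields and key handling here are made up:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # Raw row data and table metadata go in as-is; no intermediate text prompt is built.
    row = {"ingredients": "2 eggs, 100 g flour, 200 ml milk", "instructions": "Whisk, rest 10 min, fry thin."}
    metadata = {"columns": {"ingredients": "comma-separated list", "instructions": "free-text steps"}}

    response = client.models.generate_content(
        model="gemini-2.0-flash-exp-image-generation",
        contents=f"Table metadata: {metadata}\nRow: {row}",
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )

    # The model returns a mix of text and image parts; save the first image it produces.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("row_image.png", "wb") as f:
                f.write(part.inline_data.data)
            break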
There are a lot of failure modes still, but what I want is a very large cookbook showing what the known-good workflows are. Since this is so directly downstream of (limited) training data, it might be that I am just prompting in an ever so slightly bad way.
Very well done!
Seems to help if you explicitly describe the scene, but then the drawing-along aspect seems relatively pointless.
It seems like the real goal here, for Google and other AI companies, is a world flooded with endless AI-generated variants of objects that don’t even exist yet, crafted to be sold and marketed (probably by AI too) to hyper-targeted audiences. This feels like an incoming wave of "AI slop", mass-produced synthetic content, crashing against the small island of genuine human craftsmanship and real, existing objects.
I get that they are trying to find some practical use cases for their tools. But there's no enlightenment in the product development here.
If this is already the part of the s-curve where these AI tools get diminishing returns...what a waste of everybody's time.
It should be illegal in my view.
Maybe this is why all of the future AI fiction has people dressed in the same bland clothing.
If, for example, you use ControlNets, you can get very close to the style composition you need with an open model like Flux, which will be far better. Flux also has a few successors coming up now.
I take an image with some desired colors or typography from an already existing music album or from Ideogram's poster section. I pass it to gemini and give the command:
"describe the texture of the picture, all the element and their position in the picture, left side, center right side, up and down, the color using rgb, the artistic style and the calligraphy or font of the letters"
Then I take the result and pass it through a different LLM, because I don't like Gemini that much; I find it much less coherent than other models. I usually use qwen-qwq-32b, so I take the description Gemini outputs and give it to Qwen:
" write a similar description, but this time i want a surreal painting with several imaginative colors. Follow the example of image description, add several new and beautiful shapes of all elements and give all details, every side which brushstrokes it uses, and rgb colors it uses, the color palette of the elements of the page, i want it to be a pastel painting like the example, and don't put bioluminesence. I want it to be old style retro style mystery sci fi. Also i want to have a title of "Song Title" and describe the artistic font it uses and it's position in the painting, it should be designed as a drum n bass album cover "*
Then i take the result and give it back to gemini with command: "Create an image with text "Song Title" for an album cover: here is the description of the rest of the album"
If the resulting image is good, then it is time to add the font: I take the new image description and pass it through Qwen again, supposing the image description has the fields Title and Typography:
"rewrite the description and add full description of the letters and font of text, clean or distressed, jagged or fluid letters or any other property they might have, where they are overlayed, and make some new patterns about the letter appearance and how big they are and the material they are made of, rewrite the Title and Typography."
I replace the previous description's section Title and Typography with the new description and create images with beautiful fonts.
[1] https://imgur.com/a/8TCUJ75
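If anyone wants to script this loop instead of copy-pasting between tabs, it's roughly the following. A sketch only: the Qwen endpoint is whatever OpenAI-compatible host you use for qwen-qwq-32b, and the prompts are abbreviated versions of the ones above:

    from PIL import Image
    from google import genai
    from google.genai import types
    from openai import OpenAI

    gem = genai.Client(api_key="GEMINI_KEY")
    qwen = OpenAI(base_url="https://your-qwen-host/v1", api_key="QWEN_KEY")  # placeholder host

    # 1) Gemini describes the reference cover: texture, element positions, RGB colors, typography.
    reference = Image.open("reference_cover.png")
    described = gem.models.generate_content(
        model="gemini-2.0-flash",
        contents=[reference, "describe the texture of the picture, all the elements and their position, ..."],
    ).text

    # 2) Qwen rewrites that description into the new surreal / retro sci-fi brief.
    brief = qwen.chat.completions.create(
        model="qwen-qwq-32b",
        messages=[{"role": "user", "content": "write a similar description, but this time ...\n\n" + described}],
    ).choices[0].message.content

    # 3) The rewritten brief goes back to Gemini's image model.
    cover = gem.models.generate_content(
        model="gemini-2.0-flash-exp-image-generation",
        contents='Create an image with text "Song Title" for an album cover: ' + brief,
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )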
Now I can use:
- Gemini 2.0 Flash Image Generation Preview (May) instead of Gemini 2.0 Flash Image Generation Preview (March)
- or when I need text, Gemini 2.5 Flash Thinking 04-17 Preview ("natively multimodal" w/o image generation)
- When I need to control thinking budgets, I can do that with Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price increase over a month prior
- And when I need realtime, fallback to Gemini 2.0 Flash 001 Live Preview (announced as In Preview on April 9 2025 after the Multimodal Live API was announced as released on December 11 2024)
- I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO Edition's thinking budgets, but good news follows in the next bullet: they'll swap the model out underneath me with one that thinks ~10x less, so at least it's in the same cost ballpark as their competitors
- and we all got autoupgraded from Gemini 2.5 Pro Preview (03/25 released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday! Yay!
Which is probably what makes me so cranky here. It's very hard keeping track of all of it and doing my best to lever up the models that are behind Claude's agentic capabilities, and all the Newspeak of Google PR makes it consume almost as much energy as the rest of the providers combined. (I'm v frustrated that I didn't realize till yesterday that 2.0 Flash had quietly gone from 10 RPM to 'you can actually use it')
I'm a Xoogler and I get why this happens ("preview" is a magic wand that means "you don't have to get everyone in bureaucracy across DeepMind/Cloud/? to agree to get this done and fill out their damn launchcal"), but, man.
Btw, still not as good as ChatGPT but much, much faster; it's nice progress compared to the previous model.
Is it just me or is the market just absolutely terrible at understanding the implications and speed of progress behind what's happening right now in the walls of big G?
https://www.bloomberg.com/news/articles/2025-05-07/apple-wor...
Although AI is fun and great, an AI search engine may have trouble being profitable. It's similar to how 23andMe got many customers by selling a $500 test to people for $100.
So Apple may not be making their own, but they won't be spending billions either. I'm wondering how people will be able to monetize the searches so that they make money.
I am not sure why people think OpenAI et al are going to eat Google's lunch here. Seems like they're already doing AI-for-search and if there is anyone who can do it cheaply and at scale I bet on Google being the ones to do it (with all their data centers, data integrations/crawlers, and custom hardware and experience etc). I doubt some startup using the Bing-index and renting off-the-shelf Nvidia hardware using investor-funds is going to leapfrog Google-scale infrastructure and expertise.
LLMs are insanely competitive and a dime a dozen now. Most professional uses can get away with local models.
This is image generation... Niche cases in another saturated market.
How are any of these supposed to make google billions of dollars?
> Okay, I understand. You want me to replace ONLY the four windows located underneath the arched openings on the right side of the house with bifold doors, leaving all other features of the house unchanged. Here is the edited image:
Followed by no image. This is a behaviour I have seen many times from Gemini in the past, so it's frustrating that it's still a problem.
I give this a 0/10 for my first use case.
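One way to catch that failure programmatically is to treat a response with no inline image parts as an error and retry. A sketch assuming the google-genai Python SDK; the prompt and file names are placeholders:

    from PIL import Image
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")
    house = Image.open("house.jpg")  # placeholder input photo

    def image_parts(resp):
        # Image output comes back as inline_data parts alongside any explanatory text.
        return [p for p in resp.candidates[0].content.parts if p.inline_data is not None]

    for attempt in range(3):
        resp = client.models.generate_content(
            model="gemini-2.0-flash-exp-image-generation",
            contents=[house, "Replace only the four windows under the arched openings on the right with bifold doors."],
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )
        if image_parts(resp):
            break  # got an edited image back, not just a description of one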