Gemini 2.5 Flash Image
474 points by meetpateltech 8/26/2025, 2:01:46 PM | 248 comments | developers.googleblog.com
Just search nano banana on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111
Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.
There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.
No it's not.
We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than the (endearingly?) nicknamed "piss filter".
Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.
Flux Kontext and Qwen are also possible to fine tune and run locally.
Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.
We've left behind the days of Dall-E, Stable Diffusion, and Midjourney, when "prompt-only" text-to-image generation was all we had.
It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.
Gpt4 isn't "fundamentally different" from gpt3.5. It's just better. That's the exact point the parent commenter was trying to make.
My test is going to https://unsplash.com/s/photos/random, picking two random images, and sending them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). FluxKontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.
Edit: Honestly it might not be the 'gpt4 moment'. It's better at combining multiple images, but I don't think it's better at understanding elaborate text prompts than ChatGPT.
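If anyone wants to script the same two-image test, here's a minimal sketch using the google-genai Python SDK. The model id is the one from the OpenRouter listing mentioned elsewhere in the thread, and the file names are just placeholders:

    # Sketch: send two images plus an edit instruction to Gemini 2.5 Flash Image.
    # Assumes the google-genai SDK (pip install google-genai) and a GEMINI_API_KEY env var.
    from google import genai
    from PIL import Image

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    base = Image.open("first.jpg")      # the scene to edit (placeholder path)
    subject = Image.open("second.jpg")  # the subject to transplant (placeholder path)

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[base, subject,
                  "Integrate the subject from the second image into the first image."],
    )

    # The reply interleaves text and image parts; save any returned images.
    for i, part in enumerate(response.candidates[0].content.parts):
        if part.inline_data is not None:
            with open(f"edited_{i}.png", "wb") as f:
                f.write(part.inline_data.data)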
https://genai-showdown.specr.net
This model gets 8 of the 12 prompts correct and easily comes within striking distance of the best-in-class models Imagen and gpt-image-1 and is a significant upgrade over the old Gemini Flash 2.0 model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and 9-pointed star.
What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year which is basically a lifetime in this field. Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.
Comparison of gpt-image-1, flash, and imagen.
https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...
Came into this thread looking for this post. It's a great way to compare prompt adherence across models. Have you considered adding editing capabilities in a similar way given the recent trend of inpainting-style prompting?
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
So when generating a video of someone playing a keyboard the model would incorporate the idea of repeating groups of 8 tones, which is a fixed ideational aspect which might not be strongly represented in words adjacent to "piano".
It seems like models need help with knowing what should be static, or homomorphic, across or within images associated with the same word vectors and that words alone don't provide a strong enough basis [*1] for this.
*1 - it's so hard to find non-conflicting words, obviously I don't mean basis as in basis vectors, though there is some weak analogy.
for instance:
https://aistudio.google.com/app/prompts/1gTG-D92MyzSKaKUeBu2...
I wonder if the bot is forced to generate something new— certainly for a prompt like that it would be acceptable to just pick the first result off a google image search and be like "there, there's your picture of a piano keyboard".
https://imgur.com/a/fyX42my
https://imgur.com/a/H9gH3Zy
I think we will eventually have AI-based tools that just do what a skilled human user would do in Photoshop, via tool use. That would make sense to me. But having AI generate a new image with imagined details just seems like a waste of time.
In my eyes, one specific example they show (“Prompt: Restore photo”) deeply AI-ifies the woman’s face. Sure it’ll improve over time of course.
This is the first image I tried:
https://i.imgur.com/MXgthty.jpeg (before)
https://i.imgur.com/Y5lGcnx.png (after)
Sure, I could manually correct that quite easily and would do a better job, but that image is not important to us, it would just be nicer to have it than not.
I'll probably wait for the next version of this model before committing to doing it, but it's exciting that we're almost there.
I've been waiting for that, too. But I'm also not interested in feeding my entire extended family's visual history into Google for it to monetize. It's wrong for me to violate their privacy that way, and it's also creepy to me.
Am I correct to worry that any pictures I send into this system will be used for "training?" Is my concern overblown, or should I keep waiting for AI on local hardware to get better?
It does look like I'm using the new model, though. I'm getting image editing results that are well beyond what the old stuff was capable of.
This is in stark contrast to ChatGPT, where an edit prompt typically yields both requested and unrequested changes to the image; here it seems to be neither.
It was on the endless list of shiny new 'skills' that feel good to have. Now I can use nano-banana instead. Other models will soon follow, I'm sure.
If anything, knowing Photoshop (I use Affinity Designer/Photo these days) is actually incredibly useful to finesse the output produced by AI. No regrets.
Engineering probably takes a while (5 years? 10 years?) because errors multiply and technical debt stacks up.
In images, that's not so much of a big deal. You can re-roll. The context and consequences are small. In programs, bad code leads to an unmaintainable mess and you're stuck with it.
But eventually this will catch up with us too.
If you think that these tools don't automate most existing graphics design work, you're gravely mistaken.
The question is whether this increases the amount of work to be done because more people suddenly need these skills. I'm of the opinion that this does in fact increase demand. Suddenly your mom and pop plumbing business will want Hollywood level VFX for their ads, and that's just the start.
I have to say while I'm deeply impressed by these text to image models, there's a part of me that's also wary of their impact. Just look at the comments beneath the average Facebook post.
It survives a lot of transformations like compression, cropping, and resizing. It even survives alterations like color filtering and overpainting.
Now is that so bad?
This happened as I was genuinely searching for the actual live stream of SpaceX.
I am ashamed, even more so because I even posted the live stream link on Hacker News (!). Fortunately it was flagged early and I apologized personally to dang.
This was a terrible experience for me, on many levels. I never thought I would fall in such a trap, being very aware of the tech, reading about similar stories etc.
I remember being at a machining workshop where the instructor was telling us such obvious things. Obvious things are obvious until they aren't, and then somebody gets hurt.
The point of my message was to "tell hn: it could happen to people in this community".
You sent your wallet to the real Elon and he used it as he saw fit. ;)
What about the bio is satirical? I'm pretty sure that's sincere too.
Because if this is real, then the world is cooked
If not, then I think it might be real anyway; the only reason I believe it's a joke is that you're on Hacker News, so either you are joking or the tech has gotten so convincing that even people on Hacker News (whom I hold to a fair standard) are getting scammed.
I have a lot of questions if it's true, and I'm sorry for your loss if this isn't satire, but I'd love it if you could tell me whether it's a satirical joke or not.
[0]: https://www.ncsc.admin.ch/ncsc/en/home/aktuell/im-fokus/2023...
It doesn't make that much sense idk
Granted, I played RuneScape and EVE as a kid, so any double-isk scams are immediate red flags.
For some reason, my mind confused runescape with neopets from the odd1sout video which I think is a good watch.
Scams That Should be Illegal : https://www.youtube.com/watch?v=XyoBNHqah30
Edit: But of course Elon would call someone he knows rather than a stranger, rich people know a lot of people so of course they would never contact you about this.
https://en.wikipedia.org/wiki/Advance-fee_scam
Parent's story is very believable; even if parent made this particular story up (which I personally don't think is the case), this has probably happened to somebody.
If they aren't joking, I apologize.
Also he's a troll so...
It's been a while, but I remember seeing streams of Elon offering to "double your bitcoin", and the reasoning was that he wanted to increase adoption and load test the network. Just send some bitcoin to some address and he will send it back doubled!
But the thing was, it was on YouTube, hosted on an imposter Tesla page. The stream had been going on for hours and had over ten thousand people watching live. If you searched "Elon Musk Bitcoin" on Google during the stream, Google actually pushed that video as the first result.
Say what you want about the victims of the scam, but I think it should be pretty easy for YouTube or other streaming companies to have a simple rule that filters all live streams with Elon Musk + (Crypto|BTC|etc) in the title, and to filter all YouTube pages with "Tesla" or "SpaceX" in the name.
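For illustration, a rule like that is only a few lines; the keyword list below is made up and obviously not whatever YouTube actually runs:

    # Sketch: a naive title filter of the kind described above. Illustrative only.
    import re

    SUSPECT = re.compile(
        r"(elon\s*musk|tesla|spacex).*(crypto|bitcoin|btc|eth|giveaway|double)",
        re.IGNORECASE | re.DOTALL,
    )

    def looks_like_crypto_scam(title: str) -> bool:
        return bool(SUSPECT.search(title))

    print(looks_like_crypto_scam("Elon Musk LIVE: Bitcoin giveaway, double your BTC"))  # True
    print(looks_like_crypto_scam("SpaceX Starship launch livestream"))                  # False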
It's not like they're poor or struggling.
Am I missing something?
am i getting scammed by a billionare or an AI billionaire?
It seems like money naturally flows from the gullible to the Machiavellian.
For people like me that don’t know what nano-banana is.
I thought Medium was a stuck-up blogging platform. Other than for paid subscriptions, why would they pay bloggers? Are they trying to become the next HuffPost or something?
"Banana" would be a nice name for their AI, and they could freely claim it's bananas.
Most of my photos these days are 48MP and I don't want to lose a ton of resolution just to edit them.
Here's a comparison of Flux Dev, MJ, Imagen, and Flash 2.5.
https://genai-showdown.specr.net/?models=FLUX_1D%2CMIDJOURNE...
That being said, if image fidelity is absolutely paramount and/or your prompts are relatively simple - Midjourney can still be fun to experiment with particularly if you crank up the weirdness / chaos parameters.
// In this one, Gemini doesn't understand what "cinematic" is
"A cinematic underwater shot of a turtle gracefully swimming in crystal-clear water [...]"
// In this one, the reflection in the water in the background has different buildings
"A modern city where raindrops fall upward into the clouds instead of down, pedestrians calmly walking [...]"
Midjourney created both perfectly.
Editing models don't excel at aesthetics, but they can take your Midjourney image, adjust the composition, and make it perfect.
These types of models are the Adobe killer.
Midjourney wins on aesthetic for sure. Nothing else comes close. Midjourney images are just beautiful to behold.
David's ambition is to beat Google to building a world model you can play games in. He views the image and video business as a temporary intermediate to that end game.
Without going into detail, basically the task boils down to, "generate exactly image 1, but replace object A with the object depicted in image 2."
Image 2 is some front-facing generic version of the object. Ideally I want the model to place this object perfectly in the scene, replacing the existing object, which I would ideally identify exactly by specifying its position, but otherwise just by describing very well what to do.
For models that can't accept multiple images, I've tried a variation where I put a blue box around the object that I want to replace, and paste the object that I want it to put there at the bottom of the image on its own.
I've tried some older models, ChatGPT, qwen-image last week, and just now, this one. They all fail at it. To be fair, this model got pretty damn close: it replaced the wrong object in the scene, but it was close to the right position, and the object was perfectly oriented and lit. But it was wrong. (Using the bounding box method, it should have been able to identify exactly what I wanted to do. Instead it removed the bounding box and replaced a different object in a different but close-by position.)
Are there any models that have been specifically trained to be able to infill or replace specific locations in an image with reference to an example image? Or is this just like a really esoteric task?
So far all the in-filling models I've found are only based on text inputs.
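For reference, the single-image workaround I described (blue box around the target, reference object pasted below the scene) is easy to set up with Pillow; the box coordinates and file names below are placeholders:

    # Sketch: build the single-image variant of the replacement task for models
    # that only accept one input image. Coordinates and paths are placeholders.
    from PIL import Image, ImageDraw

    scene = Image.open("scene.jpg").convert("RGB")
    reference = Image.open("reference_object.jpg").convert("RGB")

    # Draw a blue box around the object that should be replaced.
    draw = ImageDraw.Draw(scene)
    draw.rectangle((420, 310, 640, 560), outline=(0, 0, 255), width=6)

    # Paste a thumbnail of the reference object onto a strip below the scene.
    reference.thumbnail((256, 256))
    canvas = Image.new("RGB", (scene.width, scene.height + reference.height + 20), "white")
    canvas.paste(scene, (0, 0))
    canvas.paste(reference, (10, scene.height + 10))
    canvas.save("composite_prompt_image.png")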
“Sorry, there seems to be an error. Please try again soon.”
Never thought I would ever see this on a Google-owned website!
Really? Google used to be famous not only for its errors, but for its creative error pages. I used to have a google.com bookmark that would send an animated 418.
https://developers.googleblog.com/en/introducing-gemini-2-5-...
It seems like this was 'nano-banana' all along.
https://openrouter.ai/google/gemini-2.5-flash-image-preview
[1] https://developers.googleblog.com/en/introducing-gemini-2-5-...
Edit: OK, OK, I actually got it to work, and yes, I admit the results are incredible[2]. I honestly have no idea what happened with Pro 2.5 the first time.
[1]: https://g.co/gemini/share/5767894ee3bc [2]: https://g.co/gemini/share/a48c00eb6089
""" Unfortunately, I can't generate images of people. My purpose is to be helpful and harmless, and creating realistic images of humans can be misused in ways that are harmful. This is a safety policy that helps prevent the generation of deepfakes, non-consensual imagery, and other problematic content.
If you'd like to try a different image prompt, I can help you create images of a wide range of other subjects, such as animals, landscapes, objects, or abstract concepts. """
"Unfortunately I'm not able to generate images that might cause bad PR for Alphabet(tm) or subsidiaries. Is there anything else I can generate for you?"
https://www.reddit.com/r/LocalLLaMA/comments/1mx1pkt/qwen3_m...
It’s possible that they relaxed the safety filtering to allow humans but forgot to update the error message.
https://en.m.wikipedia.org/wiki/Sturmabteilung
https://postimg.cc/xX9K3kLP
...
It didn't succeed in doing the same recursively, but it's still clearly a huge advance in image models.
1. Reduce article to a synopsis using an LLM
2. Generate 4-5 varying description prompts from the synopsis
3. Feed the prompts to an imagegen model
Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.
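Roughly, as a sketch (model ids and prompt wording are placeholders, assuming the google-genai Python SDK):

    # Sketch: article -> synopsis -> several image prompts -> image generations.
    from google import genai

    client = genai.Client()

    def illustrate(article_text: str, variants: int = 4):
        # 1. Reduce the article to a synopsis.
        synopsis = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=f"Summarize this article in three sentences:\n\n{article_text}",
        ).text

        # 2. Generate a few varying description prompts from the synopsis.
        prompts = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=(f"Write {variants} distinct, visual image-generation prompts "
                      f"for an illustration of this synopsis:\n\n{synopsis}"),
        ).text.splitlines()

        # 3. Feed each prompt to the image model.
        images = []
        for p in filter(None, prompts):
            images.append(client.models.generate_content(
                model="gemini-2.5-flash-image-preview",
                contents=p,
            ))
        return synopsis, images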
The response was a summary of the article that was pretty good, along with an image that, dagnabbit, read the assignment.
Also women are seen as more cooperative and submissive, hence so many home assistants and AI being women's voices/femme coded.
Hope they get API issues resolved soon.
Flash Image is an image (and text) predicting large language model. In a similar fashion to how trained LLMs can manipulate/morph text, this can do that for images as well. Things like style transfer, character consistency etc.
You can communicate with it in a way you can't for imagen, and it has a better overall world understanding.
Gemini Flash Image: ChatGPT image, but by Google
Edit: Never mind, it's not in Gemini for everyone yet; it's in AI Studio, though.
Definitely inferior to the results I see on AI Studio, and image generation time is 6 seconds on AI Studio vs. 30 seconds on Fal.AI.
Quality or latency?
https://digital-strategy.ec.europa.eu/en/news/eu-rules-gener...
This is why I'm sticking mostly to Adobe Photoshop's AI editing because there are no restrictions in that regard.
made me realize that AI image modification is now technically flawless, utterly devoid of taste, and that I myself am a rather unattractive fellow.
https://9to5google.com/2025/08/25/android-apps-developer-ver...
To be honest I am kind of glad. As AI generated images proliferate, I am hoping it will be easier for humans to call them out as AI.
Edit: the blog post is now loading and reports "1290 output tokens per image", even though AI Studio said something different.