I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.
FuckButtons · 1h ago
I have aphantasia, I’m glad we’re all on a level playing field now.
yoz-y · 59m ago
I always thought I had a vivid imagination. But then aphantasia was mentioned on Hello Internet once; I looked it up, saw comments like these, and honestly…
I’ve no idea how to even check. According to various tests I believe I have aphantasia. But mostly I haven’t got even the slightest idea how not having it is supposed to work. I guess this is one of those mysteries where a missing sense can’t be described in any way.
foofoo12 · 20m ago
Ask people to visualize a thing. Pick something like a house, dog, tree, etc. Then ask about details. Where is the dog?
I have aphantasia and my dog isn't anywhere. It's just a dog, you didn't ask me to visualize anything else.
When you ask about details, like color, tail length, or eyes, then I have to make them up on the spot. I can do that very quickly, but I don't "see" the good boy.
jmcphers · 51m ago
A simple test for aphantasia that I gave my kids when they asked about it is to picture an apple with three blue dots on it. Once you have it, describe where the dots are on the apple.
Without aphantasia, it should be easy to "see" where the dots are since your mind has placed them on the apple somewhere already. Maybe they're in a line, or arranged in a triangle, across the middle or at the top.
Sohcahtoa82 · 39m ago
After reading your first sentence, I immediately saw an apple with three dots in a triangle pointing downwards on the side. Interestingly, the 3 dots in my image were flat, as if merely superimposed on an image of an apple, rather than actually being on an apple.
How do people with aphantasia answer the question?
foofoo12 · 27m ago
I found out recently that I have aphantasia, based on everything I've read. When you tell me to visualize, I imagine; I don't see it. An apple, I can imagine that. I can describe it, but only in the sparsest detail. When you ask for details, I have to fill them in.
I hadn't really placed those three dots in a specific place on the apple. But when you ask where they are, I'll decide to put them in a line on the apple. If you ask what color they are, I'll have to decide.
jvanderbot · 27m ago
They may not answer, but what they'll realize is that the "placing" comes consciously after the "thinking of", which is not how it happens for others.
That is, they have to ascribe a placement rather than describe one in the image their mind conjured up.
wrs · 26m ago
There's no apple, much less any dots. Of course, I'm happy to draw you an apple on a piece of paper, and draw some dots on that, then tell you where those are.
aaronblohowiak · 31m ago
Oh, just close your eyes and imagine an apple for a few moments, then open your eyes, look at the Wikipedia article about aphantasia, and pick the one that best fits the level of detail you imagined.
Revisional_Sin · 1h ago
Aphantasia gang!
m3kw9 · 1h ago
To be fair, the model's ability came from us generating the training data.
quantummagic · 43m ago
To be fair, we're the beneficiaries of nature generating the data we trained on ourselves. Our ability came from being exposed to training in school, in the world, and from examples across all of human history. I.e., if you locked a child in a dark room for their entire life and gave them no education or social interaction, they wouldn't have a very impressive imagination or artistic ability either.
We're reliant on training data too.
micromacrofoot · 55m ago
it can only do this because it's been trained on millions of human works
vunderba · 1h ago
Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a very high focus on adherence across a wide variety of text-to-image prompts.
I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence but testing the ability to make localized edits of existing images using pure text prompts. It's currently comparing 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.
https://genai-showdown.specr.net/image-editing
Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12 which is especially surprising considering you can run the Dev model of it locally.
No comments yet
namibj · 3m ago
After looking at Cases 4, 9, 23, 33, and 61, I think it might be well suited to taking in several wide-angle pictures or photospheres from inside a residence and outputting a corresponding floor plan schematic.
If anyone has examples, guides, or anything that would save me from pouring unnecessary funds into API credits just to figure out how to feed it for this kind of task, I'd really appreciate you sharing them.
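For what it's worth, here is a rough, untested sketch of how one might feed several interior photos to the model in a single request, using the google-genai Python SDK; the model name, file names, and prompt wording are illustrative assumptions, not something from this thread.

    # Sketch: several wide-angle interior shots in, one floor-plan image out.
    from io import BytesIO

    from google import genai
    from PIL import Image

    client = genai.Client()  # picks up the API key from the environment

    rooms = [Image.open(p) for p in ["kitchen.jpg", "living_room.jpg", "hallway.jpg"]]

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=rooms + [
            "These photos are all from one apartment. Draw a single top-down "
            "floor-plan schematic consistent with all of them, with rooms labeled."
        ],
    )

    # Generated images come back as inline bytes in the response parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("floor_plan.png")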
xnx · 1h ago
Amazing model. The only limit is your imagination, and it's only $0.04/image.
Yep, Google actually recommends using Imagen 4 / Imagen 4 Ultra for straight image generation. In spite of that, Flash 2.5 still scored shockingly high on my text-to-image comparisons, though image fidelity is obviously not as good as the dedicated text-to-image models.
It came within striking distance of OpenAI's gpt-image-1, at only one point less.
minimaxir · 1h ago
Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation
Good collection of examples. Really weird to choose an inappropriate-for-work one as the second example.
vunderba · 1h ago
They're referring to Case 1 Illustration to Figure, the anime figurine dressed in a maid outfit in the HN post.
pdpi · 1h ago
I assume OP means the actual post.
The second example under "Case 1: Illustration to Figure" is a panty shot.
darkamaul · 1h ago
This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.
Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.
Through that testing, there is one prompt engineering trend that was consistent but controversial: both a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school AI image-style quality syntactic sugar such as "award-winning" and "DSLR camera" are extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset, which can now more accurately discriminate which specific image traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.
However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It's possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)
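As a concrete, hypothetical illustration of those two tricks (a Markdown-style list plus the old quality keywords), here is roughly what such a prompt looks like when sent through the google-genai Python SDK; the prompt text and scene are made up for the example.

    from google import genai

    client = genai.Client()

    prompt = """Generate an image of a cozy reading nook.

    - Style: award-winning editorial photograph, DSLR, 85mm lens, soft window light
    - Subject: worn leather armchair, stack of hardcover books, steaming mug
    - Composition: rule of thirds, shallow depth of field
    """

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=prompt,
    )
    # The image arrives as inline bytes in response.candidates[0].content.parts,
    # the same as any other native image generation call.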
This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get - garbage - results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is to simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.
I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.
Am I using the wrong model, somehow??
SweetSoftPillow · 35m ago
Play around with your prompt, try asking Gemini 2.5 Pro to improve your prompt before sending it to Gemini 2.5 Flash, retry, and learn what works and what doesn't.
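A minimal sketch of that two-step loop with the google-genai Python SDK, assuming the public gemini-2.5-pro and gemini-2.5-flash-image-preview model IDs; the rough idea text is just an example.

    from google import genai

    client = genai.Client()

    rough_idea = "put the person from photo 1 into the scene from photo 2, matching its style"

    # Step 1: have a text model expand the rough idea into a detailed image prompt.
    rewrite = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Rewrite this as a detailed, unambiguous image-editing prompt: " + rough_idea,
    )

    # Step 2: send the improved prompt (plus any reference images) to the image model.
    result = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[rewrite.text],  # append PIL images of the character and scene here
    )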
epolanski · 37m ago
+1
I understand the results are non-deterministic, but I get absolute garbage too.
I uploaded pics of my (32-year-old) wife and asked it to give her a fringe/bangs to see how she would look. It either refused "because of safety", or when it complied the results were horrible; it was a different person.
After many days and tries we got it to make one, but there was no way to tweak the fringe; the model kept returning the same pic every time (with plenty of "content blocked" in between).
SweetSoftPillow · 29m ago
Are you in gemini.google.com interface? If so, try Google AI Studio instead, there you can disable safety filters.
istjohn · 45m ago
Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:
- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows
- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors
- When trying to reproduce the 3x3 grid of hair styles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was Black instead of Caucasian.
- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude photoshop snip and copy/paste job.
foofoo12 · 17m ago
> I feel like these examples are cherry-picked
I don't know of a demo, image, film, project or whatever where the showoff pieces are not cherry picked.
mustaphah · 19m ago
In a side-by-side comparison with GPT-4o [1], they are pretty much on par.
[1] https://github.com/JimmyLv/awesome-nano-banana
I have two friends who are excellent professional graphic artists and I hesitate to send them this.
SweetSoftPillow · 1h ago
Better they learn it today than tomorrow, even though it might be painful for those who don't like learning new tools and exploring new horizons.
mitthrowaway2 · 1h ago
Maybe they're better off switching careers? At some point, your customers aren't going to pay you very much to do something that they've become able to do themselves.
There used to be a job people would do, where they'd go around in the morning and wake people up so they could get to work on time. They were called "knocker-ups". When the alarm clock was invented, these people didn't lose their jobs to other knockers-up with alarm clocks; they lost their jobs to the alarm clocks themselves.
non_aligned · 1h ago
A lot of technological progress is about moving in the other direction: taking things you can do yourself and having others do it instead.
You can paint your own walls or fix your own plumbing, but people pay others instead. You can cook your food, but you order take-out. It's not hard to sew your own clothes, but...
So no, I don't think it's as simple as that. A lot of people will not want the mental burden of learning a new tool and will have no problem paying someone else to do it. The main thing is that the price structure will change. You won't be able to charge $1,000 for a project that takes you a couple of days. Instead, you will need to charge $20 for stuff you can crank out in 20 minutes with gen AI.
GMoromisato · 1h ago
I agree with this. And it's not just about saving time/effort--an artist with an AI tool will always create better images than an amateur, just as an artist with a camera will always produce a better picture than me.
That said, I'm pretty sure the market for professional photographers shrank after the digital camera revolution.
AstroBen · 1h ago
I don't know if "learning this tool" is gunna help..
frfl · 1h ago
While these are incredibly good, it's sad to think about the unfathomable amount of abuse, spam, disinformation, manipulation, and who knows what other negatives these advancements are gonna cause. It was one thing when you could spot an AI image, but now and moving forward it'll be increasingly futile to even try.
Almost all "human" interaction online will be subject to doubt soon enough.
Hard to be cheerful when technology will be a net negative overall even if it benefits some.
signatoremo · 1h ago
By your logic email is clearly a net negative, given how much junk it generates - spam, phishing, hate mail, etc. Most of my emails at this point are spam.
frfl · 54m ago
If we're talking objectively, yeah by definition if it's a net negative, it's a net negative. But we can both agree in absolute terms the negatives of email are manageable.
Hopefully you understand the sentiment of my original message without getting into the semantics. AI advancements, like email when it arrived, are gonna turbocharge the negatives. The difference is the magnitude of the problem. We're dealing with a whole different scale, one we have never seen before.
Re: "Most of my emails at this point are spam" - 99% of my emails are not spam. Yet AI spam is everywhere else I look online.
No comments yet
flysonic10 · 50m ago
I added some of these examples into my Nanna Banana image generator: https://nannabanana.ai
stoobs · 28m ago
I'm pretty sure these are cherry-picked out of many generation attempts. I tried a few basic things and it flat out refused to do many of them, like turning a cartoon illustration into a real-world photographic portrait; it kept wanting to create a Pixar-style image. Then when I used an AI-generated portrait as an example, it refused with an error saying it wouldn't modify real-world people...
I then tried to generate some multi-angle product shots from a single photo of an object, and it just refused to do the whole left, right, front, back thing, and kept doing things like a left, a front, another left, and a weird half-back/half-side view combination.
Very frustrating.
SweetSoftPillow · 26m ago
Are you in gemini.google.com interface? If so, try Google AI Studio instead, there you can disable safety filters.
stoobs · 12m ago
I'm in AI Studio, and weirdly I get no safety settings.
I had them before when I was trying this and yes, I had them turned off.
destel · 1h ago
Some examples are mind-blowing. I wonder if it can generate web/app designs.
AstroBen · 1h ago
I just tried it for an app I'm working on.. very bad results
eig · 1h ago
While I think most of the examples are incredible...
...the technical graphics (especially text) are generally wrong. Case 16 is an annotated heart and the anatomy is nonsensical. Case 28 with the tallest buildings has decent images, but has the wrong names, locations, and years.
vunderba · 1h ago
Yeah I think some of them are really more proof of concept than anything.
Case 8 Substitute for ControlNet
The two characters in the final image are VERY obviously not in the instructed set of poses.
SweetSoftPillow · 1h ago
Yes, it's a Gemini Flash model, meaning it's fast and relatively small and cheap, optimized for performance rather than quality. I would not expect mind-blowing capabilities in fine details from this class of models, but still, even in this regard this model is sometimes just surprisingly good.
AstroBen · 1h ago
The #1 most frustrating part of image models to me has always been their inability to keep the relevant details. Ask to change a hairstyle and you'd get a subtly different person.
Guess that's solved now... overnight. Mindblowing.
The ability to pretty accurately keep the same image from an input is a clear sign of its improved abilities.
moralestapia · 1h ago
Wow, just amazing.
Is this model open? Open weights at least? Can you use it commercially?
SweetSoftPillow · 1h ago
This is Google's Gemini Flash 2.5 model with native image output capability. It's fast, relatively cheap, SOTA quality, and available via API.
I think getting this kind of quality in open-source models will take some time, probably first from Chinese models and then from BlackForestLabs or Google's open-source (Gemma) team.
vunderba · 1h ago
Outside of Google Deepmind open sourcing the code and weights of AlphaFold, I don't think they've released any of their GenAI stuff (Imagen, Gemini, Flash 2.5, etc).
The best multimodal models that you can run locally right now are probably Qwen-Edit 20b and Kontext.Dev.
https://qwenlm.github.io/blog/qwen-image-edit
https://bfl.ai/blog/flux-1-kontext-dev
Flux Kontext has similar quality, is open weight, and the outputs can be used commercially; however, prompt adherence is good-but-not-as-good.
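If anyone wants to try the local route, a minimal sketch with Hugging Face diffusers might look like the following, assuming the FluxKontextPipeline class and the gated black-forest-labs/FLUX.1-Kontext-dev weights; the file names and edit prompt are placeholders.

    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    # Requires accepting the FLUX.1 Kontext dev license on Hugging Face first.
    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    source = load_image("input.png")
    edited = pipe(
        image=source,
        prompt="Change the jacket to bright red, keep everything else identical",
        guidance_scale=2.5,
    ).images[0]
    edited.save("edited.png")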
ChrisArchitect · 53m ago
sigh
So many little details are off even when the instructions are clear and/or the details are there. Brad Pitt jeans? The results are not the same style and are missing clear details that you'd expect to just translate over.
Another one where the prompt ended with "output in a 16:9 ratio". The image isn't in that ratio.
The results are visually something but then still need so much review. Can't trust the model. Can't trust people lazily using it. Someone mentioned something about 'net negative'.
istjohn · 36m ago
Yes, almost all of the examples are off in one way or another. The viewpoints don't actually match the arrow directions, for example. And if you actually use the model, you will see that even these examples must be cherry-picked.