Gemini Diffusion

31 points by og_kalu | 4 comments | 5/20/2025, 5:50:56 PM | deepmind.google ↗

Comments (4)

heliophobicdude · 2h ago
I've been let off the waitlist. So far, I'm impressed with the Instant Edits. It's crazy fast. I can provide a big HTML file and prompt it to change a color theme and it makes careful edits to just the relevant parts. It seems to be able to parallelize the same instruction to multiple parts of the input. This is incredible for refactoring.

I copied a Shadertoy example and asked it to rename all the variables to be more descriptive, and it edited just the variable names. I was able to compile and run it in Shadertoy.

minimaxir · 4h ago
> Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step. This means they can iterate on a solution very quickly and error correct during the generation process. This helps them excel at tasks like editing, including in the context of math and code.

This is deliberately unhelpful, as it raises the obvious question: "why hasn't anyone else made a good text diffusion model in the years since the technique became available?"

The answer is that, unlike latent diffusion for images, which can remain fuzzy and imprecise right up until the final image is produced, text has discrete outputs and therefore demands more precision. Google is evidently using some secret sauce to work around that limitation, and is keeping it annoyingly close to the chest.
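For context, one published workaround for the discrete-output problem is masked (absorbing-state) discrete diffusion: start from a fully masked sequence, predict every position in parallel, and unmask a few tokens per step. The sketch below is a toy illustration of that general idea only; the `toy_denoiser`, `TARGET`, and scheduling choices are invented stand-ins, and nothing here reflects Google's actual (undisclosed) method.

```python
import random

MASK = "<mask>"
TARGET = ["the", "quick", "brown", "fox"]  # stand-in for learned model knowledge


def toy_denoiser(seq):
    """Pretend model: 'predicts' the target token at each masked position."""
    return [TARGET[i] if tok == MASK else tok for i, tok in enumerate(seq)]


def diffusion_decode(length, steps=4, seed=0):
    """Iteratively refine an all-masked sequence into text."""
    rng = random.Random(seed)
    seq = [MASK] * length  # start from pure "noise" (all masks)
    for step in range(steps):
        preds = toy_denoiser(seq)  # predict every position in parallel
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Unmask a fraction of positions each step. A real model would keep
        # its highest-confidence predictions; here we pick randomly.
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = preds[i]
    return toy_denoiser(seq)  # final pass fills any leftover masks


print(diffusion_decode(4))  # ['the', 'quick', 'brown', 'fox']
```

Because each step predicts all positions at once (rather than one token at a time), this family of decoders can both parallelize and revisit earlier choices, which matches the parallel-editing behavior described upthread.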

smallerize · 4h ago
heliophobicdude · 3h ago
Thank you for sharing this. I'm amazed! Are there any known emergent abilities of it? I ran my evals, and it seems to struggle in very similar ways to smaller transformer-based LLMs.