
I’m interested to see what this model can do, but also kinda annoyed at the use of a Studio Ghibli style image as one of the first examples. Miyazaki has said over and over that he hates AI image generation. Is it really so much to ask that people not deliberately train LoRAs and finetunes specifically on his work and use them in official documentation?

It reminds me of how CivitAI is full of “sexy Emma Watson” LoRAs, presumably because she very notably has said she doesn’t want to be portrayed in ways that objectify her body. There’s a really rotten vein of “anti-consent” pulsing through this community, where people deliberately seek out people who have asked to be left out of this and go “Oh yeah? Well there’s nothing you can do to stop us, here’s several terabytes of exactly what you didn’t want to happen”.

rushingcreek · 2h ago

Not sure why this isn’t a bigger deal: it seems like this is the first open-source model to beat gpt-image-1 in all respects while also beating Flux Kontext in terms of editing ability. This seems huge.

jug · 36m ago

I think it does way more than gpt-image-1 too?

Besides style transfer, object additions and removals, text editing, manipulation of human poses, it also supports object detection, semantic segmentation, depth/edge estimation, super-resolution and novel view synthesis (NVS) i.e. synthesizing new perspectives from a base image. It’s quite a smorgasbord!

Early results indicate to me that gpt-image-1 has a bit better sharpness and clarity but I’m honestly not sure if OpenAI doesn’t simply do some basic unsharp mask or something as a post-processing step? I’ve always felt suspicious about that, because the sharpness seems oddly uniform even in out-of-focus areas? And sometimes a bit much, even.

Otherwise, yeah this one looks about as good.

Which is impressive! I thought OpenAI had a lead here from their unique image generation solution that’d last them this year at least.

Oh, and Flux Krea lasted four days after its announcement! That is, if this one is truly similar in quality to gpt-image-1.

jacooper · 17m ago

Not to mention, flux models are for non-commercial use only.

doctorpangloss · 4m ago

the license for flux models is $1,000/mo, hardly an obstacle to any serious commercial usage

hleszek · 1h ago

It's not clear from their page but the editing model is not released yet: https://github.com/QwenLM/Qwen-Image/issues/3#issuecomment-3...

minimaxir · 16m ago

With the notable exception of gpt-image-1, discussion about AI image generation has become much less popular. I suspect it's a function of a) AI discourse being dominated by AI agents/vibe coding and b) the increasing social stigma of AI image generation.

Flux Kontext was a gamechanger release for image editing and it can do some absurd things, but it's still relatively unknown. Qwen-Image, with its more permissive license, could lead to much more innovation once the editing model is released.

tetraodonpuffer · 1h ago

I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

As an aside, I am not sure why for LLM models the technology to spread among multiple cards is quite mature, while for image models, despite also using GGUFs, this has not been the case. Maybe as image models become bigger there will be more of a push to implement it.

cma · 57m ago

If it's 40GB, you can lightly quantize it and fit it on a 5090.

TacticalCoder · 1h ago

> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

40 GB of VRAM? So two GPUs with 24 GB each? That's pretty reasonable compared to the kind of machine needed to run the latest Qwen coder models (which, by the way, are close to SOTA: they also beat proprietary models on several benchmarks).

zamadatix · 1h ago

It's only been a few hours and the demo is constantly erroring out, people need more time to actually play with it before getting excited. Some quantized GGUFs + various comfy workflows will also likely be a big factor for this one since people will want to run it locally but it's pretty large compared to other models. Funnily enough, the main comparison to draw might be between Alibaba and Alibaba. I.e. using Wan 2.2 for image generation has been an extremely popular choice, so most will want to know how big a leap Qwen-Image is from that rather than Flux.

The best time to judge how good a new image model actually is seems to be about a week from launch. That's when enough pieces have fallen into place that people have had a chance to really mess with it and come out with 3rd party pros/cons of the models. Looking hopeful for this one though!

rushingcreek · 1h ago

I spun up an H100 on Voltage Park to give it a try in an isolated environment. It's really, really good. The only area where it seems less strong than gpt-image-1 is in generating images of UI (e.g. make me a landing page for Product Hunt in the style of Studio Ghibli), but other than that, I am impressed.

rwmj · 2h ago

This may be obvious to people who do this regularly, but what kind of machine is required to run this? I downloaded & tried it on my Linux machine that has a 16GB GPU and 64GB of RAM. This machine can run SD easily. But Qwen-image ran out of space both when I tried it on the GPU and on the CPU, so that's obviously not enough. But am I off by a factor of two? An order of magnitude? Do I need some crazy hardware?

icelancer · 23m ago

> This may be obvious to people who do this regularly

This is not that obvious. Calculating VRAM usage for VLMs/LLMs is something of an arcane art. There are about 10 calculators online you can use and none of them work. Quantization, KV caching, activation, layers, etc all play a role. It's annoying.

But anyway, for this model, you need 40+ GB of VRAM. System RAM isn't going to cut it unless it's unified RAM on Apple Silicon, and even then, memory bandwidth is shot, so inference is much much slower than GPU/TPU.

liuliu · 5m ago

16GiB RAM with 8-bit quantization.

This is a slightly scaled up SD3 Large model.

mortsnort · 1h ago

I believe it's roughly the same size as the model files. If you look in the transformers folder you can see there are around nine 5 GB files, so I would expect you need ~45 GB of VRAM on your GPU. Usually quantized versions of models are eventually released/created that can run on much less VRAM but with some quality loss.

foobarqux · 1h ago

Why doesn't huggingface list the aggregate model size?

simonw · 4m ago

I've been bugging them about this for a while. There are repos that contain multiple model weights in a single repo which means adding up the file sizes won't work universally, but I'd still find it useful to have a "repo size" indicator somewhere.

I ended up building my own tool for that: https://tools.simonwillison.net/huggingface-storage

matcha-video · 49m ago

Huggingface is just a git hosting service, like github. You can add up the sizes of all the files in the directory yourself
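
A minimal sketch of that "add it up yourself" step, assuming the repository has already been fully downloaded to a local directory (the function name is made up for illustration; a clone that only contains Git LFS pointer files would report far smaller numbers than the real weights):

```python
from pathlib import Path


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


# Example usage (the path is a placeholder, not a real location):
#   size_gb = directory_size_bytes("/path/to/downloaded/model") / 1e9
#   print(f"{size_gb:.1f} GB")
```

As noted elsewhere in the thread, a repo that bundles several sets of weights will overstate what a single model needs, so treat the total as an upper bound.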

zippothrowaway · 1h ago

You're probably going to have to wait a couple of days for 4 bit quantized versions to pop up. It's 20B parameters.
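
As a rough back-of-the-envelope check on those numbers, here is a sketch that converts a parameter count into memory for the weights alone. It assumes the 20B figure quoted above and ignores activations, the text encoder, the VAE, and any quantization overhead, so real usage will be higher:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 20e9  # parameter count quoted in this thread

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(PARAMS, bits):.0f} GB for weights")
```

At 16 bits per parameter this gives about 40 GB, which lines up with the VRAM figure mentioned in other comments; 4-bit weights would come to roughly a quarter of that.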

TacticalCoder · 1h ago

> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

For PCs, I take it that means one with two PCIe 4.0 x16 (or more recent) slots? As in: quite a few consumer motherboards. You then install two GPUs with 24 GB of VRAM each.

A friend runs this (don't know if he has tried Qwen-Image yet): it's not an "out of this world" machine.

oceanplexian · 44m ago

Does anyone know how they actually trained text rendering into these models?

To me they all seem to suffer from the same artifacts, that the text looks sort of unnatural and doesn't have the correct shadows/reflections as the rest of the image. This applies to all the models I have tried, from OpenAI to Flux. Presumably they are all using the same trick?

yorwba · 34m ago

It's on page 14 of the technical report. They generate synthetic data by putting text on top of an image, apparently without taking the original lighting into account. So that's the look the model reproduces. Garbage in, garbage out.

Maybe in the future someone will come up with a method for putting realistic text into images so that they can generate data to train a model for putting realistic text into images.

doctorpangloss · 2m ago

i'm not sure if that's such garbage as you suggest, surely it is helpful for generalization yes? kind of the point of self-supervised models

nickandbro · 3h ago

The fact that it doesn’t change the images like 4o image gen is incredible. Often when I try to tweak someone’s clothing using 4o, it also tweaks their face. This seems to apply those recognizable AI artifacts only to the elements that need to be edited.

vunderba · 1h ago

That's why Flux Kontext was such a huge deal - it gave you the power of img2img inpainting without needing to manually mask the content.

https://mordenstar.com/blog/edits-with-kontext

herval · 1h ago

You can select the area you want edited on 4o, and it’ll keep the rest unchanged

barefootford · 45m ago

gpt doesn't respect masks

icelancer · 25m ago

Correct. Have tried this without much success despite OpenAI's claims.

Destiner · 8m ago

The text rendering is impressive, but I don't understand the value — wouldn't it be easier to add any text that you like in Figma?

doctorpangloss · 4m ago

the value is: the absence of text where you expect it, and the presence of garbled text, are dead giveaways of AI generation

artninja1988 · 3h ago

Insane how many good Chinese open source models they've been releasing. This really gives me hope

djoldman · 4h ago

Check out section 3.2, Data Filtering:

https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Q...

numpad0 · 28m ago

It's also kind of interesting that no other languages than English and Chinese are named or shown...

sampton · 17m ago

Short canva.

yjftsjthsd-h · 3h ago

Wow, the text/writing is amazing! Also the editing in general, but the text really stands out

anon191928 · 3h ago

It will take years for people to use these but Adobe is not alone.

herval · 1h ago

Adobe has never been alone. Photoshop’s AI stuff is consistently behind OSS models and workflows. It’s just way more convenient

dvt · 1h ago

I think Adobe is also very careful with copyrighted content not being a part of their models, which inherently makes them of lower quality.

doctorpangloss · 26s ago

as long as you don't consider the part of the model which understands text as part of the model, and as long as you don't consider copyrighted text content copyrighted :)

herval · 17m ago

They have a much better and cleaner dataset than Stable Diffusion & others, so I’d expect it to be better with some kinds of images (photos in particular)

esafak · 17m ago

Team Qwen: Please stop ripping off Studio Ghibli to demo your product.

Qwen-Image: Crafting with native text rendering

Comments (42)