Qwen3 Coder 480B is Live on Cerebras

13 retreatguru 5 8/1/2025, 5:50:19 PM cerebras.ai ↗

Comments (5)

gnulinux · 10m ago
At $2/1Mt it's cheaper than e.g. Gemini 2.5 Pro ($1.25/1Mt for input and $10/1Mt for output). When I code with Aider my requests average something like 5000 input tokens and 800 output tokens. At those rates, Gemini 2.5 Pro works out to about $0.01425 per Aider request and Cerebras Qwen3 Coder to $0.0116 per request. Not a huge difference, but sufficiently cheaper to be competitive, especially given that Qwen3-Coder is on par with Gemini/Claude/o3 and even surpasses them in some tests.

NOTE: On OpenRouter, Qwen3-Coder requests are currently averaging $0.3/1M input tokens and $1.2/1M output tokens. That's so significantly cheaper that I wouldn't be surprised if open-weight models start eating Google/Anthropic/OpenAI's lunch. https://openrouter.ai/qwen/qwen3-coder

EDIT: we should also note that, according to preliminary results, Qwen3-Coder seems to score slightly lower than Gemini 2.5 Pro, though it's arguably pretty close: https://www.reddit.com/r/LocalLLaMA/comments/1ka66y0/qwen3_b...
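The per-request arithmetic above can be checked with a small script. This is just a worked example using the token counts and per-million-token prices quoted in the comment (it assumes, as the comment does, that Cerebras charges a flat $2/1M for both input and output):

```python
# Rough per-request cost comparison for a typical Aider request
# (~5,000 input tokens, ~800 output tokens).
# Prices are USD per 1M tokens (input, output), as quoted above.
PRICES = {
    "Gemini 2.5 Pro": (1.25, 10.0),
    "Cerebras Qwen3 Coder": (2.0, 2.0),        # flat $2/1Mt assumed
    "OpenRouter Qwen3-Coder (avg)": (0.3, 1.2),
}

IN_TOK, OUT_TOK = 5_000, 800

def request_cost(in_price, out_price, in_tok=IN_TOK, out_tok=OUT_TOK):
    """Cost in USD of one request at the given per-1M-token prices."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

for name, (pin, pout) in PRICES.items():
    print(f"{name}: ${request_cost(pin, pout):.5f} per request")
# Gemini 2.5 Pro comes out to $0.01425 and Cerebras to $0.01160,
# matching the figures in the comment.
```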

retreatguru · 4h ago
I'm looking forward to trying this out.

Specifically: use Claude Code as the interface, set up claude-code-router to connect it to Cerebras' Qwen3 Coder, and see a 20x speed-up. The speed difference might make up for the slightly lower intelligence compared to Sonnet or Opus.
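For reference, a hypothetical sketch of what that wiring might look like. claude-code-router reads a JSON config; the field names here (`Providers`, `api_base_url`, `Router`), the Cerebras endpoint URL, and the model id are all assumptions from memory, not verified against the project's README, so treat this as a starting point only:

```json
{
  "Providers": [
    {
      "name": "cerebras",
      "api_base_url": "https://api.cerebras.ai/v1/chat/completions",
      "api_key": "YOUR_CEREBRAS_API_KEY",
      "models": ["qwen-3-coder-480b"]
    }
  ],
  "Router": {
    "default": "cerebras,qwen-3-coder-480b"
  }
}
```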

I don't see Qwen3 Coder available on OpenRouter yet: https://openrouter.ai/provider/cerebras

gnulinux · 7m ago
It's averaging $0.3/1M input tokens and $1.2/1M output tokens. That's kind of mind-blowingly cheap for a model of its caliber. Gemini 2.5 Pro is more than 10x that price.
retreatguru · 1h ago
It's up there now.
alcasa · 2h ago
Really cool, especially once the 256k context size becomes available.

I think higher performance will be a key differentiator in AI tool quality from a user's perspective, especially in use cases where model quality is already good enough for human-in-the-loop usage.