At $2/1Mt it's cheaper than e.g. Gemini 2.5 Pro ($1.25/1Mt input and $10/1Mt output). When I code with Aider, my requests average something like 5000 input tokens and 800 output tokens. At those rates, Gemini 2.5 Pro comes to about $0.01425 per Aider request and Cerebras Qwen3 Coder to $0.0116 per request. Not a huge difference, but I think sufficiently cheaper to be competitive, especially given that Qwen3-Coder is on par with Gemini/Claude/o3, and even surpasses them in some tests.
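For reference, the per-request arithmetic above can be sketched like this (the 5000/800 token mix is the commenter's own estimate, not a measured average):

```python
def cost_per_request(in_tok, out_tok, in_price_per_m, out_price_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    return in_tok * in_price_per_m / 1e6 + out_tok * out_price_per_m / 1e6

# Gemini 2.5 Pro: $1.25/1M input, $10/1M output
gemini = cost_per_request(5000, 800, 1.25, 10.0)
# Cerebras Qwen3 Coder: flat $2/1M for both input and output
cerebras = cost_per_request(5000, 800, 2.0, 2.0)

print(f"Gemini 2.5 Pro:       ${gemini:.5f} per request")   # $0.01425
print(f"Cerebras Qwen3 Coder: ${cerebras:.5f} per request")  # $0.01160
```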
NOTE: Currently on OpenRouter, Qwen3-Coder requests are averaging $0.3/1M input tok and $1.2/1M output tok. That's so significantly cheaper that I wouldn't be surprised if open-weight models start eating Google's/Anthropic's/OpenAI's lunch. https://openrouter.ai/qwen/qwen3-coder
I'd like to try this out: use Claude Code as the interface, set up claude-code-router to connect to Cerebras Qwen3 Coder, and see the 20x speed-up. The speed difference might make up for the slightly lower intelligence compared to Sonnet or Opus.
It's averaging $0.3/1M input tok and $1.2/1M output tok. That's mind-blowingly cheap for a model of its caliber. Gemini 2.5 Pro is roughly 4x that input price and 8x that output price.
retreatguru · 1h ago
It's up there now.
alcasa · 2h ago
Really cool, especially once 256k context size becomes available.
I think raw inference speed will be a key differentiator in AI tool quality from a user's perspective, especially in use cases where model quality is already good enough for human-in-the-loop usage.
EDIT: also worth noting that, according to preliminary results, Qwen3-Coder seems to score lower than Gemini 2.5 Pro, but it's arguably pretty close: https://www.reddit.com/r/LocalLLaMA/comments/1ka66y0/qwen3_b...
I don't see Qwen3 Coder available yet on Open Router https://openrouter.ai/provider/cerebras