This is really cool and should work well with something like RooCode as well. Usually I keep going back to either Claude Sonnet or Gemini 2.5 Pro (also tried out GPT-5, was quite unimpressed) but both of those are relatively expensive.
I've tried using the more expensive model for planning and something a bit cheaper for doing the bulk of changes (the Plan / Ask and Code modes in RooCode) which works pretty nicely, but settling on just one model like GLM 4.5 would be lovely! Closest to that I've gotten to up until now has been the Qwen3 Coder model on OpenRouter.
I think I used about 40M tokens with Claude Sonnet last month, more on Gemini and others, that's a bit expensive for my liking.
but in my testing other models do not work well, looks like prompts are either very optimized for Claude, or other models are just not great yet with such agentic environment
I was especially disappointed with grok code. it is very fast as advertised but in generating spaces and new lines in function calling until it hits max tokens. I wonder if that isn't why it gets so much tokens on openrouter.
gpt-5 just wasn't using the tools very well
I didn't tested glm yet, but with current anthropic subscription value, alternative would need to be very cheap if you consider daily use
edit: I noticed that also have very inexpensive subscription https://z.ai/subscribe, if they trained model to work well with CC this might actually be viable alternative
sdesol · 6m ago
> But in my testing, other models do not work well. It looks like prompts are either very optimized for Claude, or other models are just not great yet with such an agentic environment.
Anybody who has done any serious development with LLMs would know that prompts are not universal. The reason why Claude Code is good is because Anthropic knows Claude Sonnet is good, and that they only need to create prompts that work well with their models. They also have the ability to train their models to work with specific tools and so forth.
It really is a kind of fool's errand to try to create agents that can work well with many different models from different providers.
CuriouslyC · 8m ago
You don't need claude code router to use GLM, just set the env var to the GLM url. Also, I generally advise people not to bother with claude code router, Bifrost can do the same job and it's much better software.
tonyhart7 · 1h ago
I called it "chinnese chatpcha", back then chinnese chaptcha is so much harder than western counterpart
but now gchaptcha spam me with 5 different image if I missing a tiles for crossroad, so chinnese chaptcha is much better in my opinion
also there is variant that match the image based on shadow and different order of shape
its much better in my opinion because its use much more interactivity, solving western chaptcha is so much mind numbing now that they require you at least multiple image identification for crossroad,sign,cars etc
they want those self driving car are they
awestroke · 46m ago
I assume both of the approaches are useless at actually stopping bots
vincirufus · 4h ago
Ahh bugger I pasted the wrong link I had this one open in another tab..
chisleu · 4h ago
I've been using GLM 4.5 and GLM 4.5 Air for a while now. The Air model is light enough to run on a macbook pro and is useful for Cline. I can run the full GLM model on my Mac Studio, but the TPS is so slow that it's only useful for chatting. So I hooked up with openrouter to try but didn't have the same success. Any of the open weight models I try with open router give sub standard results. I get better results from Qwen 3 coder 30b a3b locally than I get from Qwen 3 Coder 480b through open router.
I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.
KronisLV · 19m ago
> I get better results from Qwen 3 coder 30b a3b locally than I get from Qwen 3 Coder 480b through open router. I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.
This doesn't match my experience precisely, but I've definitely had cases where some of the providers had consistently worse output for the same model than others, the solution there was to figure out which ones those are and to denylist them in the UI.
You can see that these providers run FP4 versions:
* DeepInfra (Turbo)
And these providers run FP8 versions:
* Chutes
* GMICloud
* NovitaAI
* Baseten
* Parasail
* Nebius AI Studio
* AtlasCloud
* Targon
* Together
* Hyperbolic
* Cerebras
I will say that it's not all bad and my experience with FP8 output has been pretty decent, especially when I need something done quickly and choose to use Cerebras - provided their service isn't overloaded, their TPS is really, really good.
yeah I too have heard similar concerns with Open models on OpenRouter, but haven't been able to verify it, as I don't use that a lot
numlocked · 3h ago
(OpenRouter COO here) We are starting to test this and verify the deployments. More to come on that front -- but long story short is that we don't have good evidence that providers are doing weird stuff that materially affects model accuracy. If you have data points to the contrary, we would love them.
We are heavily incentivized to prioritize/make transparent high-quality inference and have no incentive to offer quantized/poorly-performing alternatives. We certainly hear plenty of anecdotal reports like this, but when we dig in we generally don't see it.
It does take providers time to learn how to run the models in a high quality way; my expectation is that the difference in quality will be (or already is) minimal over time. The large variance in that case was because GPT OSS had only been out for a couple of weeks.
For well-established models, our (admittedly limited) testing has not revealed much variance between providers in terms of quality. There is some but it's not like we see a couple of providers 'cheating' by secretly quantizing and clearly serving less intelligence versions of the model. We're going to get more systematic about it though and perhaps will uncover some surprises.
blitzar · 48s ago
> We ... have no incentive to offer quantized/poorly-performing alternatives
However your providers do
indigodaddy · 2h ago
So what's the deal with Chutes and all the throttling and errors. Seems like users are losing their minds over this.. at least from all the reddit threads I'm seeing
chandureddyvari · 3h ago
Unsolicited advice: Why doesn’t open router provide hosting services for OSS models that guarantee non-quantised versions of the LLMs? Would be a win-win for everyone.
jjani · 2h ago
Would make very little business sense at this point - currently they have an effective monopoly on routing. Hosting would just make them one provider among a few dozen. It would make the other providers less likely to offer their services through openrouter. It would come with lots of concerns that openrouter would favor routing towards their own offerings. It would be a huge distraction to their core business which is still rapidly growing. Would need massive capital investment. And another thousand reasons I haven't thought of.
jatins · 3h ago
In fact I thought that's what OpenRouter was hosting them all along
I would be interested to know where the claim of the “killer combination” comes from. I would also like to know who the people behind Z.ai are — I haven’t heard of them before. Their plans seem crazy cheap compared to Anthropic, especially if their models actually perform better than Opus.
ekidd · 4h ago
> I would also like to know who the people behind Z.ai are — I haven’t heard of them before.
To be clear, Z.ai are the people who built GLM 4.5, so they're talking up their own product.
But to be fair, GLM 4.5 and GLM 4.5 Air are genuinely good coding models. GLM 4.5 Air costs about 10% of what Claude Sonnet does (when hosted on DeepInfra, at least), and it can perform simple coding tasks quite quickly. I haven't tested GLM 4.5 Air, but it seems to be popular as well.
If you can easily afford all the Claude Code tokens you want, then you'll probably get better results from Sonnet. But if you already know enough programming to work around any issues that arise, the GLM models are quite usable.
But you can't easily run GLM 4.5 Air quickly without professional workstation- or server-grade hardware (RTX 6000 Pro 96GB would be nice), at least not without a serious speed hit.
Still, it's a very interesting sign for the future of open coding models.
esafak · 4h ago
For agentic coding I found the price difference more modest due to prompt caching, which most GLM providers on Openrouter don't offer, but Anthropic does. Look at the cache read/write columns: https://openrouter.ai/z-ai/glm-4.5
magicalhippo · 2h ago
Been playing with Grok Code Fast 1 in Cline via Open Router. It supports prompt caching as far as I can tell, and it certainly is cheap. It's been quite good for the stuff I've tried. YMMV.
turingbook · 4h ago
Actually Z.ai is a spinoff of Tsinghua University and one of the first China labs open sourcing its own large models (GLM released in 2021) .
https://github.com/THUDM/GLM
throwaway314155 · 4h ago
It's a spinoff of the whole university?
cyp0633 · 4h ago
With a little search you can find it's a laboratory within the CS department of THU. It's a fairly large lab though, not those led by just one or two professors.
SparkyMcUnicorn · 4h ago
When it comes to "real-world development scenarios" they claim to be closer to Sonnet 4.
Well I'd call them the poor person's claude code, wouldnt compare it with Opus but very close to Sonnet and Kimi
vincirufus · 4h ago
update the title to not seem biased / hyped
Jcampuzano2 · 4h ago
Hmm with the lower context length I'm wonder how it holds up for problems requiring slightly larger context given we know most models tend to degrade fairly quickly with context length.
Maybe it's best for shorter tasks or condensed context?
I find it interesting the number of models latching onto Claude codes harness. I'm still using Cursor for work and personal but tried out open code and Claude for a bit. I just miss having the checkpoints and whatnot.
Interesting, although how hard is it to add a sorting functionality to the table?
raincole · 4h ago
I wonder how you justify this editorialized title, and if HN mods share your justification. The linked article has no the word "killer" in it.
I think this is why many people have concerns about AI. This group can't express neutral ideas. They have to hype about a simple official documentation page.
vincirufus · 4h ago
feedback accepted got rid of the killer bits
steipete · 4h ago
Been using that for a while, first Chinese model that works REALLY well!
Also fascinating how they solved the issue that Claude expects a 200+k token model while GLM 4.5 has 128k.
abrookewood · 2h ago
So you can use Claude Code with other models? I had assumed that it was tied to your subscription and that was that.
adastra22 · 1h ago
It is, but people figure out the Claude Code API and provide API compatible endpoints.
sagarpatil · 2h ago
I was blown away by this model. It was definitely comparable to sonnet 4. In some of my tests, it performed as good as Opus.
I subscribed to their paid plan, and now the model seems dumb?
I asked it to find and replace a string. It only made the change in one file. Codex worked fine.
Can Z.ai confirm if this is the model we get through their API or is it quantized for Claude Code use?
apparent · 4h ago
I stopped when I got to this sentence and realized the article is written by one of the companies mentioned.
> GLM-4.5 and GLM-4.5-Air are our latest flagship models
Maybe it is great, but with a conflict of interest so obvious I can't exactly take their word for it.
JimDabell · 4h ago
Z.AI is the company that created GLM and the link goes to their official documentation. It’s really weird to complain that their official documentation on their official website has a “conflict of interest”.
apparent · 1h ago
The title has been changed. The original title was wildly positive, and OP has acknowledged it was inappropriate and changed it (see comments below).
My issue was with an article being posted with a title saying how amazing two things are together (making it seem like it was somehow an independent review), when it was actually just a marketing post by one of the companies.
nicce · 3h ago
Just out of curiosity, is the cost of such domain worth it or whether they were just lucky.
sergiotapia · 3h ago
Used it to fix a couple of bugs just now in Elixir and it runs very fast, faster than Codex with GPT-5 medium or high.
This is quite nice. Will try it out a bit longer over the weekend. I tested it using Claude Code with env variables overrides.
I've tried using the more expensive model for planning and something a bit cheaper for doing the bulk of changes (the Plan / Ask and Code modes in RooCode) which works pretty nicely, but settling on just one model like GLM 4.5 would be lovely! Closest to that I've gotten to up until now has been the Qwen3 Coder model on OpenRouter.
I think I used about 40M tokens with Claude Sonnet last month, more on Gemini and others, that's a bit expensive for my liking.
Chinese software always has such a design language:
- prepaid and then use credit to subscribe
- strange serif font
- that slider thing for captcha
But I'm going to try it out now.
but in my testing other models do not work well, looks like prompts are either very optimized for Claude, or other models are just not great yet with such agentic environment
I was especially disappointed with grok code. it is very fast as advertised but in generating spaces and new lines in function calling until it hits max tokens. I wonder if that isn't why it gets so much tokens on openrouter.
gpt-5 just wasn't using the tools very well
I didn't tested glm yet, but with current anthropic subscription value, alternative would need to be very cheap if you consider daily use
edit: I noticed that also have very inexpensive subscription https://z.ai/subscribe, if they trained model to work well with CC this might actually be viable alternative
Anybody who has done any serious development with LLMs would know that prompts are not universal. The reason why Claude Code is good is because Anthropic knows Claude Sonnet is good, and that they only need to create prompts that work well with their models. They also have the ability to train their models to work with specific tools and so forth.
It really is a kind of fool's errand to try to create agents that can work well with many different models from different providers.
but now gchaptcha spam me with 5 different image if I missing a tiles for crossroad, so chinnese chaptcha is much better in my opinion
also there is variant that match the image based on shadow and different order of shape
its much better in my opinion because its use much more interactivity, solving western chaptcha is so much mind numbing now that they require you at least multiple image identification for crossroad,sign,cars etc
they want those self driving car are they
I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.
This doesn't match my experience precisely, but I've definitely had cases where some of the providers had consistently worse output for the same model than others, the solution there was to figure out which ones those are and to denylist them in the UI.
As for quantized versions, you can check it for each model and provider, for example: https://openrouter.ai/qwen/qwen3-coder/providers
You can see that these providers run FP4 versions:
And these providers run FP8 versions: I will say that it's not all bad and my experience with FP8 output has been pretty decent, especially when I need something done quickly and choose to use Cerebras - provided their service isn't overloaded, their TPS is really, really good.You can also request specific precision on a per request basis: https://openrouter.ai/docs/features/provider-routing#quantiz... (or just make a custom preset)
We are heavily incentivized to prioritize/make transparent high-quality inference and have no incentive to offer quantized/poorly-performing alternatives. We certainly hear plenty of anecdotal reports like this, but when we dig in we generally don't see it.
An exception is when a model is first released -- for example this terrific work by artificial analysis: https://x.com/ArtificialAnlys/status/1955102409044398415
It does take providers time to learn how to run the models in a high quality way; my expectation is that the difference in quality will be (or already is) minimal over time. The large variance in that case was because GPT OSS had only been out for a couple of weeks.
For well-established models, our (admittedly limited) testing has not revealed much variance between providers in terms of quality. There is some but it's not like we see a couple of providers 'cheating' by secretly quantizing and clearly serving less intelligence versions of the model. We're going to get more systematic about it though and perhaps will uncover some surprises.
However your providers do
I would be interested to know where the claim of the “killer combination” comes from. I would also like to know who the people behind Z.ai are — I haven’t heard of them before. Their plans seem crazy cheap compared to Anthropic, especially if their models actually perform better than Opus.
To be clear, Z.ai are the people who built GLM 4.5, so they're talking up their own product.
But to be fair, GLM 4.5 and GLM 4.5 Air are genuinely good coding models. GLM 4.5 Air costs about 10% of what Claude Sonnet does (when hosted on DeepInfra, at least), and it can perform simple coding tasks quite quickly. I haven't tested GLM 4.5 Air, but it seems to be popular as well.
If you can easily afford all the Claude Code tokens you want, then you'll probably get better results from Sonnet. But if you already know enough programming to work around any issues that arise, the GLM models are quite usable.
But you can't easily run GLM 4.5 Air quickly without professional workstation- or server-grade hardware (RTX 6000 Pro 96GB would be nice), at least not without a serious speed hit.
Still, it's a very interesting sign for the future of open coding models.
This is the data for that claim: https://huggingface.co/datasets/zai-org/CC-Bench-trajectorie...
Maybe it's best for shorter tasks or condensed context?
I find it interesting the number of models latching onto Claude codes harness. I'm still using Cursor for work and personal but tried out open code and Claude for a bit. I just miss having the checkpoints and whatnot.
I think this is why many people have concerns about AI. This group can't express neutral ideas. They have to hype about a simple official documentation page.
Also fascinating how they solved the issue that Claude expects a 200+k token model while GLM 4.5 has 128k.
> GLM-4.5 and GLM-4.5-Air are our latest flagship models
Maybe it is great, but with a conflict of interest so obvious I can't exactly take their word for it.
My issue was with an article being posted with a title saying how amazing two things are together (making it seem like it was somehow an independent review), when it was actually just a marketing post by one of the companies.
This is quite nice. Will try it out a bit longer over the weekend. I tested it using Claude Code with env variables overrides.