Ask HN: Qwen3 – is it ready for driving AI agents?
Initially I was quite impressed with its problem-solving capabilities when it was writing code through the chat interface. It handled certain problems much better than Claude or Gemini. However, as soon as I switched to Alibaba Cloud's API to provide a DashScope-based implementation of the cognizer interface for my new generation of AI agents (chain of code), the whole charm was gone.
Qwen3 struggles with structured generation, quite often falling into an infinite loop while emitting tokens.
It has trouble crossing language boundaries, which is crucial for my agents because they "think in code": writing a Kotlin script that contains JavaScript, which in turn contains SQL, etc. As a result, it will not work well as an automated software engineer.
It is "stubborn": even when the syntax error in the generated code is clearly pointed out, it tends to output the same erroneous code again and again instead of testing another hypothesis.
It lacks theory of mind and an understanding of its context and environment. For example, when asked to check the recent news, it always responds by trying to use a BBC API URL with an unfilled API key in the request, and it passes this URL to the Files tool instead of the WebBrowser tool, which obviously fails.
And last, but not least: censorship. For example, Qwen3 will refuse to search for information on the most recent anti-government protests in China. I wouldn't be surprised if these censorship blockers were partly responsible for the poor quality of its cognition in other areas.
Maybe I'm doing something wrong. Are you getting much better results with this model for fully autonomous agents with a feedback loop?