Show HN: RAG chatbot using Qwen3 with custom thinking UI

I've been playing around with the new Qwen3 models from Alibaba. They’ve been leading a bunch of benchmarks lately, especially in coding, math, and reasoning tasks, and I wanted to see how they work in a Retrieval-Augmented Generation (RAG) setup. So I decided to build a basic RAG chatbot on top of Qwen3 using LlamaIndex.

Here’s the setup:

Model: Qwen3-235B-A22B (the flagship model, via Nebius AI Studio)

RAG Framework: LlamaIndex

Docs: Load → transform → create a VectorStoreIndex using LlamaIndex

Storage: Works with any vector store (I used the default for quick prototyping)

UI: Streamlit (for me, it's the easiest way to add a UI)
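
To make the load → transform → index → retrieve flow concrete, here is a dependency-free toy sketch. It is not LlamaIndex and not the code from the repo: `ToyVectorIndex`, `chunk`, `embed`, and `build_prompt` are hypothetical names, and the "embedding" is just a bag of word counts standing in for a real embedding model. It only illustrates the shape of what a vector index does before the retrieved text is handed to the LLM.

```python
import math
import re
from collections import Counter


def chunk(text, size=40):
    """Transform step: split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: lowercase word counts (a real setup would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a vector index."""

    def __init__(self, docs, chunk_size=40):
        # Index step: store every chunk next to its vector.
        self.entries = [(c, embed(c)) for d in docs for c in chunk(d, chunk_size)]

    def retrieve(self, query, top_k=2):
        # Retrieve step: rank chunks by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_prompt(query, contexts):
    """Assemble the text that would be sent to the LLM (the model call itself is omitted)."""
    joined = "\n---\n".join(contexts)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Qwen3 is a family of models from Alibaba.",
    "Streamlit is a Python library for building web UIs.",
]
index = ToyVectorIndex(docs)
top = index.retrieve("Who makes Qwen3?", top_k=1)
prompt = build_prompt("Who makes Qwen3?", top)
```

Swapping the vector store or the model then amounts to replacing `embed`, the storage behind `entries`, or whatever consumes `prompt`, which is the kind of modularity a framework gives you out of the box.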

One small challenge I ran into was handling the <think> </think> tags that Qwen models sometimes generate when reasoning internally. Instead of just dropping or filtering them, I thought it might be cool to actually show what the model is “thinking”.

So I added a separate UI block in Streamlit to render this. It actually makes it feel more transparent, like you’re watching it work through the problem statement/query.
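
A minimal sketch of how the reasoning could be separated from the answer, assuming the model wraps its reasoning in literal `<think>...</think>` tags inside the response text. `split_thinking` is a hypothetical helper name, not necessarily what the repo uses.

```python
import re

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(raw):
    """Return (thinking, answer) from a raw model response.

    `thinking` joins the contents of all <think> blocks; `answer` is the
    response with those blocks removed. If there are no tags, `thinking`
    is an empty string and `answer` is the stripped input.
    """
    thoughts = [m.strip() for m in THINK_RE.findall(raw)]
    thinking = "\n\n".join(t for t in thoughts if t)
    answer = THINK_RE.sub("", raw).strip()
    return thinking, answer


raw_response = "<think>\nThe docs mention LlamaIndex.\n</think>\n\nIt uses LlamaIndex."
thinking, answer = split_thinking(raw_response)
```

In a Streamlit app, `thinking` could then go into something like an `st.expander("Thinking")` block and `answer` into `st.markdown`, so the reasoning is visible but collapsed by default.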

Nothing fancy with the UI, just something quick to visualize input, output, and internal thought process. The whole thing is modular, so you can swap out components pretty easily (e.g., plug in another model or change the vector store).

Here’s the full code if anyone wants to try or build on top of it: GitHub: https://github.com/Arindam200/awesome-ai-apps/tree/main/rag_...

And I did a short walkthrough/demo here: YouTube: https://www.youtube.com/watch?v=L7P8RcKcdzI

Would love to hear if anyone else is using Qwen3 or doing something fun with LlamaIndex or RAG stacks. What’s worked for you?

Comments (3)

cenktekin · 32d ago

Thanks for sharing this! I've also been using Qwen3 recently and I'm really impressed with its performance, especially in terms of speed and consistency. The idea of visualizing the <think> tags is brilliant! I'll definitely check out your code. What kind of tasks are you primarily using Qwen3 for?

tomasen9987 · 33d ago

I have tried Gemma before but haven't had a chance to try Qwen3 yet.

What do you think is the difference between Gemma and Qwen when it comes to RAG performance?

Arindam1729 · 33d ago

I haven't tried comparing both, but Qwen's reasoning quality is better.
