Ollama and Bifrost -> Qwen3 in Claude Code
# SETUP
- An API key for some LLM service other than Anthropic, OR a local LLM hosted through one of Bifrost's supported providers.
- Claude Code installed somewhere useful.
# (Optional) Ollama
I'm using Ollama to serve Qwen3 from a 4090, running it as a systemd service. Here's the .service definition I'm using. I serve this across a subnet, so the important line is `Environment="OLLAMA_HOST=0.0.0.0"`, which makes Ollama listen on all interfaces rather than just localhost.
```
[Unit]
Description=Ollama Service
Wants=network-online.target
After=network.target network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
WorkingDirectory=/var/lib/ollama
Environment="HOME=/var/lib/ollama"
Environment="OLLAMA_MODELS=/var/lib/ollama"
Environment="OLLAMA_HOST=0.0.0.0"
User=ollama
Group=ollama
Restart=on-failure
RestartSec=3
RestartPreventExitStatus=1
Type=simple
PrivateTmp=yes
ProtectSystem=full
ProtectHome=yes

[Install]
WantedBy=multi-user.target
```
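With the unit in place, bringing the service up and pulling the model looks roughly like this; the `qwen3:8b` tag matches what the Claude Code wrapper below points at, so adjust it to whatever you actually serve:
```
# Reload systemd, then enable and start the Ollama service.
sudo systemctl daemon-reload
sudo systemctl enable --now ollama

# Pull the model that Claude Code will be pointed at later.
ollama pull qwen3:8b
```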
# Bifrost
Pull the Bifrost container from the appropriate place [1].
`docker run -d -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost`
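If you want the container to survive reboots, a named variant with a restart policy works too; the flags below are plain Docker options, nothing Bifrost-specific:
```
# Same container, but named and set to restart automatically.
docker run -d --name bifrost --restart unless-stopped \
  -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

# Tail the startup logs to confirm it came up.
docker logs -f bifrost
```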
## Provider Setup
Navigate to `<bifrost URL>/providers`, click "Manage Providers", and add your LLM provider.
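For a local Ollama provider you'll be handing Bifrost the base URL of the box serving the model. Before filling that in, it's worth confirming the Bifrost host can actually reach it; the hostname below is a placeholder for wherever your Ollama machine lives, and 11434 is Ollama's default port:
```
# From the machine running Bifrost: confirm Ollama's API is reachable.
# /api/tags lists the models Ollama has pulled.
curl http://<ollama host>:11434/api/tags
```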
# Claude Code
Now we override Claude Code's API endpoint [2]. I added the following to my .bashrc:
```
function deepseek () {
    # Model served by Ollama (the commented-out line is an alternative tag).
    MODEL="qwen3:8b"
    #MODEL="hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL"
    # Point Claude Code at Bifrost's Anthropic-compatible endpoint. The auth
    # token is a throwaway value (the real provider credentials live in
    # Bifrost), and the timeout is raised because local inference is slow.
    ANTHROPIC_BASE_URL=http://100.100.0.11:8080/anthropic ANTHROPIC_AUTH_TOKEN="dummy-key" API_TIMEOUT_MS=600000 ANTHROPIC_MODEL=ollama/$MODEL ANTHROPIC_SMALL_FAST_MODEL=ollama/$MODEL claude
}
```
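After that, launching the local stack is just the function name; Claude Code starts as usual, but every request goes through Bifrost to the Ollama-served model:
```
source ~/.bashrc
deepseek
```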
# Results
It is slow with Qwen3, but the model isn't dumb. It performs quite a bit better on tasks than I thought it would.
The whole 'conversation' is here [3].
--- Et Fin. ---
# Links
[0] https://news.ycombinator.com/item?id=45116978#45117578
[1] https://github.com/maximhq/bifrost/tree/main
[2] https://api-docs.deepseek.com/guides/anthropic_api
[3] https://pastebin.com/jFrUPw5w
What I've been trying to dodge is the 5-hour usage limit on Claude Code. This setup lets me do that.