Ollama and Bifrost -> Qwen3 in Claude Code

all2 · 9/4/2025, 2:13:48 PM
After a short conversation [0], I found it was possible to inject any model into Claude Code using the Bifrost docker container [1].

# Setup

- an API key to some LLM service other than Anthropic, OR a local LLM hosted through one of Bifrost's supported providers

- Claude Code installed somewhere useful (install sketch below)
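If you don't already have Claude Code, the standard install is via npm (this assumes Node.js is already on the box):

```sh
npm install -g @anthropic-ai/claude-code
```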

# (Optional) Ollama

I'm using Ollama to serve Qwen3 from a 4090, running it under the Ollama systemd service. Here's the .service definition I'm using. I serve this across a subnet, and the important line here is `Environment="OLLAMA_HOST=0.0.0.0"`, which makes Ollama listen on all interfaces rather than just localhost.

```ini
[Unit]
Description=Ollama Service
Wants=network-online.target
After=network.target network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
WorkingDirectory=/var/lib/ollama
Environment="HOME=/var/lib/ollama"
Environment="OLLAMA_MODELS=/var/lib/ollama"
Environment="OLLAMA_HOST=0.0.0.0"
User=ollama
Group=ollama
Restart=on-failure
RestartSec=3
RestartPreventExitStatus=1
Type=simple
PrivateTmp=yes
ProtectSystem=full
ProtectHome=yes

[Install]
WantedBy=multi-user.target
```
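With the unit in place, enable the service and pull a model. A minimal sketch, assuming Ollama's default port (11434) and the `qwen3:8b` tag used in the .bashrc function later on; the curl target is a hypothetical subnet address:

```sh
# Load the new unit and start Ollama now and at boot
sudo systemctl daemon-reload
sudo systemctl enable --now ollama

# Fetch the model referenced in the .bashrc function below
ollama pull qwen3:8b

# From another machine on the subnet, confirm the server answers
# (192.168.1.50 is a placeholder; 11434 is Ollama's default port)
curl http://192.168.1.50:11434/api/tags
```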

# Bifrost

Pull and run the Bifrost container [1]:

`docker run -d -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost`
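To confirm the gateway came up, check the container and hit the web UI on the mapped port (this assumes Bifrost is running on the local host):

```sh
# The container should be listed as running
docker ps --filter ancestor=maximhq/bifrost

# The web UI should answer with an HTTP status line
curl -sI http://localhost:8080 | head -n 1
```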

## Provider Setup

Navigate to `<bifrost URL>/providers`, click "Manage Providers", and add your LLM provider.
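Before pointing Claude Code at the gateway, it can be worth exercising Bifrost's Anthropic-compatible endpoint directly. A minimal sketch, assuming the `/anthropic` prefix from the base URL below is followed by the standard Anthropic `/v1/messages` route, and that models are addressed as `ollama/<model>` as in the function below:

```sh
curl http://localhost:8080/anthropic/v1/messages \
  -H "x-api-key: dummy-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "ollama/qwen3:8b",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```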

# Claude Code

Now we override Claude Code's API endpoint [2]. I added the following to my .bashrc:

```bash
function deepseek () {
    MODEL="qwen3:8b"
    #MODEL="hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL"

    ANTHROPIC_BASE_URL=http://100.100.0.11:8080/anthropic \
    ANTHROPIC_AUTH_TOKEN="dummy-key" \
    API_TIMEOUT_MS=600000 \
    ANTHROPIC_MODEL=ollama/$MODEL \
    ANTHROPIC_SMALL_FAST_MODEL=ollama/$MODEL \
    claude
}
```
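
Reload the shell config and call the function. `ANTHROPIC_BASE_URL` redirects Claude Code's traffic to Bifrost, the dummy auth token satisfies the client's credential check, and `ANTHROPIC_MODEL` / `ANTHROPIC_SMALL_FAST_MODEL` map both the main model and the background-task model onto the Ollama-served one:

```sh
source ~/.bashrc
deepseek    # launches claude against the Bifrost endpoint
```
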
# Results

It is slow with Qwen3, but the model isn't dumb. It performs quite a bit better on tasks than I thought it would.

The whole 'conversation' is here [3].

--- Et Fin. ---

# Links

[0] https://news.ycombinator.com/item?id=45116978#45117578

[1] https://github.com/maximhq/bifrost/tree/main

[2] https://api-docs.deepseek.com/guides/anthropic_api

[3] https://pastebin.com/jFrUPw5w

Comments (2)

incomingpain · 1d ago
Why would you want to, when the model is trained on Qwen Code (which is just a Gemini fork) and it's super fast there?
all2 · 1d ago
Qwen3 wasn't the important part. You can use pretty much any model you want. For example, you'll see in the .bashrc function above that I have two different models called out (though DeepSeek _still_ doesn't support tool calling on Ollama, after almost a year).

What I've been trying to dodge is the 5 hour limits on Claude Code. This lets me do that.