Show HN: My LLM CLI tool can run tools now, from Python code or plugins

392 points by simonw | 133 comments | 5/27/2025, 8:53:03 PM | simonwillison.net

Comments (133)

kristopolous · 12h ago
It's worth noting the streaming markdown renderer I wrote just for this tool: https://github.com/day50-dev/Streamdown

More background: https://github.com/simonw/llm/issues/12

(Also check out https://github.com/day50-dev/llmehelp which features a tmux tool I built on top of Simon's llm. I use it every day. Really. It's become indispensable)

kristopolous · 9h ago
Also, I forgot to mention one other tool built on llm.

This one is a ZSH plugin that uses zle to translate your English to shell commands with a keystroke.

https://github.com/day50-dev/Zummoner

It's been life changing for me. Here's one I wrote today:

    $ git find out if abcdefg is a descendent of hijklmnop 
In fact, I used it in one of these comments:

    $ for i in $(seq 1 6); do 
      printf "%${i}sh${i}\n\n-----\n" | tr " " "#"; 
    done | pv -bqL 30 
Was originally

    $ for i in $(seq 1 6); do 
      printf "(# $i times)\n\n-----\n"
    done | pv (30 bps and quietly)
I did my trusty ctrl-x x and the buffer got sent off through openrouter and got swapped out with the proper syntax in under a second.
vicek22 · 2h ago
This is fantastic! Thank you for that.

I use fish, but the language change is straightforward https://github.com/viktomas/dotfiles/blob/master/fish/.confi...

I'll use this daily

rglynn · 2h ago
Ah this is great, in combo with something like superwhisper you can use voice for longer queries.
kazinator · 9h ago
The brace expansion syntax in Bash and Zsh expands integer ranges: {1..6}; no calling out to external command.

It's also intelligent about inferring leading zeros without needing to be told with options, e.g. {001..995}.


CGamesPlay · 7h ago
I built a similar one to this one: https://github.com/CGamesPlay/llm-cmd-comp

Looks from the demo like mine's a little less automatic and more iterative than yours.

kristopolous · 6h ago
Interesting! I like it!

The conversational context is nice. The ongoing command building is convenient and the # syntax carryover makes a lot of sense!

My next step is recursion and composability. I want to be able to do things contextualized. Stuff like this:

   $ echo PUBLIC_KEY=(( get the users public key pertaining to the private key for this repo )) >> .env
or some other contextually complex thing that is actually fairly simple, just tedious to code. Then I want that <as the code> so people collectively program and revise stuff <at that level as the language>.

Then you can do this through composability like so:

    with ((find the variable store for this repo by looking in the .gitignore)) as m:
      ((write in the format of m))SSH_PUBLICKEY=(( get the users public key pertaining to the private key for this repo ))
or even recursively:

    (( 
      (( 
        ((rsync, rclone, or similar)) with compression 
      ))  
        $HOME exclude ((find directories with secrets))         
        ((read the backup.md and find the server)) 
        ((make sure it goes to the right path))
    ));
it's not a fully formed syntax yet but then people will be able to do something like:

    $ llm-compile --format terraform --context my_infra script.llm > some_code.tf
and compile publicly shared snippets into something specific to their context, and you get abstract infra management at a fraction of the complexity.

It's basically GCC's RTL but for LLMs.

The point of this approach is that your building blocks remain fairly atomic, simple, dumb things that even a 1B model can reliably handle - kinda like the guarantee of the RTL.

Then if you want to move from terraform to opentofu or whatever, who cares ... your stuff is in the llm metalanguage ... it's just a different compile target.

It's kinda like PHP. You just go along like normal and occasionally break form for the special metalanguage whenever you hit a point of contextual variance.

simonw · 12h ago
Wow, that library is looking really great!

I think I want a plugin hook that lets plugins take over the display of content by the tool.

Just filed an issue: https://github.com/simonw/llm/issues/1112

Would love to get your feedback on it, I included a few design options but none of them feel 100% right to me yet.

kristopolous · 12h ago
The real solution is semantic routing. You want to be able to define routing rules based on something like mdast (https://github.com/syntax-tree/mdast) . I've built a few hacked versions. This would not only allow for things like terminal rendering but is also a great complement to tool calling. Being able to siphon and multiplex inputs for the future where cerebras like speeds become more common, dynamic configurable stream routing will unlock quite a bit more use cases.

We have cost, latency, context window and model routing but I haven't seen anything semantic yet. Someone's going to do it, might as well be me.

rpeden · 12h ago
Neat! I've written streaming Markdown renderers in a couple of languages for quickly displaying streaming LLM output. Nice to see I'm not the only one! :)
kristopolous · 12h ago
It's a wildly nontrivial problem if you're trying to only move forward and minimize your buffer.

That's why everybody else either rerenders (such as rich) or relies on the whole buffer (such as glow).

I didn't write Streamdown for fun - there were genuinely no suitable tools that did what I needed.

Also various models have various ideas of what markdown should be and coding against CommonMark doesn't get you there.

Then there are other things. You have to check individual character width and the language family type to do proper word wrap. I've seen a number of interesting tmux and alacritty bugs while doing multi-language support.

The only real break I do is I render h6 (######) as muted grey.

Compare:

    for i in $(seq 1 6); do 
      printf "%${i}sh${i}\n\n-----\n" | tr " " "#"; 
    done | pv -bqL 30 | sd -w 30
to swapping out `sd` with `glow`. You'll see glow's lag - waiting for that EOF is annoying.

Also try sd -b 0.4 or even -b 0.7,0.8,0.8 for a nice blue. It's a bit easier to configure than the usual catalog of themes that requires recompilation after modification, like with Pygments.

icarito · 7h ago
That's right, this is a nontrivial problem that I struggled with too for gtk-llm-chat! I resolved it using the streaming markdown-it-py library.
kristopolous · 7h ago
Huh, this might be another approach with a bit of effort. Thanks for that, I didn't know about this.
hanatanaka1984 · 11h ago
Interesting, I will be sure to check into this. I have been using llm and bat with syntax highlighting.
kristopolous · 11h ago
Do you just do

| bat --language=markdown --force-colorization ?

hanatanaka1984 · 10h ago
A simple bash script provides quick command line access to the tool. Output is paged syntax highlighted markdown.

  echo "$@" | llm "Provide a brief response to the question, if the question is related to command provide the command and short description" | bat --plain -l md
Launch as:

  llmquick "why is the sky blue?"
kristopolous · 10h ago
I've got a nice tool as well

https://github.com/day50-dev/llmehelp/blob/main/Snoopers/wtf

I've thought about redoing it because my needs are things like

   $ ls | wtf which endpoints do these things talk to, give me a map and line numbers. 
What this will eventually be is "ai-grep" built transparently on https://ast-grep.github.io/ where the llm writes the complicated query (these coding agents all seem to use ripgrep but this works better)

Conceptual grep is what I've wanted my whole life.

Semantic routing, which I alluded to above, could get this to work progressively so you quickly get adequate results which then pareto their way up as the token count increases.

Really you'd like some tampering, like a coreutils timeout(1) but for simplex optimization.

johnisgood · 5h ago
> DO NOT include the file name. Again, DO NOT INCLUDE THE FILE NAME.

Lmao. Does it work? I hate that it needs to be repeated (in general). ChatGPT couldn't care less about following my instructions; through the API it probably would?

hanatanaka1984 · 10h ago
| bat -p -l md

simple and works well.

nbbaier · 10h ago
Ohh I've wanted this so much! Thank you!
tantalor · 13h ago
This greatly opens up the risk of footguns.

The doc [1] warns about prompt injection, but I think a more likely scenario is self-inflicted harm. For instance, you give a tool access to your brokerage account to automate trading. Even without prompt injection, there's nothing preventing the bot from making stupid trades.

[1] https://llm.datasette.io/en/stable/tools.html

simonw · 13h ago
> This greatly opens up the risk of footguns.

Yeah, it really does.

There are so many ways things can go wrong once you start plugging tools into an LLM, especially if those tool calls are authenticated and can take actions on your behalf.

The MCP world is speed-running this right now, see the GitHub MCP story from yesterday: https://news.ycombinator.com/item?id=44097390

I stuck a big warning in the documentation and I've been careful not to release any initial tool plugins that can cause any damage - hence my QuickJS sandbox one and SQLite plugin being read-only - but it's a dangerous space to be exploring.

(Super fun and fascinating though.)

kbelder · 12h ago
If you hook an llm up to your brokerage account, someone is being stupid, but it ain't the bot.
isaacremuant · 12h ago
You think "senior leadership/boards of directors" aren't thinking of going all in with AI to "save money" and "grow faster and cheaper"?

This is absolutely going to happen at a large scale and then we'll have "cautionary tales" and a lot of "compliance" rules.

zaik · 9h ago
Let it happen. Just don't bail them out using tax money again.
mike_hearn · 1h ago
Yes, sandboxing will be crucial. On macOS it's not that hard, but there aren't good, easy-to-use tools available for it right now. Claude Code has started using Seatbelt a bit to optimize the UX.
arendtio · 45m ago
I think the whole footgun discussion misses the point. Yes, you can shoot yourself in the foot (and probably will), but not evaluating the possibilities is also a risk. Regular people tend to underestimate the footgun potential (probably driven by fear of missing out) and technical people tend to underestimate the risk of not learning the new possibilities.

Even a year ago I was letting LLMs execute local commands on my laptop. I think it is somewhat risky, but nothing harmful happened. You also have to consider what you are prompting. So when I prompt 'find out where I am and what the weather is going to be', it's possible that it will execute rm -rf /, but very unlikely.

However, speaking of letting an LLM trade stocks without understanding how the LLM will come to a decision... too risky for my taste ;-)

abc-1 · 13h ago
[flagged]
dang · 9h ago
Could you please stop posting shallow dismissals and putdowns of other people and their work? It's against the site guidelines, and your account has unfortunately been doing a lot of it:

https://news.ycombinator.com/item?id=44073456

https://news.ycombinator.com/item?id=44073413

https://news.ycombinator.com/item?id=44070923

https://news.ycombinator.com/item?id=44070514

https://news.ycombinator.com/item?id=44010921

https://news.ycombinator.com/item?id=43970274

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


tantalor · 12h ago
People are already doing that!
ekianjo · 12h ago
Natural selection at play
shepherdjerred · 10h ago
Any tool can be misused
yard2010 · 5h ago
This is not misuse. This is equivalent to a drill that in some cases drills the hand holding it.
tantalor · 9h ago
You're missing the point. Most tools are deployed by humans. If they do something bad, we can blame the human for using the tool badly. And we can predict when a bad choice by the human operator will lead to a bad outcome.

Letting the LLM run the tool unsupervised is another thing entirely. We do not understand the choices the machines are making. They are unpredictable and you can't root-cause their decisions.

LLM tool use is a new thing we haven't had before, which means tool misuse is a whole new class of FUBAR waiting to happen.

johnisgood · 5h ago
But why can we not hold humans responsible in the case of LLM? You do have to go out of your way to do all of these things with an LLM. And it is the human that does it. It is the humans that give it the permission to act on their behalf. We can definitely hold humans responsible. The question is: are we going to?
rahimnathwani · 2h ago
Thanks for making this. I used it (0.26a0) last week to create a demo for a customer-facing chatbot using proprietary data.

The key elements I had to write:

- The system prompt

- Tools to pull external data

- Tools to do some calculations

Your library made the core functionality very easy.

Most of the effort for the demo was to get the plumbing working (a nice-looking web UI for the chatbot that would persist the conversation, update nicely if the user refreshed their browser due to a connection issue, and allow the user to start a new chat session).

I didn't know about `after_call=print`. So I'm glad I read this blog post!
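
For anyone curious, the tool-calling part really is just passing plain Python functions. A minimal sketch of the shape I ended up with (the model name, tool bodies, and exact keyword arguments are illustrative; check the 0.26 docs for `chain()` and `after_call`):

  import llm

  def lookup_balance(customer_id: str) -> float:
      """Stand-in for a tool that pulls external data."""
      return 1234.56

  def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
      """Stand-in for a tool that does a calculation."""
      r = annual_rate / 12
      return principal * r / (1 - (1 + r) ** -months)

  model = llm.get_model("gpt-4.1-mini")
  response = model.chain(
      "What monthly payment would customer 42 face on their balance at 5% over 24 months?",
      tools=[lookup_balance, monthly_payment],
      after_call=print,  # log each tool call and its result as it happens
  )
  for chunk in response:
      print(chunk, end="")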

nlh · 11h ago
Ok this is great and perfect timing -- I've been playing around with Warp (the terminal) and while I love the idea of their terminal-based "agent" (e.g. a tool loop), I don't love the whole Cursor-esque model of "trust us, we'll make good prompts and LLM calls for you" (and charge you for it), so I was hoping for a simple CLI-based terminal agent just to solve for my lack of shell-fu.

I am keenly aware this is a major footgun here, but it seems that a terminal tool + llm would be a perfect lightweight solution.

Is there a way to have llm get permission for each tool call the way other "agents" do this? ("llm would like to call `rm -rf ./*` press Y to confirm...")

Would be a decent way to prevent letting an llm run wild on my terminal and still provide some measure of protection.

andresnds · 11h ago
Isn't that the default way the codex CLI runs? I.e. without passing --full-auto.
icarito · 7h ago
For all of you using `llm` - perhaps take a look at [Gtk-llm-chat](https://github.com/icarito/gtk-llm-chat).

I put a lot of effort into it - it integrates with `llm` command line tool and with your desktop, via a tray icon and nice chat window.

I recently released 3.0.0 with packages for all three major desktop operating systems.

kristopolous · 7h ago
Interesting. What do you use it for beyond the normal chatting?
icarito · 7h ago
I sometimes use llm from the command line, for instance with a fragment, or piping a resource from the web with curl, and then pick up the cid with `llm gtk-chat --cid MYCID`.
kristopolous · 5h ago
I'm actually planning on abandoning Simon's infra soon. I want a multi-stream, routing based solution that is more aware of the modern API advancements.

The Unix shell is good at being the glue between programs. We've increased the dimensionality with LLMs.

Some kind of ports based system like named pipes with consumers and producers.

Maybe something like gRPC or NATS (https://github.com/nats-io). MQTT might also work. Network transparent would be great.

prettyblocks · 9h ago
I've been trying to maintain a (mostly vibe-coded) zsh/omz plugin for tab completions for your LLM CLI, and the rate at which you release new features makes it tough to keep up!

Fortunately this gets me 90% of the way there:

  llm -f README.md -f llm.plugin.zsh -f completions/_llm -f https://simonwillison.net/2025/May/27/llm-tools/ "implement tab completions for the new tool plugins feature"

My repo is here:

https://github.com/eliyastein/llm-zsh-plugin

And again, it's a bit of a mess, because I'm trying to get as many options and their flags as I can. I wouldn't mind if anyone has any feedback for me.

sillysaurusx · 9h ago
Kind of crazy this isn’t sci-fi, it’s just how coding is done now. Future generations are going to wonder how we ever got anything done, the same way we wonder how assembly programmers managed to.
kristopolous · 7h ago
it makes simple things easy but hard things impossible. We'll see.
xk_id · 8h ago
The transition from assembly to C was to a different layer of abstraction within the same context of deterministic computation. The transition from programming to LLM prompting is to a qualitatively different context, because the process is no longer deterministic, nor debuggable. So your analogy fails to apply in a meaningful way to this situation.
pollinations · 3h ago
Why isn't it debuggable?
aitchnyu · 1h ago
Do local multimodal LLMs have low latency? If one could plug into the monitor's video stream and the accessibility representation of the UI (like Shortcat), it could answer whether a printer is connected to the computer, preview a full-page print, and wait for us to hit print.

https://shortcat.app/

never_inline · 1h ago
Thanks for writing the LLM CLI.

It's been a very useful tool for testing out and prototyping various LLM features like multimodal input, schema output, and now tools as well! I specifically like that I can just write a Python function with type annotations and plug it into the LLM.

chrissoundz · 1h ago
I think the project should really be given a name other than 'llm'. It's not something that can be easily found or identified otherwise.
oliviergg · 14h ago
Thank you for this release. I believe your library is a key component to unlocking the potential of LLMs without the limitations/restrictions of existing clients.

Since you released version 0.26 alpha, I’ve been trying to create a plugin to interact with an MCP server, but it’s a bit too challenging for me. So far, I’ve managed to connect and dynamically retrieve and use tools, but I’m not yet able to pass parameters.

simonw · 14h ago
Yeah I had a bit of an experiment with MCP this morning, to see if I could get a quick plugin demo out for it. It's a bit tricky! The official mcp Python library really wants you to run asyncio and connect to the server and introspect the available tools.
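
For anyone curious what that looks like, it's roughly this much ceremony just to list the tools (names recalled from the official mcp SDK's stdio example, so treat them as approximate):

  import asyncio
  from mcp import ClientSession, StdioServerParameters
  from mcp.client.stdio import stdio_client

  async def list_tools():
      # spawn an example MCP server over stdio and introspect its tools
      params = StdioServerParameters(
          command="npx",
          args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      )
      async with stdio_client(params) as (read, write):
          async with ClientSession(read, write) as session:
              await session.initialize()
              result = await session.list_tools()
              for tool in result.tools:
                  print(tool.name, "-", tool.description)

  asyncio.run(list_tools())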
ttul · 13h ago
GPT-4.1 is a capable model, especially for structured outputs and tool calling. I’ve been using LLMs for my day-to-day grunt work for two years now, and this is my go-to as a great combination of cheap and capable.
simonw · 13h ago
I'm honestly really impressed with GPT-4.1 mini. It is my default for messing around via their API because it is unbelievably inexpensive and genuinely capable at most of the things I throw at it.

I'll switch to o4-mini when I'm writing code, but otherwise 4.1-mini usually does a great job.

Fun example from earlier today:

  llm -f https://raw.githubusercontent.com/BenjaminAster/CSS-Minecraft/refs/heads/main/main.css \
    -s 'explain all the tricks used by this CSS'
That's piping the CSS from that incredible CSS Minecraft demo - https://news.ycombinator.com/item?id=44100148 - into GPT-4.1 mini and asking it for an explanation.

The code is clearly written but entirely uncommented: https://github.com/BenjaminAster/CSS-Minecraft/blob/main/mai...

GPT-4.1 mini's explanation is genuinely excellent: https://gist.github.com/simonw/cafd612b3982e3ad463788dd50287... - it correctly identifies "This CSS uses modern CSS features at an expert level to create a 3D interactive voxel-style UI while minimizing or eliminating JavaScript" and explains a bunch of tricks I hadn't figured out.

And it used 3,813 input tokens and 1,291 output tokens - https://www.llm-prices.com/#it=3813&ot=1291&ic=0.4&oc=1.6 - that's 0.3591 cents (around a third of a cent).

puttycat · 13h ago
> while minimizing or eliminating JavaScript

How come it doesn't know for sure?

simonw · 12h ago
Because I only showed it the CSS! It doesn't even get the HTML, it's guessed all of that exclusively from what's in the (uncommented) CSS code.

Though it's worth noting that CSS Minecraft was first released three years ago, so there's a chance it has hints about it in the training data already. This is not a meticulous experiment.

(I've had a search around though and the most detailed explanation I could find of how that code works is the one I posted on my blog yesterday - my hunch is that it figured it out from the code alone.)

puttycat · 12h ago
Thanks. I meant that it should understand that the CSS doesn't require/relate to any JS.
samuel · 7h ago
This is great, and a feature I was hoping for, for a long time.

At this point I would have expected something MCP- or OpenAPI-based, but it's probably simpler and more flexible this way. Implementing it as a plugin shouldn't be hard, I think.

hanatanaka1984 · 11h ago
Great work Simon! I use your tool daily. Pipes and easy model switching for local (ollama) and remote makes this very easy to work with.
roxolotl · 11h ago
Simon thank you so much for this tool! I use it daily now since charmbracelet’s Mods[0] doesn’t support Anthropic's models. And now with tool calling it’ll be even more useful. I am curious though if there’s any appetite for improving performance? It’s noticeably slow to even just print the help on all of my machines (M1 32GB / M2 21GB / Ryzen 7700 64GB).

0: https://github.com/charmbracelet/mods

simonw · 11h ago
How many plugins do you have installed?

We've seen problems in the past where plugins with expensive imports (like torch) slow everything down a lot: https://github.com/simonw/llm/issues/949

I'm interested in tracking down the worst offenders and encouraging them to move to lazy imports instead.

roxolotl · 10h ago
I’ve only got the Anthropic and Gemini plugins installed. I’d be happy to do a bit more digging. I’m away for a bit but would be happy to file an issue with more context when I get a chance.
simonw · 10h ago
Try running this and see if anything interesting comes out of it:

  sudo uvx py-spy record -o /tmp/profile.svg -- llm --help
lynx97 · 6h ago
This is great! AIUI, llama.cpp does support tools, but I haven't figured out yet what to do to make llm use it. Is there anything I can put into extra-openai-models.yaml to make this work?
simonw · 6h ago
That's likely a change that needs to be made to either https://github.com/simonw/llm-gguf or https://github.com/simonw/llm-llama-server

... OK, I got the second one working!

  brew install llama.cpp
  llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Then in another window:

  llm install llm-llama-server
  llm -m llama-server-tools -T llm_time 'what time is it?' --td
Wrote it up here: https://simonwillison.net/2025/May/28/llama-server-tools/
johnisgood · 5h ago
Simon, at this point you have a lot of LLM-related tools, and I am not sure which one is outdated, which one is the newest, fanciest stuff, which one is the one that one should use (and when), and so forth.

Is there a blog post / article that addresses this?

simonw · 4h ago
For my stuff it's basically just the latest https://llm.datasette.io and its collection of plugins.

If you're interested in what I recommend generally that changes a lot, but my most recent piece about that is here: https://simonwillison.net/2025/May/15/building-on-llms/

lynx97 · 2h ago
Great, this works here. I wonder, with extra-openai-models.yaml I was able to set the api_base and vision/audio: true. How do I do this with the llama-server-tools plugin? Vision works, but llm refuses to attach audio because it thinks the model does not support audio (which it does).

EDIT: I think I just found what I want. There is no need for the plugin, extra-openai-models.yaml just needs "supports_tools: true" and "can_stream: false".
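
For reference, my entry ended up looking roughly like this (the model_id/model_name values and the localhost api_base are just what I use for llama-server's default port; the key names beyond the two above are the ones from the extra-openai-models.yaml docs, so double-check them):

  - model_id: gemma-local
    model_name: gemma-3-4b-it
    api_base: "http://localhost:8080/v1"
    vision: true
    audio: true
    supports_tools: true
    can_stream: false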

sorenjan · 14h ago
Every time I update llm I have to reinstall all plugins, like gemini and ollama. My Gemini key is still saved, as are my aliases for my ollama models, so I don't get why the installed plugins are lost.
simonw · 14h ago
Sorry about that! Presumably you're updating via Homebrew? That blows away your virtual environment, hence why the plugins all go missing.

I have an idea to fix that by writing a 'plugins.txt' file somewhere with all of your installed plugins and then re-installing any that go missing - issue for that is here: https://github.com/simonw/llm/issues/575

sorenjan · 14h ago
No, I'm using uv tool just like in that issue. I'll keep an eye on it, at least I know it's not just me.
tionis · 13h ago
I'm also using uv tools and fixed it by doing something like this to upgrade:

uv tool install llm --upgrade --upgrade --with llm-openrouter --with llm-cmd ...

johnisgood · 5h ago
Is the double "--upgrade" a typo?
tionis · 5h ago
Yes, autocorrect on my phone worked against me
varyherb · 11h ago
I was running into this too until I started upgrading with

llm install -U llm

instead of

uv tool upgrade llm

(the latter of which is recommended by simonw in the original post)

simonw · 10h ago
Thanks! I didn't realize "llm install -U llm" did that. I'll add that to the upgrade docs.
swyx · 13h ago
nice one simon - i'm guessing this is mildly related to your observation that everyone is converging on the same set of tools? https://x.com/simonw/status/1927378768873550310
simonw · 13h ago
Actually a total coincidence! I have been trying to ship this for weeks.
tiniuclx · 3h ago
Thanks for the work you put into this, Simon. I've had a great experience with LLM, it's amazing for quickly iterating on AI application ideas.
pawanjswal · 6h ago
I think this LLM 0.26 release just turned every terminal into a playground for AI-powered tools.
a_bonobo · 6h ago
How does this differ from langchain's tool calling?
simonw · 4h ago
Dunno, I haven't used LangChain very much. My guess is that LLM is simpler to use!
behnamoh · 15h ago
what are the use cases for llm, the CLI tool? I keep finding tgpt or the built-in AI features of iTerm2 sufficient for quick shell scripting. does llm have any special features that others don't? am I missing something?
simonw · 14h ago
I find it extremely useful as a research tool. It can talk to probably over 100 models at this point, providing a single interface to all of them and logging full details of prompts and responses to its SQLite database. This makes it fantastic for recording experiments with different models over time.

The ability to pipe files and other program outputs into an LLM is wildly useful. A few examples:

  llm -f code.py -s 'Add type hints' > code_typed.py
  git diff | llm -s 'write a commit message'
  llm -f https://raw.githubusercontent.com/BenjaminAster/CSS-Minecraft/refs/heads/main/main.css \
    -s 'explain all the tricks used by this CSS'
It can process images too! https://simonwillison.net/2024/Oct/29/llm-multi-modal/

  llm 'describe this photo' -a path/to/photo.jpg
LLM plugins can be a lot of fun. One of my favorites is llm-cmd which adds the ability to do things like this:

  llm install llm-cmd
  llm cmd ffmpeg convert video.mov to mp4
It proposes a command to run, you hit enter to run it. I use it for ffmpeg and similar tools all the time now. https://simonwillison.net/2024/Mar/26/llm-cmd/

I'm getting a whole lot of coding done with LLM now too. Here's how I wrote one of my recent plugins:

  llm -m openai/o3 \
    -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
    -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
    -s 'Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue
      number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment'
I wrote about that one here: https://simonwillison.net/2025/Apr/20/llm-fragments-github/

LLM was also used recently in that "How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation" story - to help automate running 100s of prompts: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...

setheron · 14h ago
Wow what a great overview; is there a big doc to see all these options? I'd love to try it -- I've been trying the `gh` copilot plugin but this looks more appealing.
simonw · 14h ago
I really need to put together a better tutorial - there's a TON of documentation but it's scattered across a bunch of different places:

- The official docs: https://llm.datasette.io/

- The workshop I gave at PyCon a few weeks ago: https://building-with-llms-pycon-2025.readthedocs.io/

- The "New releases of LLM" series on my blog: https://simonwillison.net/series/llm-releases/

- My "llm" tag, which has 195 posts now! https://simonwillison.net/tags/llm/

setheron · 13h ago
I use NixOS; seems like this got me enough to get started (I wanted Gemini):

  # AI cli
  (unstable.python3.withPackages (
    ps: with ps; [ llm llm-gemini llm-cmd ]
  ))

looks like most of the plugins are models and most of the functionality you demo'd in the parent comment is baked into the tool itself.

Yea, a live document might be cool -- part of the interesting bit was seeing the "real" types of use cases you use it for.

Anyways will give it a spin.

th0ma5 · 14h ago
"LLM was used to find" is not what they did

> had I used o3 to find and fix the original vulnerability I would have, in theory [...]

they ran a scenario that they thought could have led to finding it, which is pretty much not what you said. We don't know how much their foreshadowing crept into their LLM context, and even the article says it was also sort of chance. Please be more precise and don't give in to these false beliefs of productivity. Not yet at least.

simonw · 14h ago
I said "LLM was also used recently in that..." which is entirely true. They used my LLM CLI tool as part of the work they described in that post.
th0ma5 · 10h ago
Very fair. I expect others to confuse what you mean (the productivity of your tool called LLM) with the doubt that many have about the actual productivity of LLMs, the large language model concept.
furyofantares · 13h ago
I don't use llm, but I have my own "think" tool (with MUCH less support than llm, it just calls openai + some special prompt I have set) and what I use it for is when I need to call an llm from a script.

Most recently I wanted a script that could produce word lists from a dictionary of 180k words given a query, like "is this an animal?" The script breaks the dictionary up into chunks of size N (asking "which of these words is an animal? respond with just the list of words that match, or NONE if none, and nothing else"), makes M parallel "think" queries, and aggregates the results in an output text file.

I had Claude Code do it, and even though I'm _already_ talking to an LLM, it's not a task that I trust an LLM to do without breaking the word list up into much smaller chunks and making loads of requests.
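
The chunk-and-aggregate part is simple enough to sketch; here's roughly the shape of it using the llm Python API instead of my own wrapper (chunk size, worker count, and model name are arbitrary):

  import llm
  from concurrent.futures import ThreadPoolExecutor

  CHUNK_SIZE = 500  # words per request (N)
  WORKERS = 8       # parallel requests (M)

  def classify_chunk(words, question):
      model = llm.get_model("gpt-4.1-mini")
      prompt = (
          f"{question}\n"
          "Respond with just the words from this list that match, "
          "one per line, or NONE if none match:\n" + "\n".join(words)
      )
      valid = set(words)
      reply = model.prompt(prompt).text()
      return [w.strip() for w in reply.splitlines() if w.strip() in valid]

  def build_word_list(dictionary, question):
      chunks = [dictionary[i:i + CHUNK_SIZE]
                for i in range(0, len(dictionary), CHUNK_SIZE)]
      with ThreadPoolExecutor(max_workers=WORKERS) as pool:
          results = pool.map(lambda c: classify_chunk(c, question), chunks)
      return [w for chunk in results for w in chunk]

  if __name__ == "__main__":
      words = open("dictionary.txt").read().split()
      for w in build_word_list(words, "Which of these words is an animal?"):
          print(w)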

cyanydeez · 12h ago
you're only a few steps away from creating an LLM synaptic network
furyofantares · 9h ago
I'm automating spending money at an exponential rate.
WhereIsTheTruth · 1h ago
pip, brew, pipx, uv

Can we stop already? Stop following webdev practices.

never_inline · 1h ago
They are all compatible with existing stuff, which makes it kinda better than js.
stavros · 13h ago
Have you guys had luck with tool calls? I made a simple assistant with access to my calendar, and most models fail to call the tool to add calendar events. GPT-4.1 also regularly tries to gaslight me into believing that it added the event when it didn't call the tool!

Overall, I found tool use extremely hit-and-miss, to the point where I'm sure I'm doing something wrong (I'm using the OpenAI Agents SDK, FWIW).

simonw · 13h ago
I get the impression that the key to getting great performance out of tool calls is having a really detailed system prompt, with a bunch of examples.

Anthropic's system prompt just for their "web_search" tool is over 6,000 tokens long! https://simonwillison.net/2025/May/25/claude-4-system-prompt...

xrd · 12h ago
Is no one else bothered by that way of using tools? Tools feel like a way to get deterministic behavior from a very hallucinatory process. But unless you put a very lengthy and comprehensive non-deterministic English statement, you can't effectively use tools. As we all know, the more code, the more bugs. These long and often hidden prompts seem like the wrong way to go.

And, this is why I'm very excited about this addition to the llm tool, because it feels like it moves the tool closer to the user and reduces the likelihood of the problem I'm describing.

simonw · 12h ago
As an experienced software engineer I'm bothered about pretty much everything about how we develop things on top of LLMs! I can't even figure out how to write automated tests for them.

See also my multi-year obsession with prompt injection and LLM security, which still isn't close to being a solved problem: https://simonwillison.net/tags/prompt-injection/

Yet somehow I can't tear myself away from them. The fact that we can use computers to mostly understand human language (and vision problems as well) is irresistible to me.

131012 · 11h ago
This is exactly why I follow your work, this mix of critical thinking and enthusiasm. Please keep going!
xrd · 9h ago
You put it so well! I agree wholeheartedly. LLMs are language toys we get to play with, and it's so much fun. But I'm bothered in the same way you are and that's fine.
th0ma5 · 10h ago
> The fact that we can use computers to mostly understand human language

I agree it'd be amazing if they did that, but they most certainly do not. I think this is the core of my disagreement here: that you believe this and let it guide you. They don't understand anything; they are matching and synthesizing patterns. I can see how that's enthralling, like watching a Rube Goldberg machine go through its paces, but there is no there there. The idea that there is an emergent something there is at best an unproven theory, is documented as being an illusion, and at worst has become an unfounded messianic belief.

simonw · 9h ago
That's why I said "mostly".

I know they're just statistical models, and that having conversations with them is like having a conversation with a stack of dice.

But if the simulation is good enough to be useful, the fact that they don't genuinely "understand" doesn't really matter to me.

I've had tens of thousands of "conversations" with these things now (I know because I log them all). Whether or not they understand anything they're still providing a ton of value back to me.

th0ma5 · 6h ago
I guess I respect that you're stating it honestly, but this is a statement of belief or faith. I think it is something you should disclose perhaps more often, because it doesn't stem from other first principles and is, I guess, actually just tautological. This is also getting at our fundamental disagreement more precisely: I just wouldn't blog about things that are beliefs as if they were the technology itself?
tessellated · 43m ago
I don't need belief or faith to get use and entertainment out of the transformers. As Simon said, good enough.
stavros · 12h ago
That's really interesting, thanks Simon! I was kind of expecting the LLM to be trained already, I'll use Claude's prompt and see. Thanks again.
rat87 · 8h ago
Nice. I was just looking at how to write a Python version of the barebones Ruby agent from https://news.ycombinator.com/item?id=43984860. I found sw-llm and tried to figure out how to pass it tools, but was having difficulty finding it in the docs.
dr_kretyn · 14h ago
Maybe a small plug for my own similar library: terminal-agent (https://github.com/laszukdawid/terminal-agent), which also supports tools and even MCP. There's limited agentic capability but it needs some polishing. Only once I had made some progress on my own app did I learn about this `llm` CLI. Though one more won't harm.


behnamoh · 15h ago
unrelated note: your blog is nice and I've been following you for a while, but as a quick suggestion: could you make the code blocks (inline or not) highlighted and more visible?
simonw · 14h ago
I have syntax highlighting for blocks of Python code - e.g. this one https://simonwillison.net/2025/May/27/llm-tools/#tools-in-th... - is that not visible enough?

This post has an unusually large number of code blocks without syntax highlighting since they're copy-pasted outputs from the debug tool which isn't in any formal syntax.