(Also check out https://github.com/day50-dev/llmehelp which features a tmux tool I built on top of Simon's llm. I use it every day. Really. It's become indispensable)
simonw · 4h ago
Wow, that library is looking really great!
I think I want a plugin hook that lets plugins take over the display of content by the tool. Just filed an issue: https://github.com/simonw/llm/issues/1112
Would love to get your feedback on it - I included a few design options, but none of them feel 100% right to me yet.
kristopolous · 4h ago
The real solution is semantic routing. You want to be able to define routing rules based on something like mdast (https://github.com/syntax-tree/mdast) . I've built a few hacked versions. This would not only allow for things like terminal rendering but is also a great complement to tool calling. Being able to siphon and multiplex inputs for the future where cerebras like speeds become more common, dynamic configurable stream routing will unlock quite a bit more use cases.
We have cost, latency, context window and model routing but I haven't seen anything semantic yet. Someone's going to do it, might as well be me.
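A purely speculative sketch of what such a semantic router could look like, assuming mdast-style node dicts; the rule names and sinks here are invented for illustration:

```python
# Hypothetical semantic routing: dispatch each parsed markdown node to a
# named sink based on its node type, the way cost/latency routers dispatch
# whole requests. Node dicts mimic mdast's {"type": ...} shape.
def semantic_route(node, rules, default="stdout"):
    """Return the name of the sink that should receive this node."""
    return rules.get(node.get("type"), default)

rules = {
    "code": "highlighter",      # fenced code -> syntax highlighter
    "heading": "toc-builder",   # headings -> table of contents
    "table": "table-renderer",  # tables -> grid renderer
}
```

So `semantic_route({"type": "code"}, rules)` picks the highlighter, while anything unrecognized falls through to plain stdout.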
kristopolous · 2h ago
Also, I forgot to mention one other tool built on llm.
This one is a ZSH plugin that uses zle to translate your English to shell commands with a keystroke: https://github.com/day50-dev/Zummoner
It's been life changing for me. Here's one I wrote today:
$ git find out if abcdefg is a descendent of hijklmnop
In fact I used it in one of these comments
$ for i in $(seq 1 6); do
printf "%${i}sh${i}\n\n-----\n" | tr " " "#";
done | pv -bqL 30
Was originally
$ for i in $(seq 1 6); do
printf "(# $i times)\n\n-----\n"
done | pv (30 bps and quietly)
I did my trusty ctrl-x x and the buffer got sent off through openrouter and got swapped out with the proper syntax in under a second.
kazinator · 2h ago
The brace expansion syntax in Bash and Zsh expands integer ranges: {1..6}; no calling out to external command.
It's also intelligent about inferring leading zeros without needing to be told with options, e.g. {001..995}.
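To check that behavior concretely, here's a quick probe driven from Python (assumes a `bash` binary is on PATH):

```python
# Demonstrate bash brace expansion, including inferred leading zeros,
# with no call out to seq. Requires bash on PATH.
import subprocess

def bash_expand(expr: str) -> list[str]:
    out = subprocess.run(["bash", "-c", f"echo {expr}"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

print(bash_expand("{1..6}"))     # ['1', '2', '3', '4', '5', '6']
print(bash_expand("{001..005}")) # ['001', '002', '003', '004', '005']
```

The zero-padded endpoints are enough for bash to pad every element, no flags needed.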
No comments yet
rpeden · 4h ago
Neat! I've written streaming Markdown renderers in a couple of languages for quickly displaying streaming LLM output. Nice to see I'm not the only one! :)
kristopolous · 4h ago
It's a wildly nontrivial problem if you're trying to only be forward moving and want to minimize your buffer.
That's why everybody else either rerenders (such as rich) or relies on the whole buffer (such as glow).
I didn't write Streamdown for fun - there are genuinely no suitable tools that did what I needed.
Also various models have various ideas of what markdown should be and coding against CommonMark doesn't get you there.
Then there are other things: you have to check individual character width and the language family type to do proper word wrap. I've seen a number of interesting tmux and alacritty bugs while doing multi-language support.
The only real break I do is I render h6 (######) as muted grey.
Compare:
for i in $(seq 1 6); do
printf "%${i}sh${i}\n\n-----\n" | tr " " "#";
done | pv -bqL 30 | sd -w 30
to swapping out `sd` with `glow`. You'll see glow's lag - waiting for that EOF is annoying.
Also try sd -b 0.4 or even -b 0.7,0.8,0.8 for a nice blue. It's a bit easier to configure than the usual catalog of themes that requires a compilation after modification like with pygments.
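The character-width point is easy to underestimate. A minimal sketch of width-aware wrapping (not Streamdown's actual code) using Unicode east-asian-width data:

```python
# Width-aware word wrap: CJK and other wide/fullwidth characters occupy
# two terminal cells, so len() alone miscounts line width.
import unicodedata

def cell_width(s: str) -> int:
    """Terminal cells occupied by s (wide/fullwidth chars count as 2)."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)

def wrap(words, max_cells: int):
    lines, line, used = [], [], 0
    for w in words:
        extra = cell_width(w) + (1 if line else 0)  # +1 for the joining space
        if line and used + extra > max_cells:
            lines.append(" ".join(line))
            line, used = [w], cell_width(w)
        else:
            line.append(w)
            used += extra
    if line:
        lines.append(" ".join(line))
    return lines
```

A streaming renderer additionally has to do this incrementally, without seeing the whole paragraph first, which is where the forward-only constraint bites.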
icarito · 16m ago
That's right, this is a nontrivial problem that I struggled with too, for gtk-llm-chat!
I resolved it using the streaming markdown-it-py library.
kristopolous · 2m ago
Huh, this might be another approach with a bit of effort. Thanks for that - I didn't know about this.
hanatanaka1984 · 3h ago
Interesting, I will be sure to check into this. I have been using llm and bat with syntax highlighting.
kristopolous · 3h ago
Do you just do
| bat --language=markdown --force-colorization ?
hanatanaka1984 · 3h ago
A simple bash script provides quick command line access to the tool. Output is paged syntax highlighted markdown.
echo "$@" | llm "Provide a brief response to the question, if the question is related to command provide the command and short description" | bat --plain -l md
https://github.com/day50-dev/llmehelp/blob/main/Snoopers/wtf
I've thought about redoing it because my needs are things like
$ ls | wtf which endpoints do these things talk to, give me a map and line numbers.
What this will eventually be is "ai-grep", built transparently on https://ast-grep.github.io/, where the llm writes the complicated query (these coding agents all seem to use ripgrep, but this works better).
Conceptual grep is what I've wanted my whole life.
Semantic routing, which I alluded to above, could get this to work progressively so you quickly get adequate results which then pareto their way up as the token count increases.
Really you'd like some tempering - like coreutils timeout(1), but for simplex optimization.
I put a lot of effort into it - it integrates with `llm` command line tool and with your desktop, via a tray icon and nice chat window.
I recently released 3.0.0 with packages for all three major desktop operating systems.
kristopolous · 4m ago
Interesting. What do you use it for beyond the normal chatting?
tantalor · 5h ago
This greatly opens up the risk of footguns.
The doc [1] warns about prompt injection, but I think a more likely scenario is self-inflicted harm. For instance, you give a tool access to your brokerage account to automate trading. Even without prompt injection, there's nothing preventing the bot from making stupid trades.
[1] https://llm.datasette.io/en/stable/tools.html
Yeah, it really does.
There are so many ways things can go wrong once you start plugging tools into an LLM, especially if those tool calls are authenticated and can take actions on your behalf. The MCP world is speed-running this right now - see the GitHub MCP story from yesterday: https://news.ycombinator.com/item?id=44097390
I stuck a big warning in the documentation and I've been careful not to release any initial tool plugins that can cause damage - hence my QuickJS sandbox one and SQLite plugin being read-only - but it's a dangerous space to be exploring.
(Super fun and fascinating though.)
kbelder · 4h ago
If you hook an llm up to your brokerage account, someone is being stupid, but it ain't the bot.
isaacremuant · 4h ago
You think "senior leadership/boards of directors" aren't thinking of going all in with AI to "save money" and "grow faster and cheaper"?
This is absolutely going to happen at a large scale and then we'll have "cautionary tales" and a lot of "compliance" rules.
zaik · 1h ago
Let it happen. Just don't bail them out using tax money again.
shepherdjerred · 2h ago
Any tool can be misused
tantalor · 2h ago
You're missing the point. Most tools are deployed by humans. If they do something bad, we can blame the human for using the tool badly. And we can predict when a bad choice by the human operator will lead to a bad outcome.
Letting the LLM run the tool unsupervised is another thing entirely. We do not understand the choices the machines are making. They are unpredictable and you can't root-cause their decisions.
LLM tool use is a new thing we haven't had before, which means tool misuse is a whole new class of FUBAR waiting to happen.
abc-1 · 5h ago
[flagged]
dang · 1h ago
Could you please stop posting shallow dismissals and putdowns of other people and their work? It's against the site guidelines, and your account has unfortunately been doing a lot of it:
https://news.ycombinator.com/item?id=44073456
https://news.ycombinator.com/item?id=44073413
https://news.ycombinator.com/item?id=44070923
https://news.ycombinator.com/item?id=44070514
https://news.ycombinator.com/item?id=44010921
https://news.ycombinator.com/item?id=43970274
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
Ok, this is great and perfect timing -- I've been playing around with Warp (the terminal), and while I love the idea of their terminal-based "agent" (e.g. tool loop), I don't love the whole Cursor-esque model of "trust us, we'll make a good prompt and LLM calls for you" (and charge you for it). So I was hoping for a simple CLI-based terminal agent just to solve for my lack of shell-fu.
I am keenly aware this is a major footgun here, but it seems that a terminal tool + llm would be a perfect lightweight solution.
Is there a way to have llm get permission for each tool call the way other "agents" do this? ("llm would like to call `rm -rf ./*` press Y to confirm...")
Would be a decent way to prevent letting an llm run wild on my terminal and still provide some measure of protection.
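At the application level, that kind of gate is a generic wrapper pattern - this is an illustrative sketch only, not llm's actual plugin API:

```python
# Gate any tool function behind a y/N prompt before the model's call runs.
# `ask` defaults to input() but is injectable for testing.
def confirm_wrapped(tool, ask=input):
    def wrapper(*args, **kwargs):
        call = f"{tool.__name__}({', '.join(map(repr, args))})"
        if ask(f"LLM wants to call {call} -- run? [y/N] ").strip().lower() != "y":
            return "Tool call denied by user."
        return tool(*args, **kwargs)
    wrapper.__name__ = tool.__name__
    return wrapper
```

Registering `confirm_wrapped(dangerous_tool)` instead of the raw function means every invocation pauses for explicit approval, and a denial is fed back to the model as a string it can react to.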
andresnds · 3h ago
Isn't that the default way codex CLI runs? I.e. without passing --full-auto.
prettyblocks · 1h ago
I've been trying to maintain a (mostly vibe-coded) zsh/omz plugin for tab completions on your LLM cli and the rate at which you release new features makes it tough to keep up!
Fortunately this gets me 90% of the way there:
llm -f README.md -f llm.plugin.zsh -f completions/_llm -f https://simonwillison.net/2025/May/27/llm-tools/ "implement tab completions for the new tool plugins feature"
My repo is here: https://github.com/eliyastein/llm-zsh-plugin
And again, it's a bit of a mess, because I'm trying to include as many options and their flags as I can. I wouldn't mind if anyone has feedback for me.
sillysaurusx · 1h ago
Kind of crazy this isn’t sci-fi, it’s just how coding is done now. Future generations are going to wonder how we ever got anything done, the same way we wonder how assembly programmers managed to.
xk_id · 28m ago
The transition from assembly to C was to a different layer of abstraction within the same context of deterministic computation. The transition from programming to LLM prompting is to a qualitatively different context, because the process is no longer deterministic, nor debuggable. So your analogy fails to apply in a meaningful way to this situation.
hanatanaka1984 · 3h ago
Great work Simon! I use your tool daily. Pipes and easy model switching for local (ollama) and remote makes this very easy to work with.
ttul · 5h ago
GPT-4.1 is a capable model, especially for structured outputs and tool calling. I’ve been using LLMs for my day to day grunt work for two years now and this is my goto as a great combination of cheap and capable.
simonw · 5h ago
I'm honestly really impressed with GPT-4.1 mini. It's my default for messing around via their API because it's unbelievably inexpensive and genuinely capable at most of the things I throw at it.
I'll switch to o4-mini when I'm writing code, but otherwise 4.1-mini usually does a great job.
Fun example from earlier today:
llm -f https://raw.githubusercontent.com/BenjaminAster/CSS-Minecraft/refs/heads/main/main.css \
-s 'explain all the tricks used by this CSS'
That's piping the CSS from that incredible CSS Minecraft demo - https://news.ycombinator.com/item?id=44100148 - into GPT-4.1 mini and asking it for an explanation. The code is clearly written but entirely uncommented.
GPT-4.1 mini's explanation is genuinely excellent: https://gist.github.com/simonw/cafd612b3982e3ad463788dd50287... - it correctly identifies "This CSS uses modern CSS features at an expert level to create a 3D interactive voxel-style UI while minimizing or eliminating JavaScript" and explains a bunch of tricks I hadn't figured out.
And it used 3,813 input tokens and 1,291 output tokens - https://www.llm-prices.com/#it=3813&ot=1291&ic=0.4&oc=1.6 - that's 0.3591 cents (around a third of a cent).
Because I only showed it the CSS! It doesn't even get the HTML, it's guessed all of that exclusively from what's in the (uncommented) CSS code.
Though it's worth noting that CSS Minecraft was first released three years ago, so there's a chance it has hints about it in the training data already. This is not a meticulous experiment.
(I've had a search around though and the most detailed explanation I could find of how that code works is the one I posted on my blog yesterday - my hunch is that it figured it out from the code alone.)
puttycat · 5h ago
Thanks. I meant that it should understand that the CSS doesn't require/relate to any JS.
roxolotl · 3h ago
Simon, thank you so much for this tool! I use it daily now, since charmbracelet's Mods[0] doesn't support Anthropic's models. And now with tool calling it'll be even more useful. I am curious though if there's any appetite for improving performance? It's noticeably slow to even just print the help on all of my machines (M1 32gb / M2 21g / Ryzen 7700 64gb).
[0]: https://github.com/charmbracelet/mods
We've seen problems in the past where plugins with expensive imports (like torch) slow everything down a lot: https://github.com/simonw/llm/issues/949
I'm interested in tracking down the worst offenders and encouraging them to move to lazy imports instead.
roxolotl · 3h ago
I’ve only got the Anthropic and Gemini plugins installed. I’d be happy to do a bit more digging. I’m away for a bit but would be happy to file an issue with more context when I get a chance.
simonw · 2h ago
Try running this and see if anything interesting comes out of it:
sudo uvx py-spy record -o /tmp/profile.svg -- llm --help
oliviergg · 6h ago
Thank you for this release. I believe your library is a key component to unlocking the potential of LLMs without the limitations/restrictions of existing clients.
Since you released the 0.26 alpha, I've been trying to create a plugin to interact with some MCP servers, but it's a bit too challenging for me. So far I've managed to connect and dynamically retrieve and use tools, but I'm not yet able to pass parameters.
simonw · 6h ago
Yeah I had a bit of an experiment with MCP this morning, to see if I could get a quick plugin demo out for it. It's a bit tricky! The official mcp Python library really wants you to run asyncio and connect to the server and introspect the available tools.
sorenjan · 6h ago
Every time I update llm I have to reinstall all plugins, like gemini and ollama. My Gemini key is still saved, as are my aliases for my ollama models, so I don't get why the installed plugins are lost.
simonw · 6h ago
Sorry about that! Presumably you're updating via Homebrew? That blows away your virtual environment, hence why the plugins all go missing.
I have an idea to fix that by writing a 'plugins.txt' file somewhere with all of your installed plugins and then re-installing any that go missing - issue for that is here: https://github.com/simonw/llm/issues/575
sorenjan · 6h ago
No, I'm using uv tool just like in that issue. I'll keep an eye on it, at least I know it's not just me.
tionis · 6h ago
I'm also using uv tools and fixed it by doing something like this to upgrade:
uv tool install llm --upgrade --with llm-openrouter --with llm-cmd ...
Another option is:
llm install -U llm
instead of:
uv tool upgrade llm
(the latter of which is recommended by simonw in the original post)
Actually a total coincidence! I have been trying to ship this for weeks.
rat87 · 38m ago
Nice. I was just looking at how to copy the Python version of the barebones Ruby agent (https://news.ycombinator.com/item?id=43984860). I found sw-llm and tried to figure out how to pass it tools, but was having difficulty finding it in the docs.
behnamoh · 7h ago
what are the use cases for llm, the CLI tool? I keep finding tgpt or the built-in AI features of iTerm2 sufficient for quick shell scripting. Does llm have any special features that others don't? Am I missing something?
simonw · 6h ago
I find it extremely useful as a research tool. It can talk to probably over 100 models at this point, providing a single interface to all of them and logging full details of prompts and responses to its SQLite database. This makes it fantastic for recording experiments with different models over time.
The ability to pipe files and other program outputs into an LLM is wildly useful. A few examples:
llm -f code.py -s 'Add type hints' > code_typed.py
git diff | llm -s 'write a commit message'
llm -f https://raw.githubusercontent.com/BenjaminAster/CSS-Minecraft/refs/heads/main/main.css \
-s 'explain all the tricks used by this CSS'
It can process images too! https://simonwillison.net/2024/Oct/29/llm-multi-modal/
LLM plugins can be a lot of fun. One of my favorites is llm-cmd: it proposes a command to run, you hit enter to run it. I use it for ffmpeg and similar tools all the time now: https://simonwillison.net/2024/Mar/26/llm-cmd/
I'm getting a whole lot of coding done with LLM now too. Here's how I wrote one of my recent plugins:
llm -m openai/o3 \
-f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
-f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
-s 'Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue
number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment'
I wrote about that one here: https://simonwillison.net/2025/Apr/20/llm-fragments-github/
LLM was also used recently in that "How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation" story - to help automate running 100s of prompts: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
setheron · 6h ago
Wow what a great overview; is there a big doc to see all these options?
I'd love to try it -- I've been trying the `gh` copilot plugin, but this looks more appealing.
simonw · 6h ago
I really need to put together a better tutorial - there's a TON of documentation but it's scattered across a bunch of different places:
- The official docs: https://llm.datasette.io/
- The workshop I gave at PyCon a few weeks ago: https://building-with-llms-pycon-2025.readthedocs.io/
- The "New releases of LLM" series on my blog: https://simonwillison.net/series/llm-releases/
- My "llm" tag, which has 195 posts now! https://simonwillison.net/tags/llm/
I use NixOS; seems like this got me enough to get started (I wanted Gemini):
```
# AI cli
(unstable.python3.withPackages (
ps:
with ps; [
llm
llm-gemini
llm-cmd
]
))
```
looks like most of the plugins are models and most of the functionality you demo'd in the parent comment is baked into the tool itself.
Yeah, a live document might be cool -- part of the interesting bit was seeing the "real" types of use cases you use it for.
Anyways will give it a spin.
th0ma5 · 6h ago
"LLM was used to find" is not what they did
> had I used o3 to find and fix the original vulnerability I would have, in theory [...]
they ran a scenario that they thought could have led to finding it, which is pretty much not what you said. We don't know how much their foreshadowing crept into their LLM context, and even the article says it was also sort of chance. Please be more precise and don't give in to these false beliefs of productivity. Not yet, at least.
simonw · 6h ago
I said "LLM was also used recently in that..." which is entirely true. They used my LLM CLI tool as part of the work they described in that post.
th0ma5 · 2h ago
Very fair. I expect others to confuse the productivity of your tool called LLM with the doubt that many have about the actual productivity of LLMs, the large language model concept.
furyofantares · 6h ago
I don't use llm, but I have my own "think" tool (with MUCH less support than llm, it just calls openai + some special prompt I have set) and what I use it for is when I need to call an llm from a script.
Most recently I wanted a script that could produce word lists from a dictionary of 180k words given a query, like "is this an animal?" The script breaks the dictionary up into chunks of size N (asking "which of these words is an animal? respond with just the list of words that match, or NONE if none, and nothing else"), makes M parallel "think" queries, and aggregates the results in an output text file.
I had Claude Code do it, and even though I'm _already_ talking to an LLM, it's not a task that I trust an LLM to do without breaking the word list up into much smaller chunks and making loads of requests.
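The chunk-and-fan-out pattern described above looks roughly like this; the LLM call is stubbed out, since a real `classify_chunk` would send the prompt from the comment to a model:

```python
# Split a word list into chunks, classify each chunk in parallel, and
# aggregate the matches. classify_chunk is a stand-in for an LLM request.
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(words, query):
    # Stub: a real version would prompt a model with `query` and return
    # only the matching words. Here we fake "is this an animal?".
    animals = {"cat", "dog", "owl"}
    return [w for w in words if w in animals]

def filter_wordlist(words, query, chunk_size=1000, workers=8):
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda chunk: classify_chunk(chunk, query), chunks)
    return [w for chunk_result in results for w in chunk_result]
```

Keeping chunks small trades more requests for answers the model can actually be trusted with, which is the point of the comment above.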
cyanydeez · 4h ago
you're only a few steps away from creating an LLM synaptic network
furyofantares · 1h ago
I'm automating spending money at an exponential rate.
stavros · 5h ago
Have you guys had luck with tool calls? I made a simple assistant with access to my calendar, and most models fail to call the tool to add calendar events. GPT-4.1 also regularly tries to gaslight me into believing that it added the event when it didn't call the tool!
Overall, I found tool use extremely hit-and-miss, to the point where I'm sure I'm doing something wrong (I'm using the OpenAI Agents SDK, FWIW).
simonw · 5h ago
I get the impression that the key to getting great performance out of tool calls is having a really detailed system prompt, with a bunch of examples.
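One hedged illustration of that "detailed system prompt plus examples" advice, using an OpenAI-style function-calling tool definition; the calendar domain, tool name, and prompt text are all invented for the example:

```python
# A system prompt that spells out the rules and includes a worked example,
# paired with a JSON-schema tool definition in the OpenAI function format.
SYSTEM_PROMPT = """\
You manage the user's calendar. You can ONLY create events by calling
the add_event tool; never claim an event was added without calling it.

Example:
User: lunch with Sam friday noon
Assistant: add_event(title="Lunch with Sam", start="Friday 12:00")
"""

ADD_EVENT_TOOL = {
    "type": "function",
    "function": {
        "name": "add_event",
        "description": "Create a calendar event.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "Event start time"},
            },
            "required": ["title", "start"],
        },
    },
}
```

The explicit "never claim an event was added without calling it" line targets exactly the gaslighting failure mode described upthread.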
Is no one else bothered by that way of using tools? Tools feel like a way to get deterministic behavior from a very hallucinatory process. But unless you write a very lengthy and comprehensive non-deterministic English statement, you can't effectively use tools. As we all know, the more code, the more bugs. These long and often hidden prompts seem like the wrong way to go.
And, this is why I'm very excited about this addition to the llm tool, because it feels like it moves the tool closer to the user and reduces the likelihood of the problem I'm describing.
simonw · 4h ago
As an experienced software engineer I'm bothered about pretty much everything about how we develop things on top of LLMs! I can't even figure out how to write automated tests for them.
Yet somehow I can't tear myself away from them. The fact that we can use computers to mostly understand human language (and vision problems as well) is irresistible to me.
xrd · 1h ago
You put it so well! I agree wholeheartedly. LLMs are language toys we get to play with, and it's so much fun. But I'm bothered in the same way you are, and that's fine.
131012 · 4h ago
This is exactly why I follow your work, this mix of critical thinking and enthusiasm. Please keep going!
th0ma5 · 3h ago
> The fact that we can use computers to mostly understand human language
I agree that'd be amazing if they did that, but they most certainly do not. I think this is the core of my disagreement here: you believe this and let it guide you. They don't understand anything; they are matching and synthesizing patterns. I can see how that's enthralling, like watching a Rube Goldberg machine go through its paces, but there is no there there. The idea that there is an emergent something is at best an unproven theory, is documented as being an illusion, and at worst has become an unfounded messianic belief.
simonw · 1h ago
That's why I said "mostly".
I know they're just statistical models, and that having conversations with them is like having a conversation with a stack of dice.
But if the simulation is good enough to be useful, the fact that they don't genuinely "understand" doesn't really matter to me.
I've had tens of thousands of "conversations" with these things now (I know because I log them all). Whether or not they understand anything they're still providing a ton of value back to me.
stavros · 4h ago
That's really interesting, thanks Simon! I was kind of expecting the LLM to be trained already, I'll use Claude's prompt and see. Thanks again.
dr_kretyn · 6h ago
Maybe a small plug of my own similar library: terminal-agent (https://github.com/laszukdawid/terminal-agent), which also supports tools and even MCP. There's limited agentic capability, but it needs some polishing.
Only once I'd made some progress on my own app did I learn about this `llm` CLI. Though one more won't harm.
No comments yet
behnamoh · 7h ago
unrelated note: your blog is nice and I've been following you for a while, but as a quick suggestion: could you make the code blocks (inline or not) highlighted and more visible?
This post has an unusually large number of code blocks without syntax highlighting since they're copy-pasted outputs from the debug tool which isn't in any formal syntax.
bgwalter · 5h ago
Another tool, another blog post. But no serious project uses LLMs at all.
simonw · 5h ago
Are you an alt for th0ma5?
bgwalter · 5h ago
Are you paid by OpenAI?
simonw · 5h ago
No. If I was they'd probably be a bit angry about how much time I spend writing about their competitors:
They are not. I think it is a good criticism, though. Many people seem to be touting productivity that is only productivity toward more LLM inference operations, not productivity in the sense of solving real-world computing problems. There is a lot of material suggesting positive results are a kind of wishcasting, and that people are not aware of the agency they bring to the interaction. The fact that you can hack things that do cool stuff is more a reflection that you do those things, and that these models are not capable of it. That's why I recommend you work with others: you'll see that the concepts you feel are generalizable are not, and that any learnings or insights are not like learning math or how to read, but more like learning a specific video game's rules. This is also why it is enthralling to you: you have only the illusion of controlling it.
minimaxir · 2h ago
No one is happy about the need for prompt engineering and other LLM hacks, but part of being a professional in the real world is doing what works.
bgwalter · 2h ago
Software worked quite well before the plagiarizing chat bots.
This is just a new fad like agile, where the process and tool obsessed developers blog and preach endlessly without delivering any results.
For them "it works". The rest of us can either join the latest madness or avoid places that use LLMs, which of course includes all open source projects that do actual work that is not measured in GitHub kLOCs and marketing speak.
zer00eyz · 1h ago
The hype: "LLM's will replace all of the coders".
The hate: "LLM's can't do anything without hand holding"
I think both of these takes are disingenuous.
> not productive in the sense of solving real world computing problems.
Solving problems is a pretty vague term.
> The fact that you can hack things that do cool stuff
Lots of times this is how problems actually get solved. I would argue most of the time this is how problems get solved - more so if you're working with other people's software, because you're not reinventing CUPS, dashboards, VPNs, marketing tools, mail clients, chat clients, and so on. I would argue that LOTS of good software is propped up directly and indirectly by this sort of hacking.
More background: https://github.com/simonw/llm/issues/12
(Also check out https://github.com/day50-dev/llmehelp which features a tmux tool I built on top of Simon's llm. I use it every day. Really. It's become indispensable)
I think I want a plugin hook that lets plugins take over the display of content by the tool.
Just filed an issue: https://github.com/simonw/llm/issues/1112
Would love to get your feedback on it, I included a few design options but none of them feel 100% right to me yet.
We have cost, latency, context window and model routing but I haven't seen anything semantic yet. Someone's going to do it, might as well be me.
This one is a ZSH plugin that uses zle to translate your English to shell commands with a keystroke.
https://github.com/day50-dev/Zummoner
It's been life changing for me. Here's one I wrote today:
In fact I used it in one of these comments Was originally I did my trusty ctrl-x x and the buffer got sent off through openrouter and got swapped out with the proper syntax in under a second.It's also intelligent about inferring leading zeros without needing to be told with options, e.g. {001..995}.
No comments yet
That's why everybody else either rerenders (such as rich) or relies on the whole buffer (such as glow).
I didn't write Streamdown for fun - there are genuinely no suitable tools that did what I needed.
Also various models have various ideas of what markdown should be and coding against CommonMark doesn't get you there.
Then there's other things. You have to check individual character width and the language family type to do proper word wrap. I've seen a number of interesting tmux and alacritty bugs in doing multi language support
The only real break I do is I render h6 (######) as muted grey.
Compare:
to swapping out `sd` with `glow`. You'll see glow's lag - waiting for that EOF is annoying.Also try sd -b 0.4 or even -b 0.7,0.8,0.8 for a nice blue. It's a bit easier to configure than the usual catalog of themes that requires a compilation after modification like with pygments.
| bat --language=markdown --force-colorization ?
https://github.com/day50-dev/llmehelp/blob/main/Snoopers/wtf
I've thought about redoing it because my needs are things like
What this will eventually be is "ai-grep" built transparently on https://ast-grep.github.io/ where the llm writes the complicated query (these coding agents all seem to use ripgrep but this works better)Conceptual grep is what I've wanted my while life
Semantic routing, which I alluded to above, could get this to work progressively so you quickly get adequate results which then pareto their way up as the token count increases.
Really you'd like some tampering, like a coreutils timeout(1) but for simplex optimization.
simple and works well.
I put a lot of effort into it - it integrates with `llm` command line tool and with your desktop, via a tray icon and nice chat window.
I recently released 3.0.0 with packages for all three major desktop operating systems.
The doc [1] warns about prompt injection, but I think a more likely scenario is self-inflicted harm. For instance, you give a tool access to your brokerage account to automate trading. Even without prompt injection, there's nothing preventing the bot from making stupid trades.
[1] https://llm.datasette.io/en/stable/tools.html
Yeah, it really does.
There are so many ways things can go wrong once you start plugging tools into an LLM, especially if those tool calls are authenticated and can take actions on your behalf.
The MCP world is speed-running this right now, see the GitHub MCP story from yesterday: https://news.ycombinator.com/item?id=44097390
I stuck a big warning in the documentation and I've been careful not to release any initial tool plugins that can cause any damage - hence my QuickJS sandbox one and SQLite plugin being read-only - but it's a dangerous space to be exploring.
(Super fun and fascinating though.)
This is absolutely going to happen at a large scale and then we'll have "cautionary tales" and a lot of "compliance" rules.
Letting the LLM run the tool unsupervised is another thing entirely. We do not understand the choices the machines are making. They are unpredictable and you can't root-cause their decisions.
LLM tool use is a new thing we haven't had before, which means tool misuse is a whole new class of FUBAR waiting to happen.
https://news.ycombinator.com/item?id=44073456
https://news.ycombinator.com/item?id=44073413
https://news.ycombinator.com/item?id=44070923
https://news.ycombinator.com/item?id=44070514
https://news.ycombinator.com/item?id=44010921
https://news.ycombinator.com/item?id=43970274
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
No comments yet
I am keenly aware this is a major footgun here, but it seems that a terminal tool + llm would be a perfect lightweight solution.
Is there a way to have llm get permission for each tool call the way other "agents" do this? ("llm would like to call `rm -rf ./*` press Y to confirm...")
Would be a decent way to prevent letting an llm run wild on my terminal and still provide some measure of protection.
Fortunately this gets me 90% of the way there:
llm -f README.md -f llm.plugin.zsh -f completions/_llm -f https://simonwillison.net/2025/May/27/llm-tools/ "implement tab completions for the new tool plugins feature"
My repo is here:
https://github.com/eliyastein/llm-zsh-plugin
And again, it's a bit of a mess, because I'm trying to get as many options and their flags as I can. I wouldn't mind if anyone has any feedback for me.
I'll switch to o4-mini when I'm writing code, but otherwise 4.1-mini usually does a great job.
Fun example from earlier today:
That's piping the CSS from that incredible CSS Minecraft demo - https://news.ycombinator.com/item?id=44100148 - into GPT-4.1 mini and asking it for an explanation.The code is clearly written but entirely uncommented: https://github.com/BenjaminAster/CSS-Minecraft/blob/main/mai...
GPT-4.1 mini's explanation is genuinely excellent: https://gist.github.com/simonw/cafd612b3982e3ad463788dd50287... - it correctly identifies "This CSS uses modern CSS features at an expert level to create a 3D interactive voxel-style UI while minimizing or eliminating JavaScript" and explains a bunch of tricks I hadn't figured out.
And it used 3,813 input tokens and 1,291 output tokens - https://www.llm-prices.com/#it=3813&ot=1291&ic=0.4&oc=1.6 - that's 0.3591 cents (around a third of a cent).
How come it doesn't know for sure?
Though it's worth noting that CSS Minecraft was first released three years ago, so there's a chance it has hints about it in the training data already. This is not a meticulous experiment.
(I've had a search around though and the most detailed explanation I could find of how that code works is the one I posted on my blog yesterday - my hunch is that it figured it out from the code alone.)
0: https://github.com/charmbracelet/mods
We've seen problems in the past where plugins with expensive imports (like torch) slow everything down a lot: https://github.com/simonw/llm/issues/949
I'm interested in tracking down the worst offenders and encouraging them to move to lazy imports instead.
Since you released version 0.26 alpha, I’ve been trying to create a plugin to interact with a some MCP server, but it’s a bit too challenging for me. So far, I’ve managed to connect and dynamically retrieve and use tools, but I’m not yet able to pass parameters.
I have an idea to fix that by writing a 'plugins.txt' file somewhere with all of your installed plugins and then re-installing any that go missing - issue for that is here: https://github.com/simonw/llm/issues/575
uv tool install llm --upgrade --upgrade --with llm-openrouter --with llm-cmd ...
llm install -U llm
instead of
uv tool upgrade llm
(the latter of which is recommended by simonw in the original post)
The ability to pipe files and other program outputs into an LLM is wildly useful. A few examples:
It can process images too! https://simonwillison.net/2024/Oct/29/llm-multi-modal/
LLM plugins can be a lot of fun. One of my favorites is llm-cmd, which adds the ability to do things like this: it proposes a command to run, you hit enter to run it. I use it for ffmpeg and similar tools all the time now. https://simonwillison.net/2024/Mar/26/llm-cmd/
I'm getting a whole lot of coding done with LLM now too. Here's how I wrote one of my recent plugins:
I wrote about that one here: https://simonwillison.net/2025/Apr/20/llm-fragments-github/
LLM was also used recently in that "How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel's SMB implementation" story - to help automate running 100s of prompts: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
- The official docs: https://llm.datasette.io/
- The workshop I gave at PyCon a few weeks ago: https://building-with-llms-pycon-2025.readthedocs.io/
- The "New releases of LLM" series on my blog: https://simonwillison.net/series/llm-releases/
- My "llm" tag, which has 195 posts now! https://simonwillison.net/tags/llm/
```
# AI cli
(unstable.python3.withPackages (
  ps: with ps; [ llm llm-gemini llm-cmd ]
))
```
looks like most of the plugins are models and most of the functionality you demo'd in the parent comment is baked into the tool itself.
Yea, a live document might be cool -- part of the interesting bit was seeing the "real" types of use cases you use it for.
Anyways will give it a spin.
> had I used o3 to find and fix the original vulnerability I would have, in theory [...]
they ran a scenario that they thought could have led to finding it, which is pretty much not what you said. We don't know how much their foreshadowing crept into their LLM context, and even the article says it was also partly chance. Please be more precise and don't give in to these false beliefs of productivity. Not yet at least.
Most recently I wanted a script that could produce word lists from a dictionary of 180k words given a query, like "is this an animal?" The script breaks the dictionary up into chunks of size N (asking "which of these words is an animal? respond with just the list of words that match, or NONE if none, and nothing else"), makes M parallel "think" queries, and aggregates the results in an output text file.
I had Claude Code do it, and even though I'm _already_ talking to an LLM, it's not a task that I trust an LLM to do without breaking the word list up into much smaller chunks and making loads of requests.
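The chunk-and-aggregate approach described above can be sketched in a few lines. This is a hypothetical illustration, not the actual script: `ask_llm` is a placeholder (in practice it would call a model, e.g. via simonw's llm library or an HTTP API), stubbed here with a trivial substring match so the plumbing is runnable:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt):
    # Placeholder for a real model call. Here we pretend the model
    # "matches" any word containing "cat", just to exercise the plumbing.
    words = prompt.splitlines()[1:]
    return [w for w in words if "cat" in w]

def filter_words(words, query, chunk_size=3, workers=4):
    # Break the dictionary into chunks of size N...
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    prompts = [query + "\n" + "\n".join(chunk) for chunk in chunks]
    # ...fire off M parallel queries, then aggregate the results.
    matched = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(ask_llm, prompts):
            matched.extend(result)
    return matched

words = ["cat", "dog", "catfish", "bird", "bobcat", "fox"]
animals = filter_words(words, "which of these words is an animal?")
# → ["cat", "catfish", "bobcat"]
```

With a 180k-word dictionary the real version would use much larger chunks and rate-limit the parallel requests.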
Overall, I found tool use extremely hit-and-miss, to the point where I'm sure I'm doing something wrong (I'm using the OpenAI Agents SDK, FWIW).
Anthropic's system prompt just for their "web_search" tool is over 6,000 tokens long! https://simonwillison.net/2025/May/25/claude-4-system-prompt...
And, this is why I'm very excited about this addition to the llm tool, because it feels like it moves the tool closer to the user and reduces the likelihood of the problem I'm describing.
See also my multi-year obsession with prompt injection and LLM security, which still isn't close to being a solved problem: https://simonwillison.net/tags/prompt-injection/
Yet somehow I can't tear myself away from them. The fact that we can use computers to mostly understand human language (and vision problems as well) is irresistible to me.
I agree it'd be amazing if they did that, but they most certainly do not. I think this is the core of my disagreement here: that you believe this and let it guide you. They don't understand anything; they are matching and synthesizing patterns. I can see how that's enthralling, like watching a Rube Goldberg machine go through its paces, but there is no there there. The idea that there is an emergent something there is at best an unproven theory, has been documented as an illusion, and at worst has become an unfounded messianic belief.
I know they're just statistical models, and that having conversations with them is like having a conversation with a stack of dice.
But if the simulation is good enough to be useful, the fact that they don't genuinely "understand" doesn't really matter to me.
I've had tens of thousands of "conversations" with these things now (I know because I log them all). Whether or not they understand anything they're still providing a ton of value back to me.
No comments yet
This post has an unusually large number of code blocks without syntax highlighting since they're copy-pasted outputs from the debug tool which isn't in any formal syntax.
https://simonwillison.net/tags/anthropic/
https://simonwillison.net/tags/gemini/
https://simonwillison.net/tags/mistral/
https://simonwillison.net/tags/qwen/
This is just a new fad like agile, where the process and tool obsessed developers blog and preach endlessly without delivering any results.
For them "it works". The rest of us can either join the latest madness or avoid places that use LLMs, which of course includes all open source projects that do actual work that is not measured in GitHub kLOCs and marketing speak.
The hate: "LLMs can't do anything without hand holding"
I think both of these takes are disingenuous.
> not productive in the sense of solving real world computing problems.
Solving problems is a pretty vague term.
> The fact that you can hack things that do cool stuff
Lots of times this is how problems actually get solved. I would argue most of the time this is how problems get solved, even more so when you're working with other people's software, because you're not reinventing CUPS, dashboards, VPNs, marketing tools, mail clients, chat clients, and so on... I would argue that LOTS of good software is propped up, directly and indirectly, by this sort of hacking.