I used o3 to find a remote zeroday in the Linux SMB implementation

84 zielmicha 27 5/24/2025, 2:25:45 PM sean.heelan.io ↗

Comments (27)

Retr0id · 1h ago
The article cites a signal to noise ratio of ~1:50. The author is clearly deeply familiar with this codebase and is thus well-positioned to triage the signal from the noise. Automating this part will be where the real wins are, so I'll be watching this closely.
manmal · 59s ago
If the LLM wrote a harness and proof of concept tests for its leads, then it might increase S/N dramatically. It’s just quite expensive to do all that right now.
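A back-of-the-envelope sketch of that effect, assuming the ~1:50 ratio means roughly one real finding per fifty false leads. The filter rates below (the fraction of false leads a proof-of-concept harness would reject, and the fraction of real bugs that survive it) are hypothetical numbers chosen for illustration, not figures from the article.

```python
def precision_after_filter(true_leads, false_leads, fp_reject_rate, tp_keep_rate):
    """Fraction of surviving leads that are real after an automated filter.

    fp_reject_rate: share of false leads the filter discards (hypothetical).
    tp_keep_rate:   share of real leads the filter lets through (hypothetical).
    """
    kept_true = true_leads * tp_keep_rate
    kept_false = false_leads * (1.0 - fp_reject_rate)
    total = kept_true + kept_false
    return kept_true / total if total else 0.0


# Baseline: no filtering at all, 1 real lead among 50 false ones.
baseline = precision_after_filter(1, 50, fp_reject_rate=0.0, tp_keep_rate=1.0)

# Hypothetical harness that rejects 90% of false leads and loses no real ones.
filtered = precision_after_filter(1, 50, fp_reject_rate=0.9, tp_keep_rate=1.0)

print(f"baseline precision: {baseline:.3f}")
print(f"filtered precision: {filtered:.3f}")
```

Under these assumed rates the human reviewer would go from reading about fifty-one reports per real bug to about six, which is the kind of gain the harness cost has to be weighed against.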
ianbutler · 27m ago
We’ve been working on a system that dramatically increases signal-to-noise for finding bugs. At the same time, we’ve been thoroughly benchmarking the entire popular software agents space for this.

We’ve found a wide range of results, and we have a conference talk coming up soon where we’ll be releasing everything publicly, so stay tuned for that. It’ll be pretty illuminating on the state of the space.

Edit: confusing wording

sebmellen · 2m ago
Interesting. This is for Bismuth? I saw your pilot program link — what does that involve?
tough · 55m ago
I was thinking about this the other day: wouldn't it be feasible to fine-tune a model (or something like that) on every git change, mailing list post, etc. the Linux kernel has ever had?

Wouldn't such an LLM be closer to a synthetic version of a person who has worked on a codebase for years, learnt all its quirks, etc.?

There's only so much you can fit into a large context window; some codebases are already 200k tokens just for the code as is, so idk.

sodality2 · 48m ago
I'd be willing to bet the sum of all code submitted via patches, ideas discussed via lists, etc doesn't come close to the true amount of knowledge collected by the average kernel developer's tinkering, experimenting, etc that never leaves their computer. I also wonder if that would lead to overfitting: the same bugs being perpetuated because they were in the training data.
quentinp · 40m ago
Exactly. Many AI users can’t triage effectively; as a result, open source projects get a lot of spam now: https://arstechnica.com/gadgets/2025/05/open-source-project-...
andix · 48m ago
1:50 is a great detection ratio for finding a needle in a haystack.
iandanforth · 46m ago
The most interesting and significant bit of this article for me was that the author ran this search for vulnerabilities 100 times for each of the models. That's significantly more computation than I've historically been willing to expend on most of the problems that I try with large language models, but maybe I should let the models go brrrrr!
roncesvalles · 16m ago
A lot of money is all you need~
jobswithgptcom · 35m ago
Wow, interesting. I've been hacking on a tool called https://diffwithgpt.com with a similar angle, but indexing git changelogs with Qwen to have it flag risks, including backward-compatibility and security issues, when upgrading k8s etc.
logifail · 51m ago
My understanding is that ksmbd is a kernel-space SMB server "developed as a lightweight, high-performance alternative" to the traditional (user-space) Samba server...

Q1: Who is using ksmbd in production?

Q2: Why?

donnachangstein · 17m ago
1. People that were using the in-kernel SMB server in Solaris or Windows.

2. Samba performance sucks (by comparison) which is why people still regularly deploy Windows for file sharing in 2025.

Anybody know if this supports native Windows-style ACLs for file permissions? That is the last remaining reason to still run Solaris but I think it relies on ZFS to do so.

Samba's reliance on Unix UID/GID and the syncing as part of its security model is still stuck in the 1970s unfortunately.

The caveat is the in-kernel SMB server has been the source of at least one holy-shit-this-is-bad zero-day remote root hole in Windows (not sure about Solaris) so there are tradeoffs.

pixl97 · 49m ago
I would assume for the reason of being lightweight and high performance?
foobar10000 · 45m ago
SMB over 25 Gbit networks: user-space Samba is much worse there.
Henchman21 · 40m ago
This is interesting to me! I regularly deploy 25G network connections, but I don’t think we’d run SMB over that. I am super curious about the industry and use case, if you’re willing to share!
hackernudes · 29m ago
"SMB Direct" is RDMA based and ksmbd supports it. Samba does not. Disclaimer: I have not used it but was looking it up just yesterday.
zielmicha · 4h ago
(To be clear, I'm not the author of the post, the title just starts with "How I")
akomtu · 12m ago
This made me think that the near future will be LLMs trained specifically on Linux or another large project. The source code is a small part of the dataset fed to LLMs. The more interesting part is runtime data flow, similar to what we observe in a debugger. Looking at the codebase alone is like trying to understand a waterfall by looking at equations that describe the water flow.
empath75 · 12m ago
Given the value of finding zero days, pretty much every intelligence agency in the world is going to be pouring money into this if it can reliably find them with just a few hundred API calls. Especially if you can fine-tune a model with lots of examples, which I don't think OpenAI etc. are going to allow with any public API.
Hilift · 1h ago
Does the vulnerability exist in other implementations of SMB?
mezyt · 39m ago
Meanwhile, as a maintainer, I've been reviewing more than a dozen false-positive slop CVEs in my library, and not a single one found an actual issue. This article is probably going to make my situation worse.
zison · 4h ago
Very interesting. Is the bug it found exploitable in practice? Could this have been found by syzkaller?
mdaniel · 3h ago
In case anyone else didn't recognize that word: https://github.com/google/syzkaller
mdaniel · 3h ago
Notable:

> o3 finds the kerberos authentication vulnerability in 8 of the 100 runs

And I'd guess this only became a blog post because the author already knew about the vuln and was just curious to see if the intern could spot it too, given a curated subset of the codebase
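For a rough sense of what 8 hits in 100 runs implies, here is a small sketch that treats each run as an independent trial with success probability 0.08. Independence is an assumption: runs that share the same prompt and context may well be correlated, so read these numbers as optimistic estimates.

```python
import math


def p_at_least_one(p_single, runs):
    """Probability that at least one of `runs` independent runs finds the bug."""
    return 1.0 - (1.0 - p_single) ** runs


def runs_needed(p_single, target):
    """Smallest number of independent runs whose combined hit probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


p = 8 / 100  # per-run hit rate quoted above for the known Kerberos bug

for n in (1, 10, 50, 100):
    print(f"{n:>3} runs: P(at least one hit) = {p_at_least_one(p, n):.4f}")

print("runs needed for a 95% chance:", runs_needed(p, 0.95))
```

Under the independence assumption, a few dozen runs already make at least one hit very likely, which is one way to read the decision to run each model 100 times.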

moyix · 1h ago
He did do exactly what you say – except right after that, while reviewing the outputs, he found that it had also discovered a different 0day.
PunchyHamster · 32m ago
Now the question is whether spending the same time analyzing that bit of code, instead of throwing an automated intern at it, would be time better spent.