Xbow has really smart people working on it, so they're well-aware of the usual 30-second critiques that come up in this thread. For example, they take specific steps to eliminate false positives.
The #1 spot in the ranking is both more of a deal and less of a deal than it might appear. It's less of a deal in that HackerOne is an economic numbers game. There are countless programs you can sign up for, with varied difficulty levels and payouts. Most of them don't pay a whole lot and don't attract top talent in the industry. Instead, they offer supplemental income to infosec-minded school-age kids in the developing world. So I wouldn't read this as "Xbow is the best bug hunter in the US". That's a bit of a marketing gimmick.
But this is also not a particularly meaningful objective. The problem is that there are a lot of low-hanging bugs that need squashing, and it's hard to allocate sufficient resources to that. Top infosec talent doesn't want to do it (and there's not enough of it). Consulting companies can do it, but they inevitably end up stretching themselves too thin, so the coverage ends up being hit-and-miss. There's a huge market for tools that can find easy bugs cheaply and without too many false positives.
tecleandor · 2h ago
First:
> To bridge that gap, we started dogfooding XBOW in public and private bug bounty programs hosted on HackerOne. We treated it like any external researcher would: no shortcuts, no internal knowledge—just XBOW, running on its own.
Is it dogfooding if you're not doing it to yourself? I'd consider it dogfooding only if they were flooding themselves with AI-generated bug reports, not other people. They're not the ones reviewing them.
Also, honest question: what does "best" mean here? The one that has sent the most reports?
jamessinghal · 2h ago
Their success rates on HackerOne seem to vary widely.
22/24 (Valid / Closed) for Walt Disney
3/43 (Valid / Closed) for AT&T
pclmulqdq · 59m ago
Walt Disney doesn't pay bug bounties. AT&T's bounties go up to $5k, which is decent but still not much. It's possible that the market for bugs is efficient.
thaumasiotes · 2h ago
> Their success rates on HackerOne seem to vary widely.
Some of that is likely down to company policies; Snapchat's policy, for example, is that nothing is ever marked invalid.
jamessinghal · 2h ago
Yes, I'm sure anyone with more HackerOne experience can give specifics on the companies' policies. For now, those are the most objective measures of quality we have on the reports.
moyix · 2h ago
This is discussed in the post – many came down to individual programs' policies, e.g. not accepting the vulnerability if it was in a 3rd-party product they used (but still hosted by them), duplicates (another researcher reported the same vuln at the same time; there isn't really any way to avoid this), or not accepting some classes of vuln like cache poisoning.
mkagenius · 3h ago
> XBOW submitted nearly 1,060 vulnerabilities.
Yikes, explains why my manually submitted single vulnerability is taking weeks to triage.
tptacek · 2h ago
The XBOW people are not randos.
lcnPylGDnU4H9OF · 2h ago
That's not their point, I think. They're just saying that those nearly 1060 vulnerabilities are being processed so theirs is being ignored (hence "triage").
tptacek · 2h ago
If that's all they're saying then there isn't much to do with the sentiment; if you're legit-finding #1061 after legit-findings #1-#1060, that's just life in the NFL. I took instead the meaning that the findings ahead of them were less than legit.
lcnPylGDnU4H9OF · 2h ago
> there isn't much to do with the sentiment
I see what you're saying, but I think a more charitable interpretation can be made. They may be amazed that so many bug reports are being generated by such a reputable group. Looking at your initial reply, perhaps a more constructive comment could be one that joins them in excitement (even if that assumption is erroneous) and expands on why you think it is exciting (e.g. this group's reputation for quality).
stronglikedan · 1h ago
> I took instead the meaning that the findings ahead of them were less than legit.
I took instead the opposite - that they were no longer shocked that it was taking so long once they found out why, as they knew who they were and understood.
croes · 2h ago
Whether it is a legit finding is precisely what needs to be checked, but you're at spot 1061.
>130 resolved
>303 were classified as Triaged
>33 reports marked as new
>125 remain pending
>208 were marked as duplicates
>209 as informative
>36 not applicable
Even 20% binds a lot of resources if you have a high volume of submissions, and the numbers will rise.
tptacek · 2h ago
I think some context I probably don't share with the rest of this thread is that the average quality of a Hacker One submission is incredibly low. Like however bad you think the median bounty submission is, it's worse; think "people threatening to take you to court for not paying them for their report that they can 'XSS' you with the Chrome developer console".
peanut-walrus · 36m ago
My favorite one I've seen is "open redirect when you change the domain name in the browser address bar". This was submitted twice several years apart by two different people.
croes · 1h ago
We'll get these low-quality submissions with AI too.
The problem is that the people who know how to use AI properly will be slower and more careful in their submissions.
Many others won't, so we'll get lots of noise hiding the real issues. AI makes it easy to produce many bad results in a short time.
tptacek · 1h ago
Everyone already agrees with that; the interesting argument here is that it also makes it easy to produce many good results in a short time.
mellosouls · 2h ago
Have XBow provided a link to this claim? I could only find:
https://hackerone.com/xbow?type=user
Which shows a different picture. This may not invalidate their claim (best US), but a screenshot can be a bit cherry-picked.
wslh · 4m ago
I am looking forward to the LLM ELI5 explanation. If I understand correctly, XBOW is really moving the needle/state-of-the-art here.
chc4 · 46m ago
I'm generally pretty bearish on AI security research, and think most people don't know anything about what they're talking about, but XBOW is frankly one of the few legitimately interesting and competent companies in the space, and their writeups and reports have good and well thought out results. Congrats!
ryandrake · 3h ago
Receiving hundreds of AI generated bug reports would be so demoralizing and probably turn me off from maintaining an open source project forever. I think developers are going to eventually need tools to filter out slop. If you didn’t take the time to write it, why should I take the time to read it?
moyix · 2h ago
All of these reports came with executable proof of the vulnerabilities – otherwise, as you say, you get flooded with hallucinated junk like the poor curl dev. This is one of the things that makes offensive security an actually good use case for AI – exploits serve as hard evidence that the LLM can't fake.
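As a rough illustration (this is a hypothetical sketch, not XBOW's actual harness; the target URL, parameter name, and marker prefix are all made up), "executable proof" for something like a reflected-XSS finding can be as small as a script that injects a unique marker and checks whether it comes back unescaped:

    # Hypothetical PoC-style check, not XBOW's tooling: inject a unique
    # marker into a query parameter and confirm it is reflected unescaped
    # in the response body. A report shipped with a script like this
    # either reproduces or it doesn't; the model can't talk its way
    # around the result.
    import secrets
    import urllib.parse
    import urllib.request

    def reflects_unescaped(base_url: str, param: str = "q") -> bool:
        marker = f"<poc-{secrets.token_hex(8)}>"  # unguessable token
        url = f"{base_url}?{urllib.parse.urlencode({param: marker})}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode(errors="replace")
        # Raw marker (angle brackets intact) in the HTML means the input
        # is reflected without encoding -- a concrete, re-runnable signal.
        return marker in body

    if __name__ == "__main__":
        print(reflects_unescaped("https://example.com/search"))

An unescaped reflection isn't automatically an exploitable XSS, but a check the triager can re-run themselves is a very different artifact from a prose-only claim.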
tptacek · 2h ago
These aren't like Github Issues reports; they're bug bounty programs, specifically stood up to soak up incoming reports from anonymous strangers looking to make money on their submissions, with the premise being that enough of those reports will drive specific security goals (the scope of each program is, for smart vendors, tailored to engineering goals they have internally) to make it worthwhile.
ryandrake · 1h ago
Got it! The financial incentive will probably turn out to be a double-edged sword. Maybe in the pre-AI age it was By Design to drive those goals, but I bet the ability to automate submissions will inevitably alter the rules of these programs.
I think within the next 5 years or so, we are going to see a societal pattern repeating: any program that rewards human ingenuity and input will become industrialized by AI to the point where it becomes a cottage industry of companies flooding every program with 99% AI submissions. What used to be lone wolves or small groups of humans working on bounties will become truckloads of AI generated “stuff” trying to maximize revenue.
dcminter · 10m ago
I'm wary of a lot of AI stuff, but here:
> What used to be lone wolves or small groups of humans working on bounties will become truckloads of AI generated “stuff” trying to maximize revenue.
You're objecting to the wrong thing. The purpose of a bug bounty programme is not to provide a cottage industry for security artisans - it's to flush out security vulnerabilities.
There are reasonable objections to AI automation in this space, but this is not one of them.
bawolff · 34m ago
If you think the AI slop is demoralizing, you should see the human submissions bug bounties get.
There is a reason companies like HackerOne exist - it's because dealing with the submissions is terrible.
triknomeister · 3h ago
Eventually, projects that can afford the smugness are going to charge people to be able to talk to open source developers.
tough · 3h ago
Isn't that called enterprise support / consulting?
triknomeister · 1h ago
This is without the enterprise.
tough · 1h ago
gotchu, maybe i could see github donations enabling issue creation or whatever in the future idk
but foss is foss, i guess source available doesn't mean we have to read your messages - see sqlite (won't even take PRs lol)
Nicook · 2h ago
Open source maintainers have been complaining about this for a while: https://sethmlarson.dev/slop-security-reports. I'm assuming the proliferation of AI will bring some significant changes for open source projects (and already has).
teeray · 3h ago
You see, the dream is another AI that reads the report and writes the issue in the bug tracker. Then another AI implements the fix. A third AI then reviews the code and approves and merges it. All without human interaction! Once CI releases the fix, the first AI can then find the same vulnerability plus a few new and exciting ones.
dingnuts · 2h ago
This is completely absurd. If generating code is reliable, you can have one generator make the change, and then merge and release it with traditional software.
If it's not reliable, how can you rely on the written issue to be correct, or the review, and so how does that benefit you over just blindly merging whatever changes are created by the model?
tempodox · 2h ago
Making sense is not required as long as “AI” vendors sell subscriptions.
croes · 2h ago
That’s why parent wrote it’s a dream.
It’s not real.
But you can bet someone will sell that as the solution.
jgalt212 · 3h ago
One would think that if AI can generate the slop, it could also triage the slop.
err4nt · 2h ago
How does it know the difference?
scubbo · 2h ago
I'm still on the AI-skeptic side of the spectrum (though shifting more towards "it has some useful applications"), but I think the easy answer is: use different models/prompts for quality-/correctness-checking than for generation.
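A minimal sketch of that split (the function names and the Finding structure here are invented for illustration, not any real pipeline): the verifier uses a different model or prompt than the generator and, crucially, only accepts findings whose bundled reproduction step actually succeeds, rather than grading the generator's prose.

    # Toy generate-then-verify split with placeholder model calls.
    # The structural point: the checker is independent of the generator
    # and trusts only an executable reproduction, not the write-up.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Finding:
        title: str
        claim: str
        reproduce: Callable[[], bool]  # executable check bundled with the report

    def generator_model(target: str) -> List[Finding]:
        """Stand-in for the bug-hunting model."""
        return [Finding(
            title=f"Example reflected XSS on {target}",
            claim="Parameter 'q' is reflected unescaped.",
            reproduce=lambda: False,  # stub: a real PoC would probe the target
        )]

    def verifier_model(finding: Finding) -> bool:
        """Stand-in for an independent checker (different model/prompt).
        It re-runs the proof itself instead of scoring the claim's text."""
        return finding.reproduce()

    def triage(target: str) -> List[Finding]:
        # Only findings that survive independent, executable verification
        # get forwarded to a human or a bounty program.
        return [f for f in generator_model(target) if verifier_model(f)]

    if __name__ == "__main__":
        print(triage("example.com"))  # -> [] because the stub PoC fails

Whether that actually filters slop depends on how hard the reproduction step is to fake, which is why executable proofs matter more than the choice of checker model.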
andrewstuart · 3h ago
All the fun vanishes.
tptacek · 2h ago
Good. It was in the way.
kiitos · 2h ago
In the way of what?
tptacek · 2h ago
Getting more bugs fixed.
kiitos · 33m ago
> Getting more bugs fixed.
OK.. but "getting more bugs fixed" isn't any kind of objective success metric for, well, anything, right?
It's fine if you want to use it as a KPI for your specific thing! But it's not like it's some global KPI for everyone?
bgwalter · 2h ago
"XBOW is an enterprise solution. If your company would like a demo, email us at info@xbow.com."
Like any "AI" article, this is an ad.
If you are willing to tolerate a high false positive rate, you might as well use Rational Purify or various analyzers.
moyix · 2h ago
You should come to my upcoming BlackHat talk on how we did this while avoiding false positives :D
https://www.blackhat.com/us-25/briefings/schedule/#ai-agents...
You should publish the paper quietly here (I'm a Black Hat reviewer, FWIW) so people can see where you're coming from.
I know you've been on HN for a while, and that you're doing interesting stuff; HN just has a really intense immune system against vendor-y stuff.
moyix · 2h ago
Yeah, it's been very strange being on the other side of that after 10 years in academia! But it's totally reasonable for people to be skeptical when there's a bunch of money sloshing around.
I'll see if I can get time to do a paper to accompany the BH talk. And hopefully the agent traces of individual vulns will also help.
tptacek · 2h ago
J'accuse! You were required to do a paper for BH anyways! :)
moyix · 2h ago
Wait a sec, I thought they were optional?
> White Paper/Slide Deck/Supporting Materials (optional)
> • If you have a completed white paper or draft, slide deck, or other supporting materials, you can optionally provide a link for review by the board.
> • Please note: Submission must be self-contained for evaluation, supporting materials are optional.
> • PDF or online viewable links are preferred, where no authentication/log-in is required.
(From the link on the BHUSA CFP page, which confusingly goes to the BH Asia doc: https://i.blackhat.com/Asia-25/BlackHat-Asia-2025-CFP-Prepar... )
I think you're fine, most people don't take the paper bit seriously. It's not due until the end of July regardless (you don't need a paper to submit for the CFP).
jekwoooooe · 3h ago
They should ban this, or else they will get swallowed up and companies will stop working with them. The last thing I want is a bunch of LLM slop sent to me faster than a human could send it.
danmcs · 2h ago
HackerOne was already useless years before LLMs. Vulnerability scanning was already automated.
When we put our product on there, roughly 2019, the enterprising hackers ran their scanners, submitted everything they found as the highest possible severity to attempt to maximize their payout, and moved on. We wasted time triaging all the stuff they submitted that was nonsense, got nothing valuable out of the engagement, and dropped HackerOne at the end of the contract.
You'd be much better off contracting a competent engineering security firm to inspect your codebase and infrastructure.
tptacek · 2h ago
Moreover, I don't think XBOW is likely generating the kind of slop beg bounty people generate. There's some serious work behind this.
tecleandor · 2h ago
Still, they're sending hundreds of reports that are being refused because they don't follow the rules of the bounties. So they'd better work on that.
tptacek · 2h ago
If you thought human bounty program participants were generally following the rules, or that programs weren't swamped with slop already... at least these are actually pre-triaged vetted findings.
radialstub · 2h ago
Do you have any sources, if we want to learn more?
moyix · 2h ago
We've got a bunch of agent traces on the front page of the web site right now. We also have done writeups on individual vulnerabilities found by the system, mostly in open source right now (we did some fun scans of OSS projects found on Docker Hub). We have a bunch more coming up about the vulns found in bug bounty targets. The latter are bottlenecked by getting approval from the companies affected, unfortunately.
Some of my favorites from what we've released so far:
- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out the challenge environment was broken and used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...
- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/
- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/
As long as they maintain a history per account and discourage gaming with new accounts, I don't see why anyone would want slop that performed lower just because the slop was manual. (I just had someone tell me that they wished the nonsensical bounty submissions they triaged were at least being fixed up with gpt3.)
The main difference is that all of the vulnerabilities reported here are real, many quite critical (XXE, RCE, SQLi, etc.). To be fair there were definitely a lot of XSS, but the main reason for that is that it's a really common vulnerability.
nottorp · 3h ago
Oh, there are competitions for finding vulnerabilities in software?
That would explain why there's news every day that the world will end because someone discovered something that "could" be used if you already had local root...
Did that article that presented "people trusting external input too much" as JSON parser vulnerabilities make it to this competition?