Launch HN: MindFort (YC X25) – AI agents for continuous pentesting
Here's a demo: https://www.loom.com/share/e56faa07d90b417db09bb4454dce8d5a
Security testing today is increasingly challenging. Traditional scanners generate 30-50% false positives, drowning engineering teams in noise. Manual penetration testing happens quarterly at best, costs tens of thousands per assessment, and takes weeks to complete. Meanwhile, teams are shipping code faster than ever with AI assistance, but security reviews have become an even bigger bottleneck.
All three of us encountered this problem from different angles. Brandon worked at ProjectDiscovery building the Nuclei scanner, then at NetSPI (one of the largest pen testing firms) building AI tools for testers. Sam was a senior engineer at Salesforce leading security for Tableau, where he dealt firsthand with juggling security findings and managing remediations. Akul did his master's on AI and security, co-authored papers on using LLMs for security attacks, and participated in red teams at OpenAI and Anthropic.
We all realized that AI agents were going to fundamentally change security testing, and that the wave of AI-generated code would need an equally powerful solution to keep it secure.
We've built AI agents that perform reconnaissance, exploit vulnerabilities, and suggest patches—similar to how a human penetration tester works. The key difference from traditional scanners is that our agents validate exploits in runtime environments before reporting them, reducing false positives.
We use multiple foundational models orchestrated together. The agents perform recon to understand the attack surface, then use that context to inform testing strategies. When they find potential vulnerabilities, they spin up isolated environments to validate exploitation. If successful, they analyze the codebase to generate contextual patches.
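Roughly, the loop looks like this (an illustrative sketch only; the names are stand-ins, not our actual agent code or API):

    # Illustrative sketch of the recon -> test -> validate loop; all names are stubs.
    from dataclasses import dataclass

    @dataclass
    class Finding:
        endpoint: str
        vuln_class: str
        validated: bool = False

    def recon(target: str) -> list[str]:
        # Stub: map the attack surface (crawl pages, enumerate APIs, auth flows).
        return [f"{target}/login", f"{target}/search"]

    def exploit_in_sandbox(endpoint: str, vuln_class: str) -> bool:
        # Stub: attempt a non-destructive proof of concept in an isolated environment.
        return False

    def run_assessment(target: str) -> list[Finding]:
        findings = []
        for endpoint in recon(target):
            for vuln_class in ("sqli", "xss", "idor"):  # context-informed test strategies
                if exploit_in_sandbox(endpoint, vuln_class):
                    findings.append(Finding(endpoint, vuln_class, validated=True))
        return findings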
What makes this different from existing tools?

- Validation through exploitation: we don't just pattern-match; we exploit vulnerabilities to prove they're real.

- Codebase integration: the agents understand your code structure to find complex logic bugs and suggest appropriate fixes.

- Continuous operation: instead of point-in-time assessments, we're constantly testing as your code evolves.

- Attack chain discovery: the agents can find multi-step vulnerabilities that require chaining different issues together (see the sketch after this list).
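On attack chains specifically, here's a hypothetical example of what a chained finding looks like. This is a textbook SSRF-to-cloud-credentials chain, not a real customer result:

    # Hypothetical illustration of a multi-step chain; each step's output feeds the next.
    from dataclasses import dataclass

    @dataclass
    class ChainStep:
        technique: str
        gives: str  # capability gained by this step

    attack_chain = [
        ChainStep("SSRF in image-fetch endpoint", "requests from the app's network position"),
        ChainStep("Read cloud metadata service", "temporary IAM credentials"),
        ChainStep("Call internal admin API with those credentials", "privileged data access"),
    ]

    # Each step alone might be rated low severity; the validated chain is critical.
    for i, step in enumerate(attack_chain, 1):
        print(f"step {i}: {step.technique} -> {step.gives}")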
We're currently in early access, working with initial partners to refine the platform. Our agents are already finding vulnerabilities that other tools miss and scoring well on penetration testing benchmarks.
Looking forward to your thoughts and comments!
A few questions:
On your site it says, "MindFort can assess 1 or 100,000 page web apps seamlessly. It can also scale dynamically as your applications grow."
Can you provide more color on what that really means? If I actually asked you to assess 100,000 pages, what would happen? Is it possible for my usage to block/brown-out another customer's usage?
I'm also curious what happens if the system does detect a vulnerability. Is there any chance the bot does something dangerous with, e.g., its newly discovered escalated privileges?
Thanks and good luck!
Regarding scale: we absolutely can assess at that scale, but it would require a large enterprise contract upfront, as we would need to secure the required capacity from our providers.
The system is designed to test exploitation safely, not to perform destructive testing. It will traverse as far as it can, but it won't break anything along the way.
Point it at a publicly available webapp? Run it locally against dev? Do I self-host it and continually run against staging as it's updated?
How do your agents decide a suspected issue is a validated vulnerability, and what measured false-positive/false-negative rates can you share?
How is customer code and data isolated and encrypted throughout reconnaissance, exploitation, and patch generation (e.g., single-tenant VPC, data-retention policy)?
Do the agents ever apply patches automatically, or is human review required—and how does the workflow integrate with CI/CD to prevent regressions?
Ty!
The agents home in on a potential vulnerability by looking at different signals during testing, then build a POC to validate it based on the context. We don't have any data to share publicly yet, but we are working on releasing benchmarks soon.
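For a sense of what "build a POC to validate" means in the simplest case, here's a hypothetical example for a suspected reflected XSS (the URL and parameter are made up). The finding only gets reported if the injected marker comes back unencoded:

    # Illustrative only: confirm a suspected reflected XSS with a harmless marker.
    import secrets
    import requests

    def validate_reflected_xss(url: str, param: str) -> bool:
        marker = secrets.token_hex(8)              # unique, so we only match our own input
        payload = f"<mfpoc>{marker}</mfpoc>"
        resp = requests.get(url, params={param: payload}, timeout=10)
        # Counted as validated only if the payload is reflected without encoding.
        return payload in resp.text

    print(validate_reflected_xss("https://staging.example.com/search", "q"))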
Everything runs in a private VPC, and data is encrypted in transit and at rest. We have zero-data-retention agreements with our vendors, and we offer single-tenant and private-cloud deployments for customers. We don't retain any customer code once we finish processing it, only the vulnerability data. We are also in the process of obtaining our SOC 2.
Patches are not auto-applied. We can either open a PR for human review or add the necessary changes to a Linear/Jira ticket. We have the ability to schedule assessments in our platform, and we are working on a way to integrate more deeply with CI/CD.
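As a rough example of the CI/CD direction (purely hypothetical; it assumes an exported findings.json from a scheduled assessment, and the field names are assumptions), a gate could fail the pipeline on validated high-severity findings:

    # Hypothetical CI gate over an exported findings.json; field names are assumptions.
    import json
    import sys

    BLOCKING = {"critical", "high"}

    def main(path: str = "findings.json") -> int:
        with open(path) as fh:
            findings = json.load(fh)
        blocking = [f for f in findings
                    if f.get("validated") and f.get("severity") in BLOCKING]
        for f in blocking:
            print(f"[BLOCK] {f['severity']}: {f['title']} ({f['endpoint']})")
        return 1 if blocking else 0

    if __name__ == "__main__":
        sys.exit(main())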
Even the booters market themselves as "legitimate stress testing tools for enterprise".
We want to solve the entire vulnerability lifecycle problem, not just find zero-days. MindFort covers detection, validation, and triage/scoring, all the way through patching the vulnerability. While we are starting with web apps, we plan to expand to the rest of the attack surface soon.
How about pen-testing a black box?
Is the list of potential vulnerabilities generated by matching publicly disclosed vulnerabilities against the framework versions in the target's software stack?
I am new to LLMs, or any ML for that matter. Congrats on your launch.
Great question. It is not required, but we recommend it. If you don't include the source code, it would be a black-box test; the agents won't know what the app looks like from the other side.
The agents identify vulns using known attack patterns, novel techniques, and threat intelligence.