Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code

38 jjjutla 22 7/31/2025, 4:23:09 PM

Hey HN, I'm JJ, Co-Founder of Gecko Security (https://www.gecko.security). We're building a new kind of static analysis tool that uses LLMs to find complex business logic and multi-step vulnerabilities that current scanners miss. We’ve used it to find 30+ CVEs in projects like Ollama, Gradio, and Ragflow (https://www.gecko.security/research). You can try it yourself on any OSS repo at (https://app.gecko.security).

Anyone who’s used SAST (Static Application Security Testing) tools knows the issues of high false positives while missing entire classes of vulnerabilities like AuthN/Z bypasses or privilege escalations. This limitation is a result of their core architecture. By design, SAST tools parse code into a simplistic model like an AST or call graph, which quickly loses context in dynamically typed languages or across microservice boundaries, and limits coverage to only resolving basic call chains. When detecting vulnerabilities they rely on pattern matching with Regex or YAML rules, which can be effective for basic technical classes like (XSS, SQLi) but inadequate for logic flaws that don’t conform to well-known shapes and need long sequences of dependent operations to reach an exploitable state.

My co-founder and I saw these limitations throughout our careers in national intelligence and military cyber forces, where we built automated tooling to defend critical infrastructure. We realised that LLMs, with the right architecture, could finally solve them.

Vulnerabilities are contextual. What's exploitable depends entirely on each application's security model. We realized accurate detection requires understanding what's supposed to be protected and why breaking it matters. This meant embedding threat modeling directly into our analysis, not treating it as an afterthought.

To achieve this, we first had to solve the code parsing problem. Our solution was to build a custom, compiler-accurate indexer inspired by GitHub's stack graphs approach to precisely navigate code, like an IDE. We build on the LSIF approach (https://lsif.dev/) but replace the verbose JSON with a compact protobuf schema to serialise symbol definitions and references in a binary format. We use language‑specific tools to parse and type‑check code, emitting a sequence of Protobuf messages that record a symbol’s position, definition, and reference information. By using Protobuf’s efficiency and strong typing, we can produce smaller indexes, but also preserve the compiler‑accurate semantic information required for detecting complex call chains.

This is why most "SAST + LLM" tools that use AST parsing fail - they feed LLMs incomplete or incorrect code information from traditional parsers, making it difficult to accurately reason about security issues with missing context.

With our indexer providing accurate code structure, we use an LLM to perform threat modeling by analyzing developer intent, data and trust boundaries, and exposed endpoints to generate potential attack scenarios. This is where LLMs' tendency to hallucinate becomes a breakthrough feature.

For each potential attack path generated, we perform a systematic search, querying the indexer to gather all necessary context and reconstruct the full call chain from source to sink. To validate the vulnerability we use a Monte Carlo Tree Self-refine (MCTSr) algorithm and a 'win function' to determine the likelihood that a hypothesized attack could work. Once a finding is above a set practicality threshold it is confirmed as a true positive.

Using this approach, we discovered vulnerabilities like CVE-2025-51479 in ONYX (an OSS enterprise search platform) where Curators could modify any group instead of just their assigned ones. The user-group API had a user parameter that should check permissions but never used it. Gecko inferred developers intended to restrict Curator access because both the UI and similar API functions properly validated this permission. This established "curators have limited scope" as a security invariant that this specific API violated. Traditional SAST can't detect this. Any rule to flag unused user parameters would drown you in false positives since many functions legitimately keep unused parameters. And more importantly, detecting this requires knowing which functions handle authorization, understanding ONYX's Curator permission model, and recognizing the validation pattern across multiple files - contextual reasoning that SAST simply cannot do.

We have several enterprise customers using Gecko because it solves problems they couldn't address with traditional SAST tools. They're seeing 50% fewer false positives on the same codebases and finding vulnerabilities that previously only showed up in manual pentests.

Digging into false positives, no static analysis tool will ever achieve perfect accuracy, AI or otherwise. We reduce them at two key points. First, our indexer eliminates any programmatic parsing errors that create incorrect call chains that traditional AST tools are susceptible to. Second, we avoid unwanted LLM hallucinations and reasoning errors by asking specific, contextual questions rather than open-ended ones. The LLM knows which security invariants need to hold and can make deterministic assessments based on the context. When we do flag something, manual review is quick because we provide complete source-to-sink dataflow analysis with proof-of-concept code and output findings based on confidence scores.

We’d love to get any feedback from the community, ideas for future direction, or experiences in this space. I’ll be in the comments to respond!

Comments (22)

rixed · 2h ago

Coincidentally, the IA tool of Semgrep just signalled me a real although very minor issue on some C project a couple of days ago. So I tried gecko on the same repository to see if it could detect anything else, but no. So I removed the fix from the github repo to see if gecko would also complain about the issue, but I believe I hit a bug in the UI: I deleted the previous project and created a new one, using the same github URL of course, and although gecko said that it started the scan, the list of scans stayed disapointingly empty.

rixed · 2h ago

> to see if it could detect anything else, but no

Might be related to the fact that gecko does not support C apparently? At least that's the impression I got from hovering the mouse cursor on the minuscule list of pictos below "Supported Languages". Not supporting C and C++ in a tool looking for security issues is a bit of a bummer, no?

jjjutla · 2h ago

We’ve limited the free tier to one scan per user, so deleting a scan and starting a new one won’t work because of that restriction.

And yes, we don’t support C or C++ yet. Our focus is on detecting business logic vulnerabilities (auth bypasses, privilege escalations, IDORs) that traditional SAST tools often miss. The types of exploitable security issues typically found in C/C++ (mainly memory corruption type issues) are better found through fuzzing and dynamic testing rather than static analysis.

skanga · 5h ago

It's hard to evaluate such a tool. I scanned my OSS MCP server for databases at https://github.com/skanga/dbchat and it found 0 vulnerabilities. Now I'm wondering if my code is perfect :-) or the tool has issues!

pns1 · 48m ago

Very interesting and cool project.

Creating an accurate call graph is difficult, especially for dynamic languages such as JavaScript or TypeScript. The academia has spent decades of effort on this. I am wondering why your custom parser could do this much better. And, I am interested in how to store dynamic typing information into Protobuf's strong typing system.

Due to the limited context window, it is definitely unaffordable to provide the entire application's source code to the model. I am wondering what kind of "context" information is generally helpful for bug detection, like the call chain?

dnsbty · 5h ago

This is one area I expect LLMs to really shine. I've tried a few static analysis tools for security, but it feels like the cookie cutter checks aren't that effective for catching anything but the most basic vulnerabilities. Having context on the actual purpose of the code seems like a great way to provide better scans without needing to a researcher for a deeper pentest.

I just started a scan on an open source project I was looking at, but I would love to see you add Elixir to the list of supported languages so that I can use this for my team's codebase!

wglb · 5h ago

Static analysis tools were the bane of my existence being security guy at a software provider. A customer insisted on running a popular one on our 20 million line code base. Two of us spent two weeks clearing false positives. Absolutely nothing was left.

jjjutla · 5h ago

We've had a few request for Elixir and it's definitely something we will work on.

eranation · 5h ago

Congrats on the launch. How do you differentiate yourself from Corgea.com? Or general purpose AI code review solutions such as Cursor BugBot / GitHub Copilot Code Reviews / CodeRabbit?

jjjutla · 4h ago

Thank you. SAST tools built on AST or call graph parsing will struggle to detect code logic vulnerabilities because their models are too simplistic. They lose the language-specific semantics in dynamically typed languages where objects change at runtime, or in microservices where calls span multiple services. So they are limited to simple pattern-based detections and miss vulnerabilities that depend on long cross-file call chains and reflected function calls. These are the types of paths that auth bypasses and privilege escalations occur in.

AI code review tools aren’t designed for security analysis at all. They work using vector search or RAG to find relevant files, which is imprecise for retrieving these code paths in high token density projects. So any reasoning the LLM does is built on incomplete or incorrect context.

Our indexer uses LSIF for compiler-accurate symbol resolution so we can reconstruct full call chains, spanning files, modules, and services, with the same accuracy as an IDE. This code reasoning, tied with the LLM's threat modelling and analysis, allows for higher fidelity outputs.

Retr0id · 6h ago

I wanted to check it out but the oauth flow is asking for permission to write my github email address and profile settings. Is this a bug? If not, what are these permissions needed for?

It also asks for permission to "act on my behalf" which I can understand would be necessary for agent-y stuff but it's not something I'm willing to hand over for a mere vuln scan.

bagels · 5h ago

It says "Profile (write) Manage a user's profile settings.", not write email address. The "Act on your behalf" permission is even worse. I agree with you that it should only be asking for read permissions on anything for this purpose.

Retr0id · 5h ago

It was changed

jjjutla · 5h ago

This is a bug, the email-address permissions have been descoped to read-only. Profile settings are either read/write or none, hence the former. If you're concerned about privacy, sign up using email/password.

rixed · 2h ago

I was similarly put off but eventually figured out that you can merely create a normal email based login and point the tool to a publicly hosted git repository, which is nice.

Zopieux · 2h ago

https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-s... comes to mind.

I feel for the poor engineers who will have to triage thousands of false positives because $boss was pitched this tool (or one of the competitors) as the true™ solution to all their security problems.

bearsyankees · 5h ago

Super cool! Just tried it out and it is giving me 100% confidence for two vulnerabilities (one 9.4, one 6.5) that aren't real -- how is that confidence calculated?

jjjutla · 5h ago

The confidence score is calculated by two factors: whether the function call chain represents a valid code path (programmatic correctness) and how well it aligns with the defined threat model for what it thinks is a security vulnerability. False positives usually occur from incorrect assumptions about context, for example, flagging endpoints as missing authentication when such behaviour is actually intended.

Was this an incorrect code path or an incorrect understanding of a security issue?

This is why we focus heavily on threat modelling and defining the security and business invariants that must hold. From a code level, the only context we can infer is through developer intent and data flow analysis.

Something we are working on is custom rules and allowing a user to add context when starting a scan to improve alignment and reduces false positives.

bearsyankees · 5h ago

The security issue and POCs provided were not real like they said there was a vuln but I double checked it and it was not an exploitable vuln

dd_xplore · 5h ago

It reminds of AI bug reports in ffmpeg(was it ffmpeg?)

ciaranmca · 5h ago

Was it not curl https://arstechnica.com/gadgets/2025/05/open-source-project-...

jjjutla · 5h ago

For all the vulns Gecko found they were manually validated by humans and have a CVE assigned by a CNA. The issue that curl had was because it was a paid bug bounty program they had an influx of AI slop reports that looked like real issues but weren't exploitable.

Show HN: Play NYT Connections in the Terminal (github.com)

We Replaced ETL with MCP (rashidazarang.com)

Show HN: Demitter – Distributed Node.js Event Emitter (Pub/Sub) (github.com)

Photographs of Auto Polo (ca. 1912) (publicdomainreview.org)

Ry OS (ryo.lu)

How much would it cost to make an iPhone in America? (medium.com)

World Athletics introduces SRY gene test for athletes in the female category (worldathletics.org)

People stuck using ancient Windows computers (bbc.com)

Show HN: If Pocket and Spotify Wrapped had a baby (apps.apple.com)

Data Science Weekly – Issue 610 (datascienceweekly.substack.com)

Chaos after radioactive wasp nest is discovered atformer US nuclear bomb factory (dailymail.co.uk)

Syntax highlighting of Python code in Python (fatih-erikli-potato.github.io)

Suddenly, Trait-Based Embryo Selection (astralcodexten.com)

Hawaiian petroglyphs reemerge on Oahu's shores after years of being hidden (archaeologymag.com)

Medium (schappi.com)

Let the Kaleidoscope Turn (shabie.github.io)

The Colorado River is officially contaminated with invasive zebra mussels (azdailysun.com)

Age Verification Doesn't Need to Be a Privacy Footgun (soatok.blog)

Public ChatGPT queries are getting indexed by Google and other search engines (techcrunch.com)

Potential Future Restrictions on the Self-Hosted Version of Plane (2023) (github.com)

Proton Authenticator – secure 2FA, your way (proton.me)

You might not need tmux (bower.sh)

Testing the full AI editorial system on No Cap Blog

Low-dose metformin requires brain Rap1 for its antidiabetic action (science.org)

Battlefield 6 map editor is *checks notes* Godot (bsky.app)

Before You Trust Any AI-Written Code (gizmodo.com)

Giant stick insect species discovered in rainforests of Far North Queensland (abc.net.au)

SEC debuts 'Project Crypto' to bring U.S. financial markets 'on chain' (cnbc.com)

China Reveals Blueprint for 10GW Microwave Weapon Using Superradiance Tech (thedefensenews.com)

Bitter fight over 2020 Microsoft quantum paper both resolved and unresolved (theregister.com)

10Ms of free batch inference tokens (modular.com)

Amazon's cloud business records 18% growth in second quarter (cnbc.com)

Reinstating memories' temporal context causes Sisyphus-like memory rejuvenation (pnas.org)

Making the switch easier: our new Pinboard API bridge (linktaco.com)

Show HN: Factitio.us – Generate authoritative links to put arguments to rest (factitio.us)

Literacy lag: We start reading too late (theintrinsicperspective.com)

Show HN: I built a voice-first tool for Twitter, pivoted to cursor for X (x11.social)

Can Airships Outperform Satellites in Internet Service? (spectrum.ieee.org)

Starspace: Learning embeddings for classification, retrieval and ranking (ai.meta.com)

US lightning flash was longest on record at 515 miles, scientists say (theguardian.com)

Everything Is Cringe (jamie.sh)

"No Tax on Tips" Is an Industry Plant (newyorker.com)

Directory of Open Access Journals (doaj.org)

Show HN: Calystone – privacy-first note- & task-mnger that works with your files (github.com)

Benchmarking MicroPython (blog.miguelgrinberg.com)

TypeScript 5.9 RC – TypeScript (devblogs.microsoft.com)

Energy in Britain has gotten scarce–and thus expensive (bsky.app)

Flexflex: Typeface responds to spatial requirements rather than imposing them (github.com)

The Case for and Against Palo Alto Networks Acquiring CyberArk (strategyofsecurity.com)

Webflow CEO Post Mortem (webflow.com)

Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code

Comments (22)

Battlefield 6 map editor is checks notes Godot (bsky.app)