Show HN: AI Code Detector – detect AI-generated code with 95% accuracy

19 points by henryl · 9/16/2025, 6:18:56 PM · code-detector.ai
Hey HN,

I’m Henry, cofounder and CTO at Span (https://span.app/). Today we’re launching AI Code Detector, an AI code detection tool you can try in your browser.

The explosion of AI-generated code has created some weird problems for engineering orgs. Tools like Cursor and Copilot are used by virtually every org on the planet, but each codegen tool has its own idiosyncratic way of reporting usage. Some don't report usage at all.

Our view is that token spend will start competing with payroll spend as AI becomes more deeply ingrained in how we build software. Understanding how to drive proficiency, improve ROI, and allocate resources for AI tools will become at least as important as the parallel processes on the talent side.

Getting true visibility into AI-generated code is incredibly difficult. And yet it’s the number one thing customers ask us for.

So we built a new approach from the ground up.

Our AI Code Detector is powered by span-detect-1, a state-of-the-art model trained on millions of AI- and human-written code samples. It detects AI-generated code with 95% accuracy, and ties it to specific lines shipped into production. Within the Span platform, it’ll give teams a clear view into AI’s real impact on velocity, quality, and ROI.

It does have some limitations. Most notably, it only works for TypeScript and Python code. We are adding support for more languages: Java, Ruby, and C# are next. Its accuracy is around 95% today, and we’re working on improving that, too.

If you’d like to take it for a spin, you can run a code snippet here (https://code-detector.ai/) and get results in about five seconds. We also have a more narrative-driven microsite (https://www.span.app/detector) that my marketing team says I have to share.

Would love your thoughts, both on the tool itself and your own experiences. I’ll be hanging out in the comments to answer questions, too.

Comments (15)

JohnFriel · 6m ago
This is interesting. Do you know what features the classifier is matching on? Like how much does stuff like whitespace matter here vs. deeper code structure? Put differently, if you were to parse the AI and non-AI code into AST and train a classifier based on that, would the results be the same?
henryl · 3m ago
Candidly, it's a bit of a black box still. We hope to do some ablation studies soon, but we tried to have a variety of formatting and commenting styles represented in both training and evaluation.
jensneuse · 5m ago
Could I use this to iterate over my AI generated code until it's not detectable anymore? So essentially the moment you publish this tool it stops working?
well_actulily · 1m ago
This is essentially the adversarial generator/discriminator set-up that GANs use.
henryl · 4m ago
I'm sure you can, but there isn't really an adversarial motive for doing that, I would think :)
johnsillings · 5m ago
sharing the technical announcement here (more info on evaluations, comparison to other models, etc): https://www.span.app/introducing-span-detect-1
Ndotkess · 6m ago
What is your approach to measuring accuracy?
johnsillings · 4m ago
I'm sure Henry will chime in here, but there's some more info here in the technical announcement: https://www.span.app/introducing-span-detect-1

"span-detect-1 was evaluated by an independent team within Span. The team’s objective was to create an eval that’s free from training-data contamination and reflects realistic human- and AI-authored code patterns. The focus was on three sources: real-world human code, AI code authored by Devin crawled from public GitHub repositories, and AI samples that we synthesized for “brownfield” edits by leading LLMs. In the end, evaluation was performed with ~45K-sample balanced datasets for TypeScript and Python each, and an 11K-sample set for TSX."

henryl · 5m ago
More details about how we eval'ed here:

https://www.span.app/introducing-span-detect-1

samfriedman · 12m ago
Accuracy is a useless statistic: give us precision and recall.
henryl · 5m ago
Recall 91.5, F1 93.3
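The reply above gives recall and F1 but not precision. Since F1 is the harmonic mean of precision and recall, precision can be backed out algebraically; a quick sketch, assuming the reported figures are standard recall and harmonic-mean F1 (the function name is illustrative, not from Span's tooling):

```python
def precision_from_recall_f1(recall: float, f1: float) -> float:
    """Invert F1 = 2*P*R / (P + R) to recover precision:
    P = F1 * R / (2*R - F1)."""
    return f1 * recall / (2 * recall - f1)

# Reported: recall 91.5%, F1 93.3%
precision = precision_from_recall_f1(0.915, 0.933)
print(f"implied precision ≈ {precision:.3f}")  # ≈ 0.952
```

An implied precision of roughly 95% would mean about 1 in 20 flagged snippets is a false positive, which is the trade-off the thread is probing.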
bigyabai · 22m ago
I can detect AI-generated code with 100% accuracy, provided you give me an unlimited budget for false positives. It's a bit of a useless metric.
henryl · 17m ago
I'd argue that knowing AI generated code shipped into production is the first step to understanding the impact of AI coding assistants on velocity and quality. When paired with additional context, it can help leaders understand how to improve proficiency around these tools.
jfarina · 9m ago
That's not relevant to the comment you replied to.
henryl · 2m ago
Ah - I misread:

Recall 91.5, F1 93.3