Show HN: AI Code Detector – detect AI-generated code with 95% accuracy

19 points by henryl · 9/16/2025, 6:18:56 PM · code-detector.ai
Hey HN,

I’m Henry, cofounder and CTO at Span (https://span.app/). Today we’re launching AI Code Detector, an AI code detection tool you can try in your browser.

The explosion of AI-generated code has created some weird problems for engineering orgs. Tools like Cursor and Copilot are used by virtually every org on the planet, but each codegen tool has its own idiosyncratic way of reporting usage. Some don't report usage at all.

Our view is that token spend will start competing with payroll spend as AI becomes more deeply ingrained in how we build software. Understanding how to drive proficiency, improve ROI, and allocate resources for AI tools will become at least as important as the parallel processes on the talent side.

Getting true visibility into AI-generated code is incredibly difficult. And yet it’s the number one thing customers ask us for.

So we built a new approach from the ground up.

Our AI Code Detector is powered by span-detect-1, a state-of-the-art model trained on millions of AI- and human-written code samples. It detects AI-generated code with 95% accuracy, and ties it to specific lines shipped into production. Within the Span platform, it’ll give teams a clear view into AI’s real impact on velocity, quality, and ROI.

It does have some limitations. Most notably, it only works for TypeScript and Python code. We are adding support for more languages: Java, Ruby, and C# are next. Its accuracy is around 95% today, and we’re working on improving that, too.

If you’d like to take it for a spin, you can run a code snippet here (https://code-detector.ai/) and get results in about five seconds. We also have a more narrative-driven microsite (https://www.span.app/detector) that my marketing team says I have to share.

Would love your thoughts, both on the tool itself and your own experiences. I’ll be hanging out in the comments to answer questions, too.

Comments (15)

JohnFriel · 6m ago
This is interesting. Do you know what features the classifier is matching on? Like how much does stuff like whitespace matter here vs. deeper code structure? Put differently, if you were to parse the AI and non-AI code into AST and train a classifier based on that, would the results be the same?
henryl · 3m ago
Candidly, it's a bit of a black box still. We hope to do some ablation studies soon, but we tried to have a variety of formatting and commenting styles represented in both training and evaluation.
jensneuse · 5m ago
Could I use this to iterate over my AI generated code until it's not detectable anymore? So essentially the moment you publish this tool it stops working?
well_actulily · 1m ago
This is essentially the adversarial generator/discriminator set-up that GANs use.
henryl · 4m ago
I'm sure you can, but there isn't really an adversarial motive for doing that, I would think :)
johnsillings · 5m ago
sharing the technical announcement here (more info on evaluations, comparison to other models, etc): https://www.span.app/introducing-span-detect-1
Ndotkess · 6m ago
What is your approach to measuring accuracy?
johnsillings · 4m ago
I'm sure Henry will chime in here, but there's some more info here in the technical announcement: https://www.span.app/introducing-span-detect-1

"span-detect-1 was evaluated by an independent team within Span. The team’s objective was to create an eval that’s free from training-data contamination and reflects realistic human- and AI-authored code patterns. The focus was on three sources: real-world human code, AI code authored by Devin crawled from public GitHub repositories, and AI samples that we synthesized for “brownfield” edits by leading LLMs. In the end, evaluation was performed with ~45K-sample balanced datasets for TypeScript and Python each, and an 11K-sample set for TSX."

henryl · 5m ago
More details about how we eval'ed here:

https://www.span.app/introducing-span-detect-1

samfriedman · 12m ago
Accuracy is a useless statistic: give us precision and recall.
henryl · 5m ago
Recall 91.5, F1 93.3
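The reply above gives recall and F1 but not precision. Since F1 is the harmonic mean of precision and recall, precision can be backed out algebraically; a quick sketch, assuming the reported figures are standard recall and harmonic-mean F1 (the function name is illustrative, not from Span's tooling):

```python
def precision_from_recall_f1(recall: float, f1: float) -> float:
    """Invert F1 = 2*P*R / (P + R) to recover precision:
    P = F1 * R / (2*R - F1)."""
    return f1 * recall / (2 * recall - f1)

# Reported: recall 91.5%, F1 93.3%
precision = precision_from_recall_f1(0.915, 0.933)
print(f"implied precision ≈ {precision:.3f}")  # ≈ 0.952
```

An implied precision of roughly 95% would mean about 1 in 20 flagged snippets is a false positive, which is the trade-off the thread is probing.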
bigyabai · 22m ago
I can detect AI-generated code with 100% accuracy, provided you give me an unlimited budget for false positives. It's a bit of a useless metric.
henryl · 17m ago
I'd argue that knowing AI generated code shipped into production is the first step to understanding the impact of AI coding assistants on velocity and quality. When paired with additional context, it can help leaders understand how to improve proficiency around these tools.
jfarina · 9m ago
That's not relevant to the comment you replied to.
henryl · 2m ago
Ah - I misread:

Recall 91.5, F1 93.3