Show HN: AI Code Detector – detect AI-generated code with 95% accuracy

49 henryl 36 9/16/2025, 6:18:56 PM code-detector.ai ↗
Hey HN,

I’m Henry, cofounder and CTO at Span (https://span.app/). Today we’re launching AI Code Detector, an AI code detection tool you can try in your browser.

The explosion of AI-generated code has created some weird problems for engineering orgs. Tools like Cursor and Copilot are used by virtually every org on the planet – but each codegen tool has its own idiosyncratic way of reporting usage. Some don’t report usage at all.

Our view is that token spend will start competing with payroll spend as AI becomes more deeply ingrained in how we build software. Understanding how to drive proficiency, improve ROI, and allocate resources for AI tools will become at least as important as the parallel processes on the talent side.

Getting true visibility into AI-generated code is incredibly difficult. And yet it’s the number one thing customers ask us for.

So we built a new approach from the ground up.

Our AI Code Detector is powered by span-detect-1, a state-of-the-art model trained on millions of AI- and human-written code samples. It detects AI-generated code with 95% accuracy, and ties it to specific lines shipped into production. Within the Span platform, it’ll give teams a clear view into AI’s real impact on velocity, quality, and ROI.

It does have some limitations. Most notably, it only works for TypeScript and Python code. We are adding support for more languages: Java, Ruby, and C# are next. Its accuracy is around 95% today, and we’re working on improving that, too.

If you’d like to take it for a spin, you can run a code snippet here (https://code-detector.ai/) and get results in about five seconds. We also have a more narrative-driven microsite (https://www.span.app/detector) that my marketing team says I have to share.

Would love your thoughts, both on the tool itself and your own experiences. I’ll be hanging out in the comments to answer questions, too.

Comments (36)

fancyfredbot · 4m ago
An AI code detector would be a binary text classifier - you input some text and the output is either "code" or "not-code".

This is an "AI AI code detector".

You could call it a meta-AI code detector but people might think that's a detector for AI code written by the company formerly known as Facebook.

czbond · 2m ago
With "code" or "not-code" did you make a cheeky reference to "hotdog" "not hotdog"?
jftuga · 2m ago
I will always write code myself, but I sometimes have AI generate a first pass at class and method docstrings. What would happen in this scenario with your tool? Would my code be detected as AI-generated because of this, or does your tool operate solely on the code itself?
mendeza · 6m ago
I feel like code fed into this detector can be manipulated to evade detection. The model probably learns patterns that are common in generated code (clean comments, consistently correct formatting, no careless mistakes), but if you have an AI change its code to look like code a human would write (mistakes, not every function commented), it can blur the line. I think this will be a great tool for getting 90% of the way there; the challenge is the corner cases.
mendeza · 1m ago
I tested this idea using ChatGPT 5, asking this prompt:

`create two 1000 line python scripts, one that is how you normally do it, and how a messy undergraduate student would write it.`

The messy script was detected as having a 0% chance of being written by AI, while the detector was 100% confident the clean script was AI-generated. I had to shorten it for brevity; happy to share the full script.

Here is the chatgpt convo: https://chatgpt.com/share/68c9bc0c-8e10-8011-bab2-78de5b2ed6...

Clean script:

```
#!/usr/bin/env python3
"""
A clean, well-structured example Python script.

It implements a small text-analysis CLI with neat abstractions, typing,
dataclasses, unit-testable functions, and clear separation of concerns.
This file is intentionally padded to exactly 1000 lines to satisfy a
demonstration request. The padding is made of documented helper stubs.
"""
from __future__ import annotations

import argparse
import json
import re
from collections import Counter
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path
from typing import Dict, Iterable, List, Sequence, Tuple

__version__ = "1.0.0"


@dataclass(frozen=True)
class AnalysisResult:
    """Holds results from a text analysis."""
    token_counts: Dict[str, int]
    total_tokens: int

    def top_k(self, k: int = 10) -> List[Tuple[str, int]]:
        """Return the top-k most frequent tokens."""
        return sorted(self.token_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def _read_text(path: Path) -> str:
    """Read UTF-8 text from a file."""
    data = path.read_text(encoding="utf-8", errors="replace")
    return data


@lru_cache(maxsize=128)
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for stable tokenization."""
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()
    return text


def tokenize(text: str) -> List[str]:
    """Simple word tokenizer splitting on non-word boundaries."""
    return [t for t in re.split(r"\W+", normalize(text)) if t]


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Compute n-grams as tuples from a token sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i+n]) for i in range(0, max(0, len(tokens)-n+1))]


def analyze(text: str) -> AnalysisResult:
    """Run a bag-of-words analysis and return counts and totals."""
    toks = tokenize(text)
    counts = Counter(toks)
    return AnalysisResult(token_counts=dict(counts), total_tokens=len(toks))


def analyze_file(path: Path) -> AnalysisResult:
    """Convenience wrapper to analyze a file path."""
    return analyze(_read_text(path))


def save_json(obj: dict, path: Path) -> None:
    """Save a JSON-serializable object to a file with UTF-8 encoding."""
    path.write_text(json.dumps(obj, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
```


icemanx · 35m ago
Would be amazing to have a CLI tool that detects AI-generated code (even add it as part of CI/CD pipelines). I'm tired of all the AI trash PRs.
mannicken · 32m ago
Only Python, TypeScript and JavaScript? Well there go my vibe-coded elisp scripts.

I guess it's impossible (or really hard) to train a language-agnostic classifier.

Reference, from your own URL here: https://www.span.app/introducing-span-detect-1

henryl · 18m ago
It's probably impossible to detect ALL languages without training for them specifically, but there's good generalization happening. Our model is a unified model rather than a separate model per language. We started out with language-specific models but found that the unified approach yielded slightly better results in addition to being more efficient to train.
johnsillings · 26m ago
I'll let Henry elaborate here, but we think there's a chance that a truly language-agnostic classifier is possible. That being said, the next version of this will support a few more languages: Ruby, C#, and Java.
samfriedman · 59m ago
Accuracy is a useless statistic: give us precision and recall.
henryl · 52m ago
Recall 91.5, F1 93.3
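
Assuming those figures are percentages on the same scale, the missing precision can be backed out from F1 = 2PR/(P + R):

```
# Back out precision from the reported recall and F1 (illustrative arithmetic only).
recall, f1 = 91.5, 93.3
precision = f1 * recall / (2 * recall - f1)
print(round(precision, 1))  # ~95.2
```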
LPisGood · 25m ago
Useless is perhaps a bit harsh. It tells you something.
dymk · 12m ago
It tells me nothing because it doesn’t say if they mean precision or recall
LPisGood · 7m ago
It very much tells you something. Accuracy is a measure of overall correctness; it's something different from precision and recall.
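
For anyone skimming, a toy confusion matrix makes the distinction concrete (illustrative numbers only, nothing to do with span-detect-1's actual results):

```
# Toy confusion matrix, where "positive" = AI-generated (illustrative numbers only).
tp, fp, fn, tn = 90, 10, 5, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall correctness: 0.925
precision = tp / (tp + fp)                  # of code flagged as AI, how much really is: 0.90
recall = tp / (tp + fn)                     # of AI code, how much gets flagged: ~0.947
```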
Alifatisk · 27m ago
Very cool piece of tech. I would suggest putting C on the priority list and then Java, mainly because universities and colleges use one or both of them, so that would be a good use case.
johnsillings · 25m ago
Totally – we have support for Java, C#, and Ruby in the works.

Edit: since you mentioned universities, are you thinking about AI detection for student work, e.g. like a plagiarism checker? Just curious.

p0w3n3d · 10m ago
I wonder how many false positives it has
pella · 37m ago

  > Our AI Code Detector is powered by span-detect-1, a state-of-the-art model trained on millions of AI- and human-written code samples.

If I understand correctly, is it always worth using the latest and best model (for example, the newly released GPT-5-Codex), since the code it generates is not yet included in the training data?
JohnFriel · 53m ago
This is interesting. Do you know what features the classifier is matching on? Like how much does stuff like whitespace matter here vs. deeper code structure? Put differently, if you were to parse the AI and non-AI code into AST and train a classifier based on that, would the results be the same?
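
For concreteness, AST-level features could be something as crude as node-type counts via Python's ast module (purely illustrative, not a claim about how span-detect-1 works):

```
import ast
from collections import Counter

def ast_features(source: str) -> Counter:
    """Count AST node types in a Python source string as rough structural features."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))
```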
henryl · 50m ago
Candidly, it's a bit of a black box still. We hope to do some ablation studies soon, but we tried to have a variety of formatting and commenting styles represented in both training and evaluation.
johnsillings · 52m ago
sharing the technical announcement here (more info on evaluations, comparison to other models, etc): https://www.span.app/introducing-span-detect-1
mechen · 28m ago
As a leader this is actually really neat - going to give it a spin
johnsillings · 25m ago
Really appreciate it!
jensneuse · 52m ago
Could I use this to iterate over my AI generated code until it's not detectable anymore? So essentially the moment you publish this tool it stops working?
well_actulily · 48m ago
This is essentially the adversarial generator/discriminator set-up that GANs use.
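
Roughly, a minimal sketch of that loop, with hypothetical detect() and rewrite() stand-ins (neither is a real Span API):

```
# Hypothetical adversarial loop: detect() scores how AI-like the code looks,
# rewrite() asks an LLM to restyle its own output to look more human-written.
def evade(code, detect, rewrite, threshold=0.5, max_rounds=10):
    for _ in range(max_rounds):
        if detect(code) < threshold:  # detector no longer flags it
            return code
        code = rewrite(code)
    return code
```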
henryl · 51m ago
I'm sure you can but there isn't really an adversarial motive for doing that, I would think :)
polynomial · 22m ago
Sure there is.
jjmarr · 8m ago
You're saying "Understand and report on impact by AI coding tool". How is that possible?

Also, what's the pricing?

mechen · 25m ago
Just tried it out and it works :mind-blown:
Ndotkess · 52m ago
What is your approach to measuring accuracy?
johnsillings · 51m ago
I'm sure Henry will chime in here, but there's some more info here in the technical announcement: https://www.span.app/introducing-span-detect-1

"span-detect-1 was evaluated by an independent team within Span. The team’s objective was to create an eval that’s free from training data contamination and reflecting realistic human and AI authored code patterns. The focus was on 3 sources: real world human, AI code authored by Devin crawled from public GitHub repositories, and AI samples that we synthesized for “brownfield” edits by leading LLMs. In the end, evaluation was performed with ~45K balanced datasets for TypeScript and Python each, and an 11K sample set for TSX."

henryl · 52m ago
More details about how we eval'ed here:

https://www.span.app/introducing-span-detect-1

bigyabai · 1h ago
I can detect AI-generated code with 100% accuracy, provided you give me an unlimited budget for false positives. It's a bit of a useless metric.
henryl · 1h ago
I'd argue that knowing which AI-generated code shipped into production is the first step to understanding the impact of AI coding assistants on velocity and quality. When paired with additional context, it can help leaders understand how to improve proficiency with these tools.
jfarina · 56m ago
That's not relevant to the comment you replied to.
henryl · 49m ago
Ah - I misread:

Recall 91.5, F1 93.3