Show HN: TheAuditor – Offline security scanner for AI-generated code
Why I built this: I needed a way to verify AI-generated code was production-safe. Existing tools either required cloud uploads (privacy concern) or produced output too large for AI context windows. TheAuditor solves both problems - it runs completely offline and chunks findings into 65KB segments that fit in Claude/GPT-4 context limits.
What I discovered: Testing on real projects, TheAuditor consistently finds 50-200+ vulnerabilities in AI-generated code. The patterns are remarkably consistent (illustration below):

- SQL queries using f-strings instead of parameterization
- Hardcoded secrets (JWT_SECRET = "secret" appears in nearly every project)
- Missing authentication on critical endpoints
- Rate limiting using in-memory storage that resets on restart
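For concreteness, here's a minimal illustration of the kind of code those findings point at (invented example, not taken from any scanned project):

```python
import sqlite3

JWT_SECRET = "secret"  # hardcoded secret - shows up in nearly every project

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # SQL built with an f-string: classic injection sink
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: what the report asks the AI to rewrite it to
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```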
Technical approach: TheAuditor runs 14 analysis phases in parallel, including taint analysis (tracking data from user input to dangerous sinks), pattern matching against 100+ security rules, and orchestrating industry tools (ESLint, Ruff, MyPy, Bandit). Everything outputs to structured JSON optimized for LLM consumption.
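The 65KB chunking is roughly this idea (hypothetical sketch; the function name and field handling are mine, not TheAuditor's actual schema):

```python
import json

CHUNK_LIMIT = 65 * 1024  # bytes per segment, sized to fit one LLM prompt

def chunk_findings(findings: list[dict], limit: int = CHUNK_LIMIT) -> list[str]:
    """Serialize findings to JSON and split them into segments under the byte budget."""
    chunks, current = [], []
    for finding in findings:
        candidate = current + [finding]
        if current and len(json.dumps(candidate).encode("utf-8")) > limit:
            chunks.append(json.dumps(current, indent=2))
            current = [finding]
        else:
            current = candidate
    if current:
        chunks.append(json.dumps(current, indent=2))
    return chunks
```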
Interesting obstacle: When scanning files with vulnerabilities, antivirus software often quarantines our reports because they contain "malicious" SQL injection patterns - even though we're just documenting them. Had to implement pattern defanging to reduce false positives.
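Conceptually, the defanging looks something like this (illustrative sketch, not the exact implementation): rewrite signature-prone strings in the report so AV pattern matching no longer fires, while the finding stays readable.

```python
import re

def defang(snippet: str) -> str:
    # Wrap signature-prone SQL keywords, e.g. "UNION SELECT" -> "[UNION] [SELECT]"
    return re.sub(r"(?i)\b(UNION|SELECT|INSERT|DROP|EXEC)\b", r"[\1]", snippet)

print(defang("' OR 1=1 UNION SELECT password FROM users--"))
# ' OR 1=1 [UNION] [SELECT] password FROM users--
```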
Current usage: Run aud full in any Python/JS/TS project. It generates a complete security audit in .pf/readthis/. The AI can then read these reports and fix its own vulnerabilities. I've seen projects go from 185 critical issues to zero in 3-4 iterations.
The tool is particularly useful if you're using AI assistants for production code but worry about security. It provides the "ground truth" that AI needs to self-correct.
Would appreciate feedback on:

- Additional vulnerability patterns common in AI-generated code
- Better ways to handle the antivirus false-positive issue
- Integration ideas for different AI coding workflows
Thanks for taking a look! /TheAuditorTool
> I've built the tool that makes AI assistants production-ready. This isn't competing with SonarQube/SemGrep. This is creating an entirely new category: AI Development Verification Tools.
Wow, that's a lot of talk for a tool that does regex searches and some AST matching, supporting only Python and JS (these things are not mentioned in the main project README as far as I can tell?).
The actual implementation details are buried in an (LLM written?) document: https://github.com/TheAuditorTool/Auditor/blob/main/ARCHITEC...
My favourite part is the "Pipeline System", which outlines a "14-phase analysis pipeline", but does not number these stages.
It reads a bit like the author is hiding what the tool actually does, which is sad, because there might be some really neat ideas in there, but they are really hard to make out.
There is this check [here](https://github.com/TheAuditorTool/Auditor/blob/2a3565ad38ece...), labelled "Time-of-check-time-of-use (TOCTOU) race condition pattern".
It reads:
`if.*\b(exists?|has|contains|includes)\b.*then.*\b(create|add|insert|write)\b`
This matches any line that contains `if` followed by `has` followed by `then` followed by `add`, for example. This is woefully insufficient for actually detecting TOCTOU, and even worse, will flag many many things as false positives.
Now the real problem is that the author states this will solve all your problems (literally), providing a completely false sense of security...
That TOCTOU pattern IS terrible - it's meant as a last-resort 'something might be wrong here' flag when we can't parse the AST. The real detection happens in theauditor/taint_analyzer/ which tracks actual data flow from filesystem checks to file operations.
But you're right - even fallback patterns shouldn't be this noisy. I'll tighten it to only flag actual filesystem operations (rough sketch below):

- os.path.exists → open()
- fs.exists → fs.writeFile()
- File.exists() → new FileWriter()
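Something along these lines for the tightened fallback (a rough sketch, not the shipped rule): only fire when an existence check and a write-side call appear within a few lines of each other.

```python
import re

CHECK = re.compile(r"\b(os\.path\.exists|os\.access|fs\.exists(Sync)?|File\([^)]*\)\.exists)\s*\(")
USE = re.compile(r"\b(open|os\.remove|os\.rename|fs\.writeFile(Sync)?|new\s+FileWriter)\s*\(")

def flag_toctou(source: str, window: int = 5) -> list[int]:
    """Return 1-based line numbers where a check is followed by a use within `window` lines."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if CHECK.search(line):
            if any(USE.search(later) for later in lines[i + 1 : i + 1 + window]):
                hits.append(i + 1)
    return hits
```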
TheAuditor uses a sandboxed environment (.auditor_venv/) to avoid polluting your system. When Tree-sitter isn't properly installed in that sandbox, we fall back to regex patterns. Common causes:
1. Missing C compiler - Tree-sitter needs to compile language grammars
2. Incomplete setup - User didn't run aud setup-claude --target . which installs the AST tools
3. Old installation - Before we fixed the [ast] dependency inclusion
If your code had syntax errors, you'd get different errors entirely (and your code probably wouldn't run). The "AST parsing fails" message specifically means Tree-sitter isn't available, so we're using the fallback regex patterns instead.
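Roughly, the fallback behaves like this (illustrative sketch with made-up function names; I'm assuming the tree_sitter_languages grammars are what the setup step installs):

```python
def parse_or_fallback(source: str, language: str):
    try:
        import tree_sitter_languages  # present only when the sandbox setup succeeded
        parser = tree_sitter_languages.get_parser(language)
        return ("ast", parser.parse(source.encode("utf-8")))
    except Exception:
        # "AST parsing fails" -> degrade to the regex patterns; noisier, but never blocks the run
        return ("regex", None)
```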
Just pushed clearer docs about this today actually. Run aud setup-claude --target . in your project and Tree-sitter should work properly.
This perfectly illustrates why I need community input. I'm not a developer - I literally can't code. I built this entire tool using Claude over 250 hours because I needed something to audit the code that Claude was writing for me. It's turtles all the way down!
The "14 phases" you mentioned are in theauditor/pipelines.py:_run_pipeline(): - Stage 1: index, framework_detect - Stage 2: (deps, docs) || (patterns, lint, workset) || (graph_build) - Stage 3: graph_analyze, taint, fce, consolidate, report
The value isn't in individual patterns (which clearly need work), but in the correlation engine. Example: when you refactor Product.price to ProductVariant.price, it tracks that change across your entire stack - finding frontend components, API calls, and database queries still using the old structure. SemGrep can't do this because it analyzes files in isolation.
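A toy version of that cross-stack check (the real correlation engine presumably works on the indexed graph from the pipeline above, not a raw file walk like this):

```python
from pathlib import Path

def stale_references(root: str, old: str = "Product.price") -> list[tuple[str, int]]:
    """Flag files across the stack that still mention the pre-refactor identifier."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".py", ".js", ".ts", ".tsx", ".vue", ".sql"}:
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if old in line:
                    hits.append((str(path), lineno))
    return hits
```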
You're 100% correct that I oversold it with "solves ALL your problems" - that's my non-developer enthusiasm talking. What it actually does: provides a ground truth about inconsistencies in your codebase that AI assistants can then fix. It's not a security silver bullet, it's a consistency checker.
The bad patterns like that TOCTOU check need fixing or removing. Would you be interested in helping improve them? Someone with your eye for detail would make this tool actually useful instead of security theater.
I use AI to communicate because I have dyslexia and ADHD. It helps me articulate technical concepts clearly. The irony isn't lost on me - I built a tool to audit AI-generated code, using AI, because I can't code, and now I'm using AI to explain it.
If that offends you more than 204 SQL injections in production code, we have different priorities.
The 204 SQL injections it found in production? Those were real. Those findings came from industry-standard tools....
The nightmare isn't that I used AI to build a security tool. The nightmare is that your production code was probably written the same way.
At least I'm checking mine.
https://github.com/TheAuditorTool/Auditor/commit/f77173a5517...
That said: did the hash fail? Yes, it did - security working as intended... Anything more to add? :)
That's exactly why I built TheAuditor - because I DON'T trust the code I had AI write. When you can't verify code yourself, you need something that reports ground truth.
The beautiful irony: I used AI to build a tool that finds vulnerabilities in AI-generated code. It already found 204 SQL injections in one user's production betting site - all from following AI suggestions.
If someone with no coding ability can use AI + TheAuditor to build TheAuditor itself (and have it actually work), that validates the entire premise: AI can write code, but you NEED automated verification.
What could go wrong? Without tools like this, everything. That's the point.
There is no difference between human-made and AI-made bad code, so I don't think we need specialized tools for that.
Pick one.
SonarQube IS a specialized tool. It just specializes in different things than TheAuditor.
SonarQube: "This file has issues" heAuditor: "Your frontend and backend disagree about the data model"
Both have their place.
If this had been used before the Tea spill fiasco (to name just one), it would never have been a fiasco. Just saying..
That's a strange ask in the Python ecosystem - what's the reason for this?
Also, what's the benefit of ESLint/Ruff/MyPy being utilised by this audit tool? I'm not sure I understand the benefit of having an LLM in between you and Ruff, for example.
It's breathtaking how much of an enabler it already is, but curating a good dependency tree and staying within the scope of the outlined work are not things LLMs are currently good at.
The ESLint/Ruff/MyPy integration isn't about putting an LLM between you and the linters. It's about aggregation and correlation. Example (see the sketch below):

- Ruff says "unused import"
- MyPy says "type mismatch"
- TheAuditor correlates: "You removed the import but forgot to update 3 type hints that depended on it"
The LLM reads the aggregated report to understand the full picture across all tools, not just individual warnings.
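A toy sketch of that aggregation step (field names are illustrative, not the actual report schema):

```python
from collections import defaultdict

findings = [
    {"tool": "ruff", "file": "api/models.py", "line": 3, "msg": "F401 unused import"},
    {"tool": "mypy", "file": "api/models.py", "line": 27, "msg": "type mismatch"},
    {"tool": "mypy", "file": "api/models.py", "line": 41, "msg": "type mismatch"},
]

# Group per file so a correlation pass (or an LLM) sees all tools' findings together
by_file = defaultdict(list)
for finding in findings:
    by_file[finding["file"]].append(finding)

for path, items in by_file.items():
    tools = sorted({f["tool"] for f in items})
    if len(tools) > 1:
        print(f"{path}: {len(items)} findings across {tools} - likely one underlying change")
```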
@ffsm8: You're absolutely right - I can't code and the dependency tree is probably a mess! That's exactly WHY I built this. When you're using AI to write code and can't verify if it's correct, you need something that reports the ground truth.
The irony isn't lost on me: I used Claude to build a tool that audits code written by Claude. It's enablement all the way down! But that's also the proof it works - if someone who can't code can use AI + TheAuditor to build TheAuditor itself, the development loop is validated.
The architectural decisions might be weird, but they're born from necessity, not incompetence. Happy to explain any specific weirdness!