Are LLMs better suited for PR reviews than full codebases?

aaa_2006 · 9/5/2025, 6:33:08 PM
Semgrep recently published an analysis of how LLMs perform at spotting vulnerabilities in code: https://semgrep.dev/blog/2025/finding-vulnerabilities-in-modern-web-apps-using-claude-code-and-openai-codex/

I’ve been thinking about this problem and wanted to share a perspective.

When evaluating LLMs for static analysis, I see four main dimensions: accuracy, coverage, context size, and cost.

On accuracy and coverage, today’s LLMs feel nowhere close to replacing dedicated SAST tools on real-world codebases. They do better on isolated snippets or smaller repos, but once you introduce deep dependency chains, results drop off quickly.

Context size is another bottleneck. A repo with millions of lines far exceeds today's context windows, so the code has to be chunked, which undermines reasoning across files; and even with chunking, the runtime becomes impractical.

That leads to cost. Running an LLM across a massive codebase can be significantly more expensive than traditional scanners, without obvious ROI.
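To make the cost gap concrete, here is a back-of-envelope sketch. The tokens-per-line ratio and the per-token price are illustrative assumptions, not vendor pricing, and this counts only a single input pass — real whole-repo scans need many overlapping context windows, multiplying the number further.

```python
# Back-of-envelope cost of scanning a whole repo vs. a single PR.
# TOKENS_PER_LINE and PRICE_PER_MTOK are assumptions for illustration.

TOKENS_PER_LINE = 10      # rough average tokens per line of source code
PRICE_PER_MTOK = 3.00     # hypothetical $ per 1M input tokens

def scan_cost(lines_of_code: int) -> float:
    """Estimated input-token cost of sending this much code to an LLM once."""
    tokens = lines_of_code * TOKENS_PER_LINE
    return tokens / 1_000_000 * PRICE_PER_MTOK

repo_cost = scan_cost(5_000_000)  # a large monorepo, single pass
pr_cost = scan_cost(400)          # a well-scoped PR diff
print(f"repo: ${repo_cost:.2f}, PR: ${pr_cost:.4f}")
```

Even under these generous single-pass assumptions, the per-scan cost difference is four orders of magnitude, which is why the economics favor PR-sized inputs.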

Where they do shine is at smaller scales — reviewing PRs, surfacing potential issues in context, or even suggesting precise fixes when the input is well-scoped. That seems like the most practical application right now. Whether providers will invest in solving the big scaling problems is still an open question.
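As a rough illustration of what "well-scoped" means in practice, here is a minimal sketch: send the model only the branch diff, with an instruction to review just the introduced changes. The function names are hypothetical, and the step that would actually call a model is deliberately left out, since the API and model choice vary by provider.

```python
# Sketch of scoping an LLM review to a PR diff rather than the whole repo.
# pr_diff/build_review_prompt are illustrative names, not a real tool's API.

import subprocess

def pr_diff(base: str = "main") -> str:
    """Diff of the current branch against its merge base -- the scoped input."""
    return subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

def build_review_prompt(diff: str) -> str:
    """Wrap the diff in an instruction that keeps the review on-scope."""
    return (
        "Review only the changes in this diff. Flag likely bugs and "
        "security issues introduced by these lines; do not comment on "
        "pre-existing code.\n\n"
        "```diff\n" + diff + "\n```"
    )
```

Constraining the prompt to the diff (and explicitly forbidding comments on pre-existing code) is exactly the kind of scoping that, in my experience, separates useful reviews from the noisy summaries people complain about.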

Curious how others here think about the trade-offs between LLM-based approaches and existing SAST tools.

Comments (3)

aafanah · 18m ago
Interesting. LLMs are already shining at PR reviews even if they struggle with massive codebases right now. And they are evolving fast enough that those scaling limits might not stay limits much longer.
kogatlas · 11m ago
I'd love to see your evidence that "LLMs are already shining at PR reviews". We've used a handful of them here where I work for months now and they are rarely correct, and thus, rarely useful. Instead they tend to just summarize nonsense that wasn't even introduced in that PR, make shit up entirely, or recommend bad fixes to things that would be better solved by being removed entirely.
aafanah · 1m ago
Fair point. I think the bottom line is that it depends a lot on the context and how the prompt is framed. For PRs with small enough scope, I have seen LLMs provide decent value, mostly in surfacing potential issues or offering quick summaries. That said, the Semgrep analysis highlights that accuracy and coverage still fall short even in these narrow cases, so clearly there is still a lot of work to be done before this becomes broadly reliable.