Last week we made it to the front page with our post about benchmarking how well coding agents interact with libraries and APIs. The response was positive overall, but many wanted to see the code.
For those just catching up: The problem is that existing benchmarks focus on self-contained codegen. StackBench tests how well AI coding agents (like Claude Code, and now Cursor) use your library by:
• Parsing your documentation automatically
• Extracting real usage examples
• Having agents regenerate those examples from scratch, given only a spec
• Logging every mistake and analyzing patterns
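Roughly, the loop looks something like the sketch below. To be clear, this is a toy illustration and not StackBench's actual code or API: the helper names, the markdown-docs assumption, and the similarity-based scoring are all placeholders.

    # Hypothetical sketch of the docs -> spec -> agent -> scoring loop.
    # None of these names come from StackBench itself.
    import difflib
    import json
    import re
    from pathlib import Path

    def extract_examples(doc_text: str) -> list[str]:
        """Pull fenced code blocks out of a markdown docs page (assumed format)."""
        return re.findall(r"```(?:python)?\n(.*?)```", doc_text, flags=re.DOTALL)

    def example_to_spec(example: str) -> str:
        """Turn a doc example into a natural-language spec for the agent.
        Here we just keep the comments as the 'spec' and drop the code."""
        comments = [line.strip("# ").strip() for line in example.splitlines()
                    if line.strip().startswith("#")]
        return " ".join(comments) or "Reproduce this usage example from the library docs."

    def run_agent(spec: str) -> str:
        """Placeholder for invoking a coding agent (Claude Code, Cursor, ...)."""
        raise NotImplementedError("Wire up your agent of choice here.")

    def score_attempt(reference: str, attempt: str) -> dict:
        """Log how far the agent's code drifts from the documented example."""
        ratio = difflib.SequenceMatcher(None, reference, attempt).ratio()
        return {"similarity": round(ratio, 3),
                "exact_match": reference.strip() == attempt.strip()}

    def benchmark(docs_dir: str, report_path: str = "report.json") -> None:
        results = []
        for doc in Path(docs_dir).glob("**/*.md"):
            for example in extract_examples(doc.read_text()):
                spec = example_to_spec(example)
                try:
                    attempt = run_agent(spec)
                except NotImplementedError:
                    continue  # no agent wired up yet
                results.append({"doc": str(doc), "spec": spec,
                                **score_attempt(example, attempt)})
        Path(report_path).write_text(json.dumps(results, indent=2))

The real pipeline does more than string similarity (it logs and categorizes individual mistakes), but this is the general shape.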
You can find more details on how it works and how to run it in the docs: https://docs.stackbench.ai/
Next up, we’re planning to add more:
• Coding agents
• Ways of providing docs as context (e.g. Mintlify vs Cursor doc search)
• Benchmark tasks (e.g. use of APIs via API docs)
• Metrics
We're also working on automating in-editor testing and maybe even using an MCP server.
Contributions and suggestions very welcome. What should we prioritize next? The issues tab is open.
danmaw · 12h ago
Super cool. Thanks for sharing this. Who is this mainly aimed at, the maintainers or the users of the libs?
richardblythman · 12h ago
Mainly at the maintainers. In the generated reports, we highlight issues and suggest improvements to the docs (working on improving the reports as we run across more libraries).
gerstep · 12h ago
Cool direction! Benchmarking agent library usage instead of pure codegen is what's missing.