Show HN: Multi-Agent-Coder Is #12 on Stanford's TBench. Beats Claude Code

3 Danau5tin 0 9/3/2025, 8:04:07 AM github.com ↗

This weekend I built a multi-agent coding system which, quite unexpectedly, beat Claude Code on Stanford's Terminal Bench!

The architecture is straightforward: an orchestrator agent deploys explorer and coder subagents to complete complex terminal-based tasks, and an intelligent context-sharing mechanism between them is what makes it all work.
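As a rough illustration of that shape, here is a minimal sketch. It is not the repo's actual code: every name in it (`ContextStore`, `Orchestrator`, `dispatch`, the key prefixes) is hypothetical, and the subagents are plain stub functions standing in for LLM-driven agents that would run terminal commands. It only shows the idea of an orchestrator handing selected pieces of shared context to each subagent and storing what comes back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContextStore:
    """Shared notes that subagents produce and the orchestrator hands out (hypothetical)."""
    items: Dict[str, str] = field(default_factory=dict)

    def add(self, key: str, content: str) -> None:
        self.items[key] = content

    def select(self, keys: List[str]) -> Dict[str, str]:
        # Only pass along the context entries that were explicitly requested.
        return {k: self.items[k] for k in keys if k in self.items}


# A subagent takes a task description plus selected context and returns new
# context entries. Real subagents would call an LLM; these are assumptions.
Subagent = Callable[[str, Dict[str, str]], Dict[str, str]]


def explorer(task: str, context: Dict[str, str]) -> Dict[str, str]:
    # Stub: pretend we inspected the environment and report a finding.
    return {"explore:" + task: f"findings for '{task}'"}


def coder(task: str, context: Dict[str, str]) -> Dict[str, str]:
    # Stub: pretend we made a change, recording which context we relied on.
    used = ", ".join(sorted(context)) or "nothing"
    return {"code:" + task: f"implemented '{task}' using {used}"}


@dataclass
class Orchestrator:
    store: ContextStore = field(default_factory=ContextStore)
    agents: Dict[str, Subagent] = field(
        default_factory=lambda: {"explorer": explorer, "coder": coder}
    )

    def dispatch(self, kind: str, task: str, context_keys: List[str]) -> List[str]:
        """Run one subagent with selected context, store its results, return the new keys."""
        result = self.agents[kind](task, self.store.select(context_keys))
        for key, content in result.items():
            self.store.add(key, content)
        return list(result)


if __name__ == "__main__":
    orch = Orchestrator()
    found = orch.dispatch("explorer", "locate failing test", [])
    done = orch.dispatch("coder", "fix failing test", found)
    print(orch.store.items[done[0]])
```

In this sketch the coder never sees the whole store, only the keys the orchestrator chooses to forward, which is one plausible way to keep each subagent's context small.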

The repo has plenty of technical detail, plus all the code and prompts for you to play around with if you'd like!

I had a lot of fun making this, and I hope you have fun reading the README, using it yourself, or even extending it!

As always, a huge thanks to the great team behind Terminal Bench. It is a great benchmark.

Thanks for reading, Dan
