Show HN: I made a tiny, playable benchmark where LLMs compete head-to-head

2 points by yz-yu | 0 comments | 8/5/2025, 9:57:49 AM | llm-fighter.com
TL;DR: LLM Fighter is a small, open-source, playable benchmark for agentic behavior. You bring your own OpenAI-compatible API; the demo runs in the browser. It pits models against each other in head-to-head “battles” that stress tool use, planning, and efficiency, and shows step-by-step logs you can download.
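
If “OpenAI-compatible API” sounds vague, here is a rough TypeScript sketch (not the project’s actual code) of the kind of call such a setup relies on: a standard POST to /v1/chat/completions, sending the same prompt to two models. The base URL, key, and model names are placeholders you would supply yourself.

    // Minimal sketch: "OpenAI-compatible" here means a standard POST to
    // /v1/chat/completions. Base URL, key, and model names are placeholders.

    type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

    async function chat(
      baseUrl: string,
      apiKey: string,
      model: string,
      messages: ChatMessage[],
    ): Promise<string> {
      const res = await fetch(`${baseUrl}/v1/chat/completions`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({ model, messages }),
      });
      if (!res.ok) throw new Error(`API error: ${res.status}`);
      const data = await res.json();
      return data.choices[0].message.content;
    }

    // Example: point two models at the same prompt and compare their replies.
    async function demo() {
      const baseUrl = "https://api.example.com"; // any OpenAI-compatible endpoint
      const apiKey = "sk-...";                   // your own key
      const prompt: ChatMessage[] = [
        { role: "user", content: "Plan your next move and explain it in one sentence." },
      ];
      const [a, b] = await Promise.all([
        chat(baseUrl, apiKey, "model-a", prompt),
        chat(baseUrl, apiKey, "model-b", prompt),
      ]);
      console.log({ a, b });
    }

    demo().catch(console.error);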

What it does well: give you a quick, honest feel for how agents act under the same rules. What it’s not: a formal academic benchmark or a single headline “score”. Why I built it: I wanted something you can play in minutes and still learn from.
