Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted)
It runs a configurable number of test requests and reports two key metrics:
• First-token latency (ms): how long it takes for the first token to appear
• Output speed (tokens/sec): the sustained generation rate once output starts streaming
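For context, here is a minimal sketch of how these two numbers can be measured against an OpenAI-compatible streaming endpoint. This is not the project's actual code; the base URL, model name, and the rough chars-to-tokens estimate are illustrative assumptions:

    import time
    from openai import OpenAI

    # Hypothetical endpoint and key; any OpenAI-compatible base_url works the same way.
    client = OpenAI(base_url="https://api.example-proxy.com/v1", api_key="sk-...")

    start = time.perf_counter()
    first_token_at = None
    pieces = []

    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, swap for whatever the provider serves
        messages=[{"role": "user", "content": "Explain TCP slow start in one paragraph."}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()  # first visible token arrived
        pieces.append(delta)
    end = time.perf_counter()

    text = "".join(pieces)
    approx_tokens = max(1, len(text) // 4)  # crude ~4 chars/token estimate
    print(f"first-token latency: {(first_token_at - start) * 1000:.0f} ms")
    print(f"output speed: {approx_tokens / (end - first_token_at):.1f} tokens/sec")

The tool itself runs many such requests per provider and aggregates the results.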
Demo: https://llmapitest.com/
Code: https://github.com/qjr87/llm-api-test
The goal is to provide a simple, visual, and reproducible way to evaluate performance across different LLM providers, including the growing number of third-party “proxy” or “cheap LLM API” services.
It supports:
• OpenAI-compatible APIs (official and proxies)
• Claude (via Anthropic)
• Local endpoints (custom/self-hosted)
You can also self-host it with docker-compose. The config is clean; adding a new provider only requires a simple plugin-style addition (a rough sketch of the idea follows).
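To make the "plugin-style" point concrete, here is a hypothetical sketch of what such an addition could look like. The class names, fields, and registration dict are my illustration, not the repo's actual interface:

    from dataclasses import dataclass
    from typing import Iterator

    @dataclass
    class Provider:
        # Shape a provider plugin might take: a display name, a base URL,
        # and a streaming call the benchmark loop can time.
        name: str
        base_url: str

        def stream_chat(self, prompt: str, api_key: str) -> Iterator[str]:
            raise NotImplementedError

    class CheapProxyProvider(Provider):
        # Example third-party OpenAI-compatible proxy (hypothetical).
        def stream_chat(self, prompt: str, api_key: str) -> Iterator[str]:
            # An OpenAI-compatible proxy would reuse the generic streaming
            # request path; only the base_url and key differ.
            ...

    # Registering the new provider is then a one-line addition.
    PROVIDERS = {
        "cheap-proxy": CheapProxyProvider(name="cheap-proxy",
                                          base_url="https://api.cheap-llm.example/v1"),
    }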
Would love feedback, PRs, or even test reports for the APIs you're using. I'm especially interested in how some of the lesser-known services compare.