Show HN: prompttest – pytest for LLMs
Every time I tweaked a prompt, I had to rerun a bunch of test cases manually and eyeball the results.
It felt like writing code without unit tests.
Existing tools I found were either:
- Full frameworks where you write evaluators in Python.
- Big platforms expanding into monitoring/security.
I wanted something simpler: a fast CLI that just tests prompts.
So I built prompttest, a pytest-like workflow for LLMs:
- You define a prompt in a .txt file with `{variables}`.
- You write test cases in .yml, with plain-English criteria.
- You run prompttest to see clear pass/fail results in your terminal.
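To make this concrete, a minimal setup looks roughly like this (simplified example; the field names here are illustrative, not an exact schema):

    # prompts/welcome.txt
    Write a short welcome message for {name}, who just signed up for {product}.

    # tests/welcome.yml  (illustrative field names)
    prompt: welcome
    cases:
      - name: greets_by_name
        vars:
          name: Dana
          product: Acme Notes
        criteria: The response must be polite and address the user by name.

No Python evaluators to write, just the prompt file and the YAML next to it.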
The core idea is that your "assertion" is just English.
Example:
> The response must be polite and address the user by name.
Then a model grades the output for you.
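It's the LLM-as-a-judge pattern: the grader model gets the criterion plus the generated output and returns a verdict, roughly along these lines (paraphrased, not the exact grading prompt):

    Criterion: The response must be polite and address the user by name.
    Response: <the generated output>
    Does the response satisfy the criterion? Answer PASS or FAIL with a one-line reason.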
This gives you a safety net: you can refactor prompts and instantly see regressions.
The project is still early.
It runs on OpenRouter, so you can test against many models (including free ones) with one API key.
Would love feedback, ideas, or use cases you'd want supported.