I wrote 2000 LLM test cases so you don't have to: LLM feature compatibility grid

2 scosman 2 7/22/2025, 3:14:58 PM getkiln.ai ↗

Comments (2)

Oras · 6h ago
I might have missed the point, but most of these features are just filters in OpenRouter.

- Reasoning.

- Structured Output.

- Logprobs.

What's the added value from your tests? To verify these features exist?

scosman · 6h ago
There's a section about OpenRouter/LiteLLM: https://getkiln.ai/blog/i_wrote_2000_llm_test_cases_so_you_d...

Those tools map API compatibility. These tests+config add:

1) check which features are available

2) check which parameters you need to use for best results. For example, there are about 6 different options for requesting JSON from OpenRouter, and different models work best with different options.

3) check that the features consistently work. API compatibility and functionality are not the same.

4) Go much deeper: are the models good enough for synthetic data generation? Can they generate uncensored model inputs if you're building a toxicity eval? etc.
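
To make points 2 and 3 concrete, here is a rough sketch of the idea, not the actual Kiln test suite. It tries each way of requesting JSON against each model several times and records the pass rate, so you can see both which option works best and whether it works consistently. The mode names, model names, and `fake_call_model` are all hypothetical stand-ins; a real run would swap the stub for an actual API call.

```python
import json
from typing import Callable, Dict, List

# Hypothetical option names for illustration only; these are NOT the real
# OpenRouter or Kiln parameter names.
MODES = ["json_schema", "json_mode", "tool_call", "prompt_instructions"]


def is_valid_json_object(text: str) -> bool:
    """Return True if the response parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def build_grid(
    models: List[str],
    modes: List[str],
    call_model: Callable[[str, str, int], str],
    trials: int = 4,
) -> Dict[str, Dict[str, float]]:
    """Run every (model, mode) pair `trials` times and record the pass rate."""
    grid: Dict[str, Dict[str, float]] = {}
    for model in models:
        grid[model] = {}
        for mode in modes:
            passes = sum(
                is_valid_json_object(call_model(model, mode, trial))
                for trial in range(trials)
            )
            grid[model][mode] = passes / trials
    return grid


def best_mode(grid: Dict[str, Dict[str, float]], model: str) -> str:
    """Pick the mode with the highest pass rate for a model."""
    return max(grid[model], key=grid[model].get)


def fake_call_model(model: str, mode: str, trial: int) -> str:
    """Stub standing in for a real API call (assumption: made-up behavior)."""
    if model == "model-a":
        # model-a only returns clean JSON with the schema option.
        if mode == "json_schema":
            return '{"ok": true}'
        return "Sure! Here is the JSON: {ok}"
    # model-b: tool calls always work, json_mode is flaky, the rest fail.
    if mode == "tool_call":
        return '{"ok": true}'
    if mode == "json_mode":
        return '{"ok": true}' if trial % 2 == 0 else '{"ok": '
    return "not json"


grid = build_grid(["model-a", "model-b"], MODES, fake_call_model)
for model in grid:
    print(model, best_mode(grid, model), grid[model])
```

The flaky `json_mode` case for `model-b` is the point of item 3: the option is accepted by the API, but it only produces valid JSON half the time, which a single "does the feature exist" check would miss.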