Letting the AIs Judge Themselves: One Creative Prompt, the Coffee-Ground Test

2 tamtampo 2 5/19/2025, 1:39:17 AM tryaii.com

Comments (2)

tamtampo · 9h ago
I've been working on better ways to benchmark today's LLMs, and I thought about a different kind of competition.

*Why I Ran This Mini-Benchmark*

I wanted to see whether today’s top LLMs share a sense of “good taste” when you let them score each other: no human panel, just pure model *democracy*.

*The Setup*

One prompt. The models judge and score each other's answers (anonymously), and the highest overall score wins.
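A minimal Python sketch of how the anonymous peer-scoring step could work. The post doesn't say how scores were collected or combined, so all of this is an assumption: `anonymize`, `peer_scores` and `winner` are hypothetical helpers, scores are plain numbers stored as `scores[judge][candidate]` (after mapping the blind labels back to model names), self-scores are dropped, and the winner is the highest mean score received.

```python
import random


def anonymize(answers: dict[str, str], seed: int = 0) -> tuple[dict[str, str], dict[str, str]]:
    """Hide authorship by giving each model's answer an opaque label in shuffled order.

    Returns (model -> label, label -> answer). Judges only ever see the second dict.
    """
    models = sorted(answers)
    random.Random(seed).shuffle(models)  # seeded so a run can be reproduced
    labels = {model: f"Answer {chr(ord('A') + i)}" for i, model in enumerate(models)}
    blinded = {labels[model]: answers[model] for model in models}
    return labels, blinded


def peer_scores(scores: dict[str, dict[str, float]], exclude_self: bool = True) -> dict[str, float]:
    """Average the scores each candidate received, where scores[judge][candidate] = score.

    Self-scores are skipped by default so no model can vote for itself.
    """
    candidates = {cand for row in scores.values() for cand in row}
    totals: dict[str, float] = {}
    for cand in sorted(candidates):
        received = [
            row[cand]
            for judge, row in scores.items()
            if cand in row and not (exclude_self and judge == cand)
        ]
        totals[cand] = sum(received) / len(received) if received else 0.0
    return totals


def winner(totals: dict[str, float]) -> str:
    """Highest mean score wins; ties go to the alphabetically first name."""
    return max(sorted(totals), key=lambda cand: totals[cand])
```

Averaging rather than summing keeps the comparison fair if some judge fails to score one of the answers.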

*Models tested (all May 2025 endpoints)*

* *OpenAI o3*
* *Gemini 2.0 Flash*
* *DeepSeek Reasoner*
* *Grok 3 (latest)*
* *Claude 3.7 Sonnet*

*Single prompt given to every model:*

In exactly *10* words, propose a groundbreaking global use for spent coffee grounds. Include *one* emoji, no hyphens, end with a period.
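The prompt's hard rules can be checked mechanically, which would let a rule violation be flagged separately from a judge's taste score. A rough Python sketch; `check_answer` and `passes` are hypothetical helpers, and two rules are my own reading of the prompt: a standalone emoji does not count toward the 10 words, and "emoji" is approximated by a few common Unicode blocks (so flags or ZWJ sequences may be counted as more than one).

```python
import re

# Approximate emoji matcher (assumption): common emoji/pictograph blocks only.
# Flags, skin tones and ZWJ sequences span several code points and may be over-counted.
EMOJI_RE = re.compile("[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF]")

# ASCII hyphen-minus plus the Unicode hyphen and non-breaking hyphen.
HYPHENS = ("-", "\u2010", "\u2011")


def check_answer(text: str) -> dict[str, bool]:
    """Check one answer against the prompt's four hard rules."""
    tokens = text.split()
    # Assumption: a "word" is a token containing at least one letter or digit,
    # so a standalone emoji does not count toward the 10.
    words = [tok for tok in tokens if any(ch.isalnum() for ch in tok)]
    return {
        "exactly_10_words": len(words) == 10,
        "exactly_one_emoji": len(EMOJI_RE.findall(text)) == 1,
        "no_hyphens": not any(h in text for h in HYPHENS),
        "ends_with_period": text.rstrip().endswith("."),
    }


def passes(text: str) -> bool:
    """True only if every rule in check_answer holds."""
    return all(check_answer(text).values())
```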

fdefitte · 9h ago
Love it. AI is actually better at judging the quality of content than at producing it. Kind of like humans, actually :)