Benchmark for Local LLMs with German "Who Wants to Be a Millionaire" Questions
2 thunderbong 1 9/3/2025, 7:07:16 AM github.com ↗
Comments (1)
mynti · 1d ago
This is super cool! One thing I find counterintuitive is that neither GPT5 nor o3 performs better. GPT5 wins about 800k on average per round, but I would have expected it to be nearly perfect, since these are not particularly hard questions: mostly trivia or simple lookup knowledge. There is little reasoning involved, so I expected the big models to do much better.