Why is it hard to evaluate GenAI applications?

4 gytrcrt 2 6/6/2025, 6:15:11 PM andreagao.com

Comments (2)

PaulHoule · 21h ago
What I noticed in the 2010s was that there was very little enthusiasm for evaluating information retrieval or classical ML systems, even though it was often straightforward to do.

Interest in eval has skyrocketed in the LLM age, just as interest in vector databases has. People finally see enough value in an ML system to make eval work worthwhile, but... it's much harder!

gytrcrt · 20h ago
I think the difference is: 1. there was no hallucination from information retrieval or classic ML back in the 2010s; 2. there was far less engagement from the general public, or even regulators, with classic ML systems, i.e., people could not directly "talk" to an ML system the way they can with ChatGPT.

The two points combined drive way more scrutiny of GenAI models/apps.