Why is it hard to evaluate GenAI applications?

Comments (2)

PaulHoule · 21h ago

What I noticed in the 2010s was that there was very little enthusiasm for doing evaluation in information retrieval or classical ML, even though it was often straightforward to do.

Interest in eval has skyrocketed in the LLM age, just as interest in vector databases has. People finally see enough value in an ML system for eval work to be worth doing, but... it's much harder!

gytrcrt · 20h ago

I think the difference is:

1. There was no hallucination from information retrieval or classic ML back in the 2010s.
2. There was far lower engagement with classic ML systems from the general public, or even from regulators; that is, people were not able to directly "talk" to an ML system the way they can with ChatGPT.

These two points combined drive far more scrutiny of GenAI models/apps.
