Ask HN: How are you evaluating your LLMs in production?
2ReDeiPirati17/1/2025, 6:09:59 PM
Hello HN! Which tools do you use to evaluate your LLMs and agents in production?
Comments (1)
znpy · 7h ago
Sysadmin here ("cloud engineer" is what's in my contract).
> Which tools do you use to evaluate your LLMs and agents in production?
None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.
It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).
So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.