Show HN: Detecting hallucinations in LLM function calling with entropy
3 points by honorable_coder | 8/17/2025, 5:12:25 PM | archgw.com ↗
We use this for tool calling in https://github.com/katanemo/archgw, which uses a 3B function-calling LLM to map a user's ask to one of many tools for routine agentic operations in an application.
Why we do this: latency. A 3B-parameter model, especially when quantized, can deliver sub-100ms time-to-first-token and generate a complete function call in under 100ms. That makes the LLM "disappear" as a bottleneck, so the only real waiting time is in the external tool or API being called, plus the time it takes to synthesize a human-readable response.
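The post's exact detection method isn't spelled out here, but the core idea named in the title (flagging likely hallucinated function calls via entropy over token probabilities) can be sketched as follows. This is a minimal illustration, not archgw's implementation; the function names `token_entropy` and `flag_uncertain_call` and the threshold value are hypothetical, and the per-token log-probability dicts stand in for what an OpenAI-style `top_logprobs` response would give you.

```python
import math

def token_entropy(top_logprobs):
    """Shannon entropy (in nats) over one token's top-k alternatives.

    `top_logprobs` maps candidate token strings to log-probabilities,
    e.g. one entry per generated token from an OpenAI-style API.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)  # renormalize over the truncated top-k set
    return -sum((p / total) * math.log(p / total) for p in probs)

def flag_uncertain_call(per_token_logprobs, threshold=1.0):
    """Flag a generated function call whose mean token entropy is high.

    High entropy means the model was torn between alternatives at many
    positions, which is a useful proxy for a hallucinated tool name or
    fabricated parameter value. The threshold is illustrative only.
    """
    entropies = [token_entropy(tl) for tl in per_token_logprobs]
    mean_h = sum(entropies) / len(entropies)
    return mean_h > threshold, mean_h
```

A confident call (one candidate near probability 1 at each position) yields entropy near zero and passes; a call where the model spreads mass across several tool names yields high entropy and gets flagged for a retry or a clarifying question.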