Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
3 nobody9999 1 5/26/2025, 7:13:03 AM arxiv.org
Comments (1)
nobody9999 · 2d ago
>...In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees - tasks that are each simple but collectively, over long horizons (>20M tokens per run) stress an LLM's capacity for sustained, coherent decision-making.
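
To make the four tasks named in the abstract concrete (balancing inventory, placing orders, setting prices, paying a daily fee), here is a minimal toy sketch in Python. It is not the Vending-Bench environment or its API. The class name `ToyVendingSim`, its methods, the starting cash, the fee amount, and the fixed-demand model are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVendingSim:
    """Hypothetical toy model; NOT the actual Vending-Bench environment."""
    cash: float = 500.0        # assumed starting balance
    daily_fee: float = 2.0     # assumed fee charged once per simulated day
    inventory: dict = field(default_factory=dict)  # item -> units in stock
    prices: dict = field(default_factory=dict)     # item -> sale price

    def place_order(self, item: str, units: int, unit_cost: float) -> None:
        """Buy stock up front; refuse orders the agent cannot afford."""
        cost = units * unit_cost
        if cost > self.cash:
            raise ValueError("not enough cash for this order")
        self.cash -= cost
        self.inventory[item] = self.inventory.get(item, 0) + units

    def set_price(self, item: str, price: float) -> None:
        """Set the sale price for one item."""
        self.prices[item] = price

    def step_day(self, demand: dict) -> float:
        """Sell up to `demand` units per item, then charge the daily fee.

        Returns the day's revenue. Demand is passed in directly here; a
        real environment would model it, which this sketch does not attempt.
        """
        revenue = 0.0
        for item, wanted in demand.items():
            in_stock = self.inventory.get(item, 0)
            sold = min(wanted, in_stock)  # cannot sell more than is stocked
            self.inventory[item] = in_stock - sold
            revenue += sold * self.prices.get(item, 0.0)
        self.cash += revenue - self.daily_fee
        return revenue
```

Each call is trivial on its own. The difficulty the abstract points at comes from an agent having to keep making these calls sensibly over a very long run, for example remembering to reorder before stock hits zero while the daily fee keeps draining cash.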