AI agents fail tasks 70% of the time
22 JTbane 7 8/12/2025, 3:01:07 PM arxiv.org ↗
Comments (7)
drannex · 3h ago
Yes! But when they work, they only kinda work, sort of.
rogerkirkness · 1d ago
Agents went from 10% to 30% reliable this year, which is still a big deal.
bogzz · 10h ago
lol
thebigspacefuck · 23h ago
This is from Dec 2024, which feels like a while ago.
bsallthewaydown · 14h ago
AI is going to be the next bubble. It can't even figure out who the real author of a sculpture is. It's really all BS made up to play with markets and geopolitics. Enjoy it while it lasts.
JTbane · 1d ago
"We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that the most competitive agent can complete 30% of tasks autonomously."
gavinray · 1d ago
So you ask it to try every task 3.33 times for guaranteed success?
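For what it's worth, the arithmetic behind the 3.33 figure can be sketched as below. This is only a rough illustration, and it assumes things the paper does not claim: that every attempt succeeds independently with the same 30% probability, and that you can tell when an attempt has succeeded. The function names are made up for this example. Under those assumptions, 1/0.3 ≈ 3.33 is the average number of attempts until the first success, not a guarantee.

```python
import math


def p_at_least_one_success(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    Assumes each try succeeds with the same probability p (an idealization).
    """
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of independent tries whose combined success
    probability reaches `target` (0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


p = 0.30  # per-attempt success rate quoted from the abstract

# Mean of a geometric distribution: average tries until the first success.
expected_attempts = 1.0 / p

print(f"expected attempts until first success: {expected_attempts:.2f}")
for n in (1, 3, 4, 10):
    print(f"{n} attempts: P(at least one success) = {p_at_least_one_success(p, n):.3f}")
print("attempts for 95% confidence:", attempts_needed(p, 0.95))
```

So under this idealized model, three tries still fail about a third of the time, and it takes 9 tries to reach 95%. No finite number of retries gets to exactly 100%, and real agent failures are probably correlated (a task the agent cannot do stays hard on retry), so the true picture is likely worse.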