What are the biggest issues the agent faces at the moment? I still find these general-purpose agents frustrating to use at times, because people position them as if they could do anything, and then they break down when you give them a reasonably complex task.
I guess if someone figured out a way to minimize the impact of an error, like a way for the agent to handle it gracefully without it feeling like too much work, that would fix most of the problems.
atupem · 1d ago
Lots of interesting issues:
- The agent has a tool to set its task to 'completed', 'failed', or 'needs_help', with the last one being an option for human-in-the-loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.
- Additionally, the agent can create subtasks for itself, either to run immediately or to schedule in the future. Here again it can call that tool a bit too eagerly, filing duplicate subtasks for a task that involves repetitive work.
- Properly handling super long-running tasks that run for 1+ hours. The context window eventually hits its limit (this will be addressed this week).
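To make the first two points concrete, here is a rough sketch of the kind of guardrails one could put around those two tools. This is not how our agent is implemented: `Task`, `set_status`, `add_subtask`, the `attempts` counter, and the `min_attempts_before_help` threshold are all hypothetical names made up for illustration. The idea is just to refuse 'needs_help' until some work has been attempted, and to skip a subtask whose description matches an existing one.

```python
from dataclasses import dataclass, field

# The three statuses mentioned above; everything else here is an assumption.
VALID_STATUSES = {"completed", "failed", "needs_help"}


@dataclass
class Task:
    description: str
    status: str = "running"
    attempts: int = 0  # hypothetical count of work steps taken so far
    subtasks: list = field(default_factory=list)


def set_status(task, status, min_attempts_before_help=3):
    """Set the status, but reject 'needs_help' until a few attempts were made."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if status == "needs_help" and task.attempts < min_attempts_before_help:
        return False  # too early: the caller can tell the agent to keep working
    task.status = status
    return True


def add_subtask(task, description):
    """Create a subtask unless one with the same normalized description exists."""
    key = " ".join(description.lower().split())
    for existing in task.subtasks:
        if " ".join(existing.description.lower().split()) == key:
            return existing  # duplicate: reuse it instead of filing a new one
    sub = Task(description=description)
    task.subtasks.append(sub)
    return sub
```

Matching on a normalized description only catches near-verbatim duplicates; subtasks that are reworded but equivalent would need something smarter.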
Aside from those top-of-mind issues, there's a whole bunch of scaffolding issues: filesystem permissions, prompt injection security, I/O support, token cost. Lots to improve!
We're still super early, but these agents are already showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor.