In this fourth post of the series, I tackle how to make multi-step agent workflows learn their behavior from data. Most agents today rely on vibes: prompt tuning, hand-written templates, and hope(!). This post is about replacing that with metrics and optimization.
Each branch in the workflow learns how to behave, not just where to route. I show how to set up a reward, plug in an optimizer, and treat agent behavior as something you can tune like a model.
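To make that concrete, here is a minimal sketch of the idea in Python. Everything in it is hypothetical: `run_branch` stands in for executing one branch of your workflow, `score_output` is a placeholder reward, and the optimizer is plain best-of-N search over prompt variants rather than anything this series prescribes.

```python
# Hypothetical sketch: treat one branch's behavior (here, its prompt) as a
# tunable parameter and pick the variant that maximizes a scalar reward.

def run_branch(prompt: str, example: dict) -> str:
    """Placeholder: execute one branch of the workflow with this prompt."""
    return f"[output of {prompt!r} on {example['input']!r}]"

def score_output(output: str, example: dict) -> float:
    """Placeholder reward: 1.0 if the output contains the expected answer."""
    return 1.0 if example["expected"] in output else 0.0

def optimize_branch(prompt_variants: list[str], dataset: list[dict]) -> str:
    """Return the prompt variant with the highest mean reward on the dataset."""
    best_prompt, best_reward = prompt_variants[0], float("-inf")
    for prompt in prompt_variants:
        rewards = [score_output(run_branch(prompt, ex), ex) for ex in dataset]
        mean_reward = sum(rewards) / len(rewards)
        if mean_reward > best_reward:
            best_prompt, best_reward = prompt, mean_reward
    return best_prompt
```

Swap in a gradient-free optimizer, an LLM-driven prompt rewriter, or whatever search procedure you like; the point is only that once a reward exists, branch behavior becomes something you can tune rather than hand-craft.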