Tune self-correct SQL agent with RL: AgentLightning+verl+vLLM+AgentOps+LangGraph

Comments (1)

ultmaster · 5h ago

I trained an agent to write SQL, run it, check the results, and rewrite the query until it is correct. The write and rewrite policies are optimized with RL, using a client–server setup in Agent Lightning and a LangGraph state machine. On a 500-sample Spider eval subset, Qwen2.5-Coder-3B with a 4096-token context reaches 80.4% with three write-and-rewrite turns and 80.2% with one turn. After training, Qwen2.5-Coder-1.5B can even outperform the untrained Qwen2.5-Coder-3B. I compared multiple models and settings, hoping to shed light on tuning AI agents.
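For readers who want a feel for the loop, here is a rough sketch of the write, run, check, rewrite cycle in plain Python with the standard-library `sqlite3` module. It is not the code from the article and uses none of the Agent Lightning or LangGraph APIs. The names `self_correct`, `execute`, `check`, `demo_write`, and `demo_rewrite` are hypothetical, the two policies are stubbed with fixed strings where the real agent would call an LLM, the check is a toy rule, and treating `max_turns` as a cap on attempts is my assumption.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: in a real agent both callables would be LLM calls.
WriteFn = Callable[[str], str]               # question -> SQL
RewriteFn = Callable[[str, str, str], str]   # question, previous SQL, feedback -> SQL


def execute(conn: sqlite3.Connection, sql: str) -> Tuple[Optional[List], str]:
    """Run the query and return (rows, feedback); rows is None on a SQL error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return None, f"error: {exc}"
    return rows, f"ok: {len(rows)} row(s)"


def check(rows: Optional[List]) -> bool:
    """Toy check (assumption): accept any query that runs and returns rows."""
    return bool(rows)


def self_correct(conn: sqlite3.Connection, question: str,
                 write: WriteFn, rewrite: RewriteFn, max_turns: int = 3):
    """Write once, then rewrite until the check passes or attempts run out."""
    sql = write(question)
    for turn in range(1, max_turns + 1):
        rows, feedback = execute(conn, sql)
        if check(rows) or turn == max_turns:
            return sql, rows, turn
        sql = rewrite(question, sql, feedback)


# Tiny in-memory database standing in for a Spider schema (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 41)])


def demo_write(question: str) -> str:
    # Stub policy: the first draft contains a column-name typo on purpose.
    return "SELECT nme FROM singer WHERE age > 35"


def demo_rewrite(question: str, sql: str, feedback: str) -> str:
    # Stub policy: "fixes" the typo; a real policy would read the feedback.
    return sql.replace("nme", "name")


final_sql, rows, turns = self_correct(
    conn, "Which singers are older than 35?", demo_write, demo_rewrite
)
print(final_sql, rows, turns)
```

In this toy run the first draft fails with a SQL error, the feedback goes to the rewrite step, and the second attempt passes the check. In an RL setup, the write and rewrite callables are the parts being optimized.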

Full article: https://medium.com/@yugez/training-ai-agents-to-write-and-se...

Related projects:

- Agent Lightning as the glue: https://github.com/microsoft/agent-lightning

- verl for RL algorithms: https://github.com/volcengine/verl

- vLLM for efficient rollouts: https://github.com/vllm-project/vllm

- AgentOps for collecting training data (telemetry): https://github.com/AgentOps-AI/agentops

- LangGraph for agent orchestration: https://www.langchain.com/langgraph
