On-Policy Expert Corrections (OEC) for Robust Multi-Turn LM Agents

Scale Labs researchers have introduced On-Policy Expert Corrections (OEC), a lightweight adaptation of DAgger, to improve the robustness and training efficiency of multi-turn large language model (LLM) agents. OEC targets covariate shift, a common problem in long-horizon LLM agents: a model trained only on expert trajectories eventually reaches states the expert never visited, and its errors compound from there. OEC combines behavioral cloning (imitation learning) with ideas from reinforcement learning. Each trajectory starts with the student agent. Partway through, an expert model takes over and completes it. Because the expert's corrections begin from states the student actually reached, the resulting training data matches the student's own state distribution.
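The student-then-expert rollout can be sketched in Python. This is a minimal illustration under stated assumptions, not Scale Labs' implementation. The `oec_rollout` and `sft_examples` helpers, the toy counter environment, the fixed switch turn, and the choice to train only on the expert's turns are all hypothetical. In a real setup both policies would be LLMs acting in an agent environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A policy maps the conversation so far (alternating observations and
# actions, ending with the latest observation) to the next action.
Policy = Callable[[List[str]], str]


@dataclass
class Step:
    context: List[str]  # what the acting model saw at this turn
    action: str
    actor: str  # "student" or "expert"


def oec_rollout(reset, step, student: Policy, expert: Policy,
                switch_turn: int, max_turns: int) -> List[Step]:
    """Student acts for `switch_turn` turns, then the expert finishes."""
    steps: List[Step] = []
    history: List[str] = []
    obs = reset()
    for t in range(max_turns):
        actor, policy = ("student", student) if t < switch_turn else ("expert", expert)
        context = history + [obs]
        action = policy(context)
        steps.append(Step(context, action, actor))
        history = context + [action]
        obs, done = step(action)
        if done:
            break
    return steps


def sft_examples(steps: List[Step]) -> List[Tuple[List[str], str]]:
    """Keep (context, target) pairs for expert turns only (an assumption):
    student turns remain in the context but are never used as targets."""
    return [(s.context, s.action) for s in steps if s.actor == "expert"]


def make_counter_env(target: int):
    """Hypothetical toy task: reach `target` by issuing "inc" actions."""
    state: Dict[str, int] = {"value": 0}

    def reset() -> str:
        state["value"] = 0
        return "value=0"

    def step(action: str) -> Tuple[str, bool]:
        if action == "inc":
            state["value"] += 1
        return f"value={state['value']}", state["value"] >= target

    return reset, step


if __name__ == "__main__":
    reset, step = make_counter_env(target=3)
    weak_student: Policy = lambda ctx: "noop"  # hypothetical student that never makes progress
    expert: Policy = lambda ctx: "inc"  # hypothetical expert that always does the right thing
    steps = oec_rollout(reset, step, weak_student, expert, switch_turn=2, max_turns=10)
    print([s.actor for s in steps])
    # ['student', 'student', 'expert', 'expert', 'expert']
    print(len(sft_examples(steps)))  # 3
```

Setting `switch_turn=0` turns the same loop into plain behavioral cloning on expert-only trajectories. Those contexts never contain the student's stalled `noop` turns, which are exactly the kind of states the student later encounters on its own; the OEC examples do contain them, paired with the expert's correction.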

Why it matters

OEC offers a practical way to mitigate covariate shift, a major obstacle in training LLM agents, and so yields more robust and reliable agents. Because the expert's demonstrations start from states the agent itself reached, the agent learns from corrections in its own context, which can accelerate the development of capable AI agents for complex tasks such as software engineering.

Learn one new AI thing every day.

Daily Deck sends you seven plain-English cards like this every morning. Free.
