On-Policy Expert Corrections (OEC) for Robust Multi-Turn LM Agents

Scale Labs has proposed On-Policy Expert Corrections (OEC), a lightweight adaptation of DAgger, to improve the training efficiency and robustness of multi-turn large language model (LLM) agents, particularly in software engineering tasks. OEC addresses covariate shift, the mismatch that arises when the situations an agent reaches under its own policy differ from those covered by the expert data it was trained on. The method starts a trajectory with the student model, switches to an expert model partway through to complete it, and then uses the expert portion for supervised fine-tuning.
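The rollout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `collect_oec_trajectory`, `switch_turn`, and `is_done`, the toy policies, and the fixed switch point are all assumptions made for the example, since the summary does not say how the switch point is chosen.

```python
from typing import Callable, List, Tuple

# A policy maps the action history so far to the next action (hypothetical interface).
Policy = Callable[[List[str]], str]


def collect_oec_trajectory(
    student: Policy,
    expert: Policy,
    max_turns: int,
    switch_turn: int,
    is_done: Callable[[List[str]], bool],
) -> Tuple[List[str], List[Tuple[List[str], str]]]:
    """Roll out the student for `switch_turn` turns, then let the expert finish.

    Returns the full action history and the expert-only (context, action)
    pairs, which are the part intended for supervised fine-tuning.
    """
    history: List[str] = []
    expert_steps: List[Tuple[List[str], str]] = []
    for turn in range(max_turns):
        use_expert = turn >= switch_turn
        policy = expert if use_expert else student
        action = policy(history)
        if use_expert:
            # The context includes the student's prefix, so the expert is
            # responding to states the student actually reached.
            expert_steps.append((list(history), action))
        history.append(action)
        if is_done(history):
            break
    return history, expert_steps


# Toy stand-ins for real models: each labels its action with the turn index.
def toy_student(history: List[str]) -> str:
    return f"s{len(history)}"


def toy_expert(history: List[str]) -> str:
    return f"e{len(history)}"


history, expert_steps = collect_oec_trajectory(
    toy_student, toy_expert, max_turns=5, switch_turn=2, is_done=lambda h: False
)
```

In this sketch only `expert_steps` would be passed to fine-tuning; the student's own turns appear as context but are not used as training targets.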

Why it matters

OEC offers a practical way to train more robust and efficient LLM agents by combining the benefits of imitation learning and reinforcement learning, especially for long-horizon tasks.
