Scale Labs Proposes On-Policy Expert Corrections (OEC) for Robust Multi-Turn LM Agents

Scale Labs' recent work introduces On-Policy Expert Corrections (OEC), a lightweight adaptation of DAgger that aims to improve the training efficiency and robustness of multi-turn LM agents. OEC targets covariate shift: a student trained by imitation only on expert trajectories ends up, once it acts on its own, in states those trajectories never covered, and its errors compound over later turns. The method starts a trajectory with the student model, then switches to an expert model to complete it, and uses only the expert portion for supervised fine-tuning (SFT).
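
The rollout described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Scale Labs' implementation: the names `collect_oec_trajectory`, `expert_sft_examples`, `toy_student`, and `toy_expert` are hypothetical, the switch point is a fixed turn index (the summary does not say how it is chosen), and the student's prefix is kept as context for the expert targets while contributing no training targets of its own.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]                 # (actor, action)
Policy = Callable[[List[Turn]], str]   # history so far -> next action


def collect_oec_trajectory(student: Policy, expert: Policy,
                           max_turns: int, switch_turn: int) -> List[Turn]:
    """Roll out with the student first, then hand over to the expert.

    Assumption: the handover happens at a fixed turn index.
    """
    history: List[Turn] = []
    for t in range(max_turns):
        if t < switch_turn:
            actor, policy = "student", student
        else:
            actor, policy = "expert", expert
        action = policy(history)
        history.append((actor, action))
        if action == "DONE":           # toy termination signal
            break
    return history


def expert_sft_examples(history: List[Turn]) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs whose targets are expert turns only.

    Assumption: student turns appear in the context but never as targets.
    """
    examples = []
    for i, (actor, action) in enumerate(history):
        if actor == "expert":
            context = [a for _, a in history[:i]]
            examples.append((context, action))
    return examples


# Toy stand-ins for real models, purely for illustration.
def toy_student(history: List[Turn]) -> str:
    return f"student_step_{len(history)}"


def toy_expert(history: List[Turn]) -> str:
    return "DONE" if len(history) >= 4 else f"expert_step_{len(history)}"


trajectory = collect_oec_trajectory(toy_student, toy_expert,
                                    max_turns=6, switch_turn=2)
examples = expert_sft_examples(trajectory)
```

Here the first two turns come from the student, so the expert's corrections begin from a state the student actually reached, and only the expert's three turns become SFT targets.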

Why it matters

OEC offers a practical approach to a fundamental challenge in training robust multi-turn AI agents. It combines a benefit of reinforcement learning (training on states the student itself reaches) with a benefit of imitation learning (direct supervision from expert actions). This is particularly relevant for agents operating in complex environments like software engineering.