
Scale Labs Introduces On-Policy Expert Corrections (OEC) for Robust Multi-Turn LM Agents

Scale Labs unveiled On-Policy Expert Corrections (OEC), a method for training more robust multi-turn language model agents by addressing covariate shift: the mismatch between the states seen in expert demonstrations and the states the student actually reaches on its own. OEC is a lightweight adaptation of DAgger. The student model begins a rollout, then an expert model takes over and completes the trajectory, inheriting the student's history. The resulting data is in-distribution for the student, while the training targets still come from expert actions, which combines benefits of imitation learning and reinforcement learning. For software engineering agents, OEC significantly improves performance over pure behavioral cloning.
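The rollout-then-switch idea can be sketched in a few lines of Python. This is a minimal illustration, not Scale Labs' implementation: the `Policy` type, the `oec_rollout` and `training_examples` names, the uniformly random switch point, the fixed number of turns, and the choice to keep only expert turns as training targets are all assumptions made for the example. A real agent would also interleave environment observations (tool output, test results) into the history.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types for illustration: a policy maps the history so far
# to the next action.
History = List[str]
Policy = Callable[[History], str]


def oec_rollout(student: Policy, expert: Policy, max_turns: int,
                rng: random.Random) -> Tuple[History, int]:
    """Roll out the student for some turns, then let the expert finish.

    Returns the full trajectory and the switch index; actions from the
    switch index onward were chosen by the expert.
    """
    # Assumption: the switch point is drawn uniformly, leaving the expert
    # at least one turn.
    switch_at = rng.randint(0, max_turns - 1)
    history: History = []
    for turn in range(max_turns):
        policy = student if turn < switch_at else expert
        # The expert sees the student's earlier actions, not a fresh context.
        history.append(policy(list(history)))
    return history, switch_at


def training_examples(history: History,
                      switch_at: int) -> List[Tuple[History, str]]:
    """Keep only expert turns as (context, target action) pairs."""
    return [(history[:t], history[t]) for t in range(switch_at, len(history))]
```

With stand-in policies such as `student = lambda h: f"s{len(h)}"` and `expert = lambda h: f"e{len(h)}"`, every training target is an expert action, but its context begins with whatever the student did before the switch. That is the sense in which the data is in-distribution for the student.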

Why it matters

OEC offers a practical, efficient way to train more robust multi-turn LLM agents. It is especially relevant for complex, long-horizon tasks such as software engineering, where it mitigates covariate shift.
