Scale Labs Proposes On-Policy Expert Corrections (OEC) for Robust Multi-Turn LM Agents

Scale Labs introduced On-Policy Expert Corrections (OEC), a lightweight adaptation of DAgger, to address covariate shift in multi-turn large language model (LLM) agents, particularly in software engineering tasks. Covariate shift arises when a student trained only on expert trajectories drifts, at deployment, into states the expert never visited. OEC trains the student on expert demonstrations conditioned on the student's own on-policy history, so the training data stays in-distribution while still leveraging verifiable rewards. According to Scale Labs, the method improves training efficiency and agent robustness, outperforming both pure behavioral cloning and purely on-policy training.
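
The data-collection idea can be illustrated with a minimal toy sketch. It is not Scale Labs' implementation: the function name `collect_oec_examples`, the fixed `switch_step`, the toy policies, and the choice to discard trajectories that fail verification are all assumptions made for illustration. The student acts first, the expert takes over from the state the student reached, and only the expert's actions (paired with the student-produced history) are kept as training examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = Tuple[int, ...]          # toy stand-in for a multi-turn context
Policy = Callable[[History], int]  # maps the history so far to the next action


@dataclass
class Example:
    history: History  # context that begins with the student's own turns
    action: int       # expert action to imitate in that context


def collect_oec_examples(
    student: Policy,
    expert: Policy,
    verify: Callable[[History], bool],
    horizon: int,
    switch_step: int,  # hypothetical: a fixed hand-off point chosen for simplicity
) -> List[Example]:
    """Roll out the student, hand off to the expert, keep verified expert steps."""
    history: History = ()
    examples: List[Example] = []
    for t in range(horizon):
        if t < switch_step:
            # On-policy prefix: the student decides, so later states are in-distribution.
            action = student(history)
        else:
            # Expert correction: the expert acts from the state the student reached.
            action = expert(history)
            examples.append(Example(history, action))
        history = history + (action,)
    # Assumption: a verifiable reward acts as a pass/fail filter on the whole trajectory.
    return examples if verify(history) else []


# Toy policies and verifier, purely illustrative.
def toy_student(history: History) -> int:
    return 0  # the student always takes the unhelpful action


def toy_expert(history: History) -> int:
    return 1  # the expert always takes the helpful action


def toy_verify(history: History) -> bool:
    return sum(history) >= 2  # "task solved" if at least two helpful actions occurred
```

With `horizon=4` and `switch_step=2`, the expert has two turns to recover from the student's prefix, so the trajectory passes verification and yields two training examples. With `switch_step=3` the expert arrives too late, verification fails, and nothing is kept.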

Why it matters

OEC offers a practical fix for covariate shift, a fundamental limitation of training multi-turn LLM agents by imitation. That should make agents more reliable on long-horizon tasks, especially in software development.
