Open Agent Leaderboard Launched for General-Purpose AI Agents

The Open Agent Leaderboard has launched as an open benchmark for comparing complete AI agent systems on both quality and cost. Paired with the Exgentic evaluation framework, it assesses agents across six diverse benchmarks, including code fixing, web research, customer service, and app task completion. Initial findings show that general-purpose agents can already rival specialized ones. They also reveal significant differences in how agents fail, and these differences affect operational costs. The leaderboard invites the community to contribute new agents, benchmarks, and models.

Why it matters

The leaderboard brings transparency and standardization to AI agent evaluation. It promotes open research and helps developers understand how agents perform in real-world tasks and what they cost to run.