
MMLU

Benchmark

Fact-checked May 29, 2026

Also called: Massive Multitask Language Understanding

MMLU is a widely used benchmark that tests AI models on their understanding and reasoning abilities across 57 varied subjects, from history to computer science.

MMLU stands for Massive Multitask Language Understanding. It's a challenging test designed to evaluate how well large language models (LLMs) can understand and reason across a broad spectrum of human knowledge. Think of it as a comprehensive, multi-subject exam for AI, with questions that range in difficulty from elementary to professional level.

The benchmark was introduced in 2020 by Dan Hendrycks and colleagues, who noticed that many existing AI tests were becoming too easy for advanced models. These older tests often focused on simple fact recall or basic language tasks. MMLU, by contrast, includes questions that require deeper understanding, critical thinking, and the ability to apply knowledge in different contexts.

MMLU consists of 57 different subjects, ranging from abstract algebra and physics to US history, ethics, and philosophy. Every question is multiple choice with four answer options, so random guessing scores about 25%, and models are scored on accuracy: the share of questions they answer correctly. This wide variety of subjects makes it difficult for an AI model to "specialize" in just a few areas. To perform well on MMLU, a model needs a broad and deep grasp of many different fields of study.
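To make the scoring concrete, here is a minimal Python sketch of how accuracy could be tallied per subject and overall. The function name score_mmlu, the record layout, and the sample answers are all hypothetical and chosen for illustration; real evaluation harnesses differ in how they prompt the model, extract its answer, and average across subjects.

```python
from collections import defaultdict


def score_mmlu(records):
    """Tally accuracy from (subject, predicted_letter, correct_letter) tuples.

    Hypothetical helper for illustration only, not an official scoring script.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, answer in records:
        total[subject] += 1
        if predicted == answer:
            correct[subject] += 1

    # Accuracy within each subject.
    per_subject = {s: correct[s] / total[s] for s in total}
    # Overall accuracy: every question counts equally.
    overall = sum(correct.values()) / sum(total.values())
    # Macro average: every subject counts equally, however many questions it has.
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, overall, macro


# Made-up model answers for two subjects (not real benchmark data).
records = [
    ("abstract_algebra", "B", "B"),
    ("abstract_algebra", "C", "A"),
    ("philosophy", "D", "D"),
    ("philosophy", "A", "A"),
    ("philosophy", "B", "C"),
    ("philosophy", "C", "C"),
]

per_subject, overall, macro = score_mmlu(records)
print(per_subject)
print(round(overall, 3), round(macro, 3))
```

The two averages can differ when subjects have different numbers of questions, which is one reason reported MMLU scores are easiest to compare when they come from the same evaluation setup.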

When a new AI model is announced, its MMLU score is often featured prominently. It's one of the key metrics developers and researchers use to compare the breadth of knowledge and reasoning ability of different LLMs. A higher MMLU score means the model answered a larger share of questions correctly across the 57 subjects, which generally suggests a more capable and well-rounded model that can handle a wider array of complex tasks and topics.
