
GPQA

Benchmark

Fact-checked May 20, 2026

Also called: Graduate-Level Google-Proof Q&A Benchmark

GPQA is a benchmark of graduate-level, multiple-choice science questions, designed to test whether advanced AI models can reason through expert-level problems that are hard to solve with a web search.

GPQA stands for 'Graduate-Level Google-Proof Q&A.' It's a tough test for AI models, especially large language models (LLMs), created to see how well they can answer hard questions that require expert knowledge and careful reasoning. Think of it like a graduate-level exam for an AI. The questions are written and validated by domain experts, and they are 'Google-proof': skilled non-experts get most of them wrong even with unrestricted web access. It's often used to measure progress in AI capabilities beyond simple fact recall.

The benchmark's main set has 448 multiple-choice questions in biology, physics, and chemistry. What makes GPQA unique is its focus on questions that even highly educated humans find difficult. In the original paper, experts who have or are pursuing PhDs in the relevant field reached about 65% accuracy, while skilled non-experts reached about 34% despite spending over 30 minutes per question on average with unrestricted web access. It aims to push AI models beyond surface-level understanding, highlighting areas where current models still struggle. High scores on GPQA are widely treated as evidence of stronger expert-level reasoning. It was introduced in a paper titled 'GPQA: A Graduate-Level Google-Proof Q&A Benchmark,' released in 2023.
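To make "scoring" concrete, here is a minimal sketch of how accuracy on a multiple-choice benchmark like this could be computed and compared with random guessing. The Question class, the function names, and the toy items are hypothetical placeholders made up for illustration; they are not the official GPQA data format or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical container for one multiple-choice item."""
    prompt: str
    choices: list[str]   # answer options shown to the model
    answer_index: int    # position of the correct option in `choices`


def accuracy(questions: list[Question], predictions: list[int]) -> float:
    """Fraction of questions where the predicted index matches the answer."""
    if not questions or len(questions) != len(predictions):
        raise ValueError("need one prediction per question, and at least one question")
    correct = sum(1 for q, p in zip(questions, predictions) if p == q.answer_index)
    return correct / len(questions)


def chance_baseline(questions: list[Question]) -> float:
    """Expected accuracy of picking an option uniformly at random."""
    return sum(1 / len(q.choices) for q in questions) / len(questions)


# Toy placeholder items (not real benchmark questions), four options each.
toy = [
    Question("placeholder question 1", ["A", "B", "C", "D"], 0),
    Question("placeholder question 2", ["A", "B", "C", "D"], 2),
    Question("placeholder question 3", ["A", "B", "C", "D"], 1),
    Question("placeholder question 4", ["A", "B", "C", "D"], 3),
]
preds = [0, 2, 3, 0]  # hypothetical model outputs: first two right, last two wrong

print(f"accuracy: {accuracy(toy, preds):.2f}")
print(f"chance:   {chance_baseline(toy):.2f}")
```

With four options per question, random guessing lands at 25%, which is the reference point for reading any reported score.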
