Fact-checked May 20, 2026
Also called: reinforcement learning with human feedback
RLHF is a way to train AI models using human preferences, helping the AI learn to produce outputs that are more helpful and safe.
Reinforcement Learning from Human Feedback, or RLHF, is a training technique that starts by collecting human judgments about an AI model's outputs. Those judgments are then used to fine-tune the model, teaching it to generate responses that people find more helpful and less harmful.
Think of it like this: an AI generates a few different answers to a question. Humans then rank those answers from best to worst. The rankings are typically used to train a separate "reward model" that learns to score answers the way the human raters would, and the AI is then fine-tuned with reinforcement learning to produce answers that score well. In effect, the rankings tell the AI, 'This kind of answer is good, and that kind is not.' Over time, the AI learns to align its outputs more closely with human values and expectations.
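For readers who want to see the ranking idea in code, here is a minimal Python sketch. It assumes one common setup: a best-to-worst ranking is split into "preferred vs. rejected" pairs, and a scoring model is penalized whenever it rates the rejected answer higher. The answer names, the scores, and the helper functions `ranking_to_pairs` and `preference_loss` are all made up for illustration; real systems learn the scores with a neural network rather than a lookup table.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_answers):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of each
    pair is always the one the human ranked higher.
    """
    return list(combinations(ranked_answers, 2))


def preference_loss(score_preferred, score_rejected):
    """Pairwise loss: -log(sigmoid(score_preferred - score_rejected)).

    Small when the preferred answer scores higher, large when the
    scores disagree with the human ranking.
    """
    diff = score_preferred - score_rejected
    if diff < 0:
        # Equivalent form that avoids overflow for very negative diffs.
        return -diff + math.log1p(math.exp(diff))
    return math.log1p(math.exp(-diff))


# Hypothetical answers, ranked best to worst by a human rater.
ranking = ["answer_a", "answer_b", "answer_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores from a reward model that is not yet well trained:
# it wrongly rates answer_c above answer_b.
scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": 1.0}

for preferred, rejected in pairs:
    loss = preference_loss(scores[preferred], scores[rejected])
    print(f"{preferred} over {rejected}: loss = {loss:.3f}")
```

The pair where the scores disagree with the human ranking (answer_b over answer_c) gets the largest loss. Training nudges the scores to shrink that loss, which is how the ranking data ends up shaping what the model treats as a good answer.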
RLHF has been a key factor in improving the safety and usability of many large language models, making them much more helpful and less likely to produce undesirable content. It's a crucial step in bridging the gap between what an AI can generate and what humans actually want.