
RLHF

Also known as: Reinforcement Learning from Human Feedback
A training technique in which human evaluators rank model outputs by quality; those rankings are used to train a reward model, and the language model is then fine-tuned with reinforcement learning (commonly PPO) to produce responses the reward model scores highly. It's what turns a raw pre-trained model (which just predicts next tokens) into a helpful, harmless assistant.
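The reward-model step can be made concrete with the standard pairwise (Bradley-Terry) preference loss: given scalar reward scores for a human-preferred response and a rejected one, the loss shrinks as the preferred response's score pulls ahead. A minimal sketch in plain Python (the function name and example scores are illustrative, not from any particular library):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen, r_rejected):
    """Pairwise preference loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)).
    Small when the human-preferred response scores higher,
    large when the ranking disagrees with the human label."""
    return -math.log(sigmoid(r_chosen - r_rejected))

# Hypothetical reward scores for two responses to the same prompt.
agree = preference_loss(2.0, 0.5)     # model agrees with the human ranking -> small loss
disagree = preference_loss(0.5, 2.0)  # model disagrees -> large loss
print(agree, disagree)
```

In practice the scores come from a neural reward model evaluated on whole responses, and this loss is averaged over many human-labeled comparison pairs before the RL fine-tuning stage begins.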

Why it matters

RLHF is the key ingredient that made ChatGPT feel different from GPT-3. The base model already contained most of the relevant knowledge, but RLHF taught it to present that knowledge in ways humans actually find useful. It's also how safety behaviors, such as refusing harmful requests, are reinforced.
