RLHF
Reinforcement Learning from Human Feedback
Advanced · Models & Architecture
Reinforcement Learning from Human Feedback: a training technique in which human preference judgments guide the model to produce more helpful, safer outputs.
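In the common PPO-style setup, the policy is optimized against a learned reward that is penalized for drifting too far from a frozen reference model. A minimal sketch of that shaped reward, with illustrative names (shaped_reward, beta) and toy numbers that are assumptions, not values from any real system:

```python
import math

def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_ref: float,
                  beta: float = 0.1) -> float:
    """KL-penalized reward used in PPO-style RLHF (illustrative sketch).

    The reward model's score is discounted by how far the policy's
    log-probability for the response drifts from a frozen reference
    model, which keeps the tuned model close to its pretrained behavior.
    """
    kl_estimate = logprob_policy - logprob_ref  # per-sample KL estimate
    return rm_score - beta * kl_estimate

# Toy numbers: the response scores well but drifts from the reference.
print(shaped_reward(rm_score=1.4, logprob_policy=-12.0, logprob_ref=-15.0))
# 1.4 - 0.1 * 3.0 = 1.1
```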
Why It Matters
RLHF is a key part of how models like ChatGPT and Claude were made helpful and safe: it aligns AI behavior with human preferences and values.
Example in Practice
Human raters compare two AI responses and pick the better one; those comparisons train a reward model, which in turn steers the model toward helpful answers.
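Such pairwise comparisons are commonly turned into a reward model with a Bradley-Terry loss. A minimal sketch, assuming hypothetical scores already produced by a reward model (the function name pairwise_loss and the example values are illustrative):

```python
import math

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry loss on one human comparison.

    Minimizing -log(sigmoid(chosen - rejected)) pushes the reward
    model to score the human-preferred response higher.
    """
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A rater preferred response A (scored 0.8) over response B (scored 0.3):
print(pairwise_loss(0.8, 0.3))  # ~0.47: model already agrees with the rater
print(pairwise_loss(0.3, 0.8))  # ~0.97: model disagrees, so the loss is larger
```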