All Things RLHF: How We Actually Teach AI to Care What Humans Think
"ChatGPT feels different." I heard that phrase about a hundred times in late 2022. Everyone was groping for why it felt qualitatively better than GPT-3, which by parameter count and pre-training data was essentially the same model underneath. The answer wasn't a bigger model or more tokens. It was a training technique most engineers outside the alignment world had never touched: Reinforcement Learning from Human Feedback. RLHF is the bridge between "an LLM that predicts the next token" and "an assistant that behaves the way humans actually want."

Tushar Prasad
Apr 19 · 20 min read