RLVR
4 mentions, 17.5K views

“Reinforcement Learning from Verifiable Rewards, the central theme of the lecture.”
“Reinforcement Learning with Verifiable Rewards, a key post-training technique.”
![[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI](https://img.youtube.com/vi/botHQ7u6-Jk/mqdefault.jpg)
“Reinforcement Learning from Verifiable Rewards, a post-training method discussed as a successor to DPO.”
“Reinforcement Learning from Verifiable Rewards, a method for training models on tasks with objective ground truths like math and code.”
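The descriptions above characterize RLVR as training on tasks with objective ground truths such as math and code. As a rough illustration only, a verifiable reward for a math task might look like the sketch below. The `Answer:` output convention and the function names `extract_final_answer` and `math_reward` are assumptions made for this example; they do not come from any of the indexed videos or from a specific library.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' marker is a hypothetical output convention for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer matches the ground truth, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer, so nothing can be verified
    try:
        # Compare as exact rationals so that "0.5" and "1/2" count as equal.
        return 1.0 if Fraction(answer) == Fraction(ground_truth) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string comparison.
        return 1.0 if answer == ground_truth.strip() else 0.0


if __name__ == "__main__":
    print(math_reward("Half of one is a half. Answer: 0.5", "1/2"))
    print(math_reward("I think it is three. Answer: 3", "4"))
```

The point of the sketch is that the reward comes from a programmatic check against a known answer rather than from a learned reward model; for code tasks, the analogous check would typically be running a test suite.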