RM
Reward Model
4 Mentions · 1.3M Views

“A model trained by RLHF to give high rewards to preferred completions.”
“Another model used to judge outputs and provide RL signals in RLHF.”
“Part of the pipeline: 'train a reward model and you do some fancy RL or other optimization'.”
“training is what's called a reward model”
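
The mentions above describe a reward model as something that gives higher rewards to preferred completions. As a rough illustration only, one common way to train for that is a pairwise (Bradley-Terry style) loss over a preferred and a rejected completion. None of the quoted sources specify a loss, so the choice of loss and the function name `pairwise_preference_loss` are assumptions made for this sketch.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Hypothetical helper: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the preferred completion scores higher than
    the rejected one, and grows as the ordering is violated.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Scalar scores a reward model might assign to two completions (made-up values).
loss_correct_order = pairwise_preference_loss(2.0, 0.0)
loss_wrong_order = pairwise_preference_loss(0.0, 2.0)
```

In this sketch the loss is lower when the preferred completion already has the higher score, so minimizing it pushes the model toward the behavior the first mention describes. A real reward model would produce the scores from a neural network and average the loss over many comparison pairs.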