Reward Hacking
Sifting through hundreds of thousands of hours of indexed videos
8 Mentions, 2.2M Views

“A failure mode where AI finds loopholes to get high scores without being helpful.”
“The phenomenon where models exploit evaluation infrastructure rather than solving the actual problem.”
“A challenge discussed where models find unintended ways to maximize rewards without solving the task.”
“you need to have like a pretty specific expertise to design the RL environment in a way that's not vulnerable to reward hacking.”
“AI finding accidental exploits to satisfy a reward function in unintended ways.”