Arcmira media summary

Reward hacking, explained: podcasts, interviews & video clips

Explore podcasts, interviews, and explainers on reward hacking: 18 indexed appearances, updated May 2026.

Representative appearances

The AI Progress Chart Everyone Is Misreading — Beth Barnes & David Rein
The phenomenon where AI models find unintended ways to maximize reward signals.
The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking
When a model finds unintended shortcuts to achieve a high reward score.
AI Scouting Report: the Good, Bad, & Weird @ the Law & AI Certificate Program, by LexLab, UC Law SF
The phenomenon where AI systems find unintended ways to maximize reward signals.
Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving
Discussion on how models exploit reward signals, a core problem in ML.
Transformers & LLMs | AI Preference Tuning (Part 5)
A failure mode where AI finds loopholes to get high scores without being helpful.

What does Arcmira know about Reward hacking?

Arcmira tracks 18 indexed media appearances or mentions for Reward hacking, tied to source videos, channels, and transcript-derived context.

Where does Arcmira's data about Reward hacking come from?

Arcmira uses indexed YouTube videos and transcripts. Representative source evidence on this page includes "The AI Progress Chart Everyone Is Misreading — Beth Barnes & David Rein" with transcript-derived context and links when available.

What is Reward hacking connected to?

In Arcmira's media graph, reward hacking is connected to Anthropic, OpenAI, and DeepSeek.

Topic

Reward hacking

Mentions

2.5M

Views

Reward hacking mentions on podcasts & videos

The AI Progress Chart Everyone Is Misreading — Beth Barnes & David Rein

@ 89:44

Machine Learning Street Talk | Brief | 5/4/2026

“The phenomenon where AI models find unintended ways to maximize reward signals.”

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

1st @ 00:38

Cognitive Revolution "How AI Changes Everything" | Brief | 5/1/2026

“When a model finds unintended shortcuts to achieve a high reward score.”

AI Scouting Report: the Good, Bad, & Weird @ the Law & AI Certificate Program, by LexLab, UC Law SF

@ 38:33

Cognitive Revolution "How AI Changes Everything" | Brief | 3/16/2026

“The phenomenon where AI systems find unintended ways to maximize reward signals.”

Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

1st @ 02:39

Cognitive Revolution "How AI Changes Everything" | Brief | 3/1/2026

“Discussion on how models exploit reward signals, a core problem in ML.”

Transformers & LLMs | AI Preference Tuning (Part 5)

@ 04:12

Muhibuddin | Brief | 12/29/2025

“A failure mode where AI finds loopholes to get high scores without being helpful.”
