Data Mixing
Sifting through hundreds of thousands of hours of indexed videos
Mentions: 2
Views: 2.3K

“Strategies for weighting different data sources, including regression-based mixing.”

“Area of work informed by feedback on Red Pajama 1 trillion, related to selecting proportions of data.”
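
The first quote mentions regression-based mixing without describing it. A minimal sketch of the general idea, assuming it means fitting a regression from mixture proportions to validation loss on small proxy runs and then choosing the proportions with the lowest predicted loss. The quadratic feature set, the function names, and the synthetic "proxy run" losses below are all illustrative assumptions, not taken from the indexed videos.

```python
import numpy as np


def features(weights):
    """Quadratic features [w, w^2]. Because each row of weights sums to 1,
    the linear terms also absorb any constant offset in the loss."""
    weights = np.asarray(weights, dtype=float)
    return np.hstack([weights, weights ** 2])


def simulate_proxy_runs(target, n_runs=64, noise_std=0.001, seed=0):
    """Hypothetical stand-in for training small proxy models.

    Samples random mixtures and returns a synthetic loss that is lowest
    when the mixture equals `target`. Real losses would come from training.
    """
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(len(target)), size=n_runs)
    losses = 2.0 + ((weights - target) ** 2).sum(axis=1)
    losses = losses + rng.normal(0.0, noise_std, size=n_runs)
    return weights, losses


def fit_mixture_regression(weights, losses):
    """Least-squares fit of loss as a function of the mixture features."""
    coef, *_ = np.linalg.lstsq(features(weights), losses, rcond=None)
    return coef


def pick_best_mixture(coef, n_sources, n_candidates=20000, seed=1):
    """Score random candidate mixtures with the fitted model and return
    the one with the lowest predicted loss, plus that predicted loss."""
    rng = np.random.default_rng(seed)
    candidates = rng.dirichlet(np.ones(n_sources), size=n_candidates)
    predicted = features(candidates) @ coef
    best = int(np.argmin(predicted))
    return candidates[best], float(predicted[best])


if __name__ == "__main__":
    # Three hypothetical sources, e.g. web text, code, and books.
    target = np.array([0.6, 0.3, 0.1])
    weights, losses = simulate_proxy_runs(target)
    coef = fit_mixture_regression(weights, losses)
    best, predicted_loss = pick_best_mixture(coef, n_sources=len(target))
    print("selected mixture:", np.round(best, 3))
    print("predicted loss:", round(predicted_loss, 4))
```

Squared terms are included because a purely linear model would always put all weight on a single source. Other regressors could be substituted for the least-squares fit.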
Arcmira media summary
Arcmira tracks where Data mixing is discussed across indexed YouTube videos, transcripts, channels, and related entities.
Arcmira tracks 2 indexed media appearances or mentions for Data mixing, tied to source videos, channels, and transcript-derived context.
Arcmira uses indexed YouTube videos and transcripts. Representative source evidence on this page includes "Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 14: Data" with transcript-derived context and links when available.
Data mixing is connected to Microsoft, Meta, and MIT in Arcmira's media graph.