Perception
Sifting through hundreds of thousands of hours of indexed videos
Arcmira media summary
Arcmira tracks where Perception is discussed across indexed YouTube videos, transcripts, channels, and related entities.
Curious how you look at V3, though. Do you think we need more advancements or breakthroughs on the perception side in order to beat V3, or is this still fundamentally a program synthesis benchmark in your eyes?
Yeah, I think it's fundamentally a reasoning benchmark. It's not a visual perception benchmark at all. In fact, it was designed this way, just like V1 and V2. We were trying to remove the need for perception, because perception was getting in the way of measuring what we cared about, which was efficient generalization, effectively efficient skill acquisition, in V1, V2, and even V3. The game state is already in a format that can be processed by a computer. It's already effectively in token form. You can put it in an LLM, or you can put it in a program engine, just like we do. There's no need for a vision module. And what we saw on V1 and V2, and I'm sure we are going to keep seeing the same pattern on this one as well, is that vision-enabled models like VLMs actually did significantly worse than pure sequence models like text models. The reason is that you can treat these 2D grids as sequences and you're not really losing any information. It's the same information. Now, if you wanted a native understanding of 2D space, you could also rewire a transformer to give it a native understanding of grids instead of sequences. Of course, if you do that, you would have to pre-train it on grid data, which is not widely available, so it will not work very well. In practice, all the state-of-the-art models you're seeing on ARC V1 and V2 are pure sequence models, usually trained on code and so on. So yeah, it's definitely not a perception problem at all, and perception is not an obstacle to making real progress on these benchmarks.
Related to that, a lot of the solutions at ARC 2024 really relied on data augmentation techniques. Do you think that is going to be important for solving V3 as well: taking these and creating lots of games that look like V3 in order to beat it?
I don't think so. In practice, if you look at humans, people who have played a lot of games might do a little better than other people on these games, but not by a lot. And even people who don't play games at all can still beat these games, right?
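The answer's claim that 2D grids can be treated as sequences without losing information can be illustrated with a minimal sketch. It assumes ARC-style grids of small integer cell values and uses a hypothetical row-separator token; the names `grid_to_tokens`, `tokens_to_grid`, and `ROW_SEP` are illustrative only and are not taken from ARC Prize tooling or any model's actual tokenizer.

```python
from typing import List

Grid = List[List[int]]

# Hypothetical row-separator token; any symbol outside the cell alphabet works.
ROW_SEP = "\n"


def grid_to_tokens(grid: Grid) -> List[str]:
    """Serialize a 2D grid row by row into a flat token sequence."""
    tokens: List[str] = []
    for row in grid:
        tokens.extend(str(cell) for cell in row)
        tokens.append(ROW_SEP)  # end-of-row marker, so the grid shape is recoverable
    return tokens


def tokens_to_grid(tokens: List[str]) -> Grid:
    """Invert grid_to_tokens: split on row separators and parse each cell."""
    grid: Grid = []
    row: List[int] = []
    for tok in tokens:
        if tok == ROW_SEP:
            grid.append(row)
            row = []
        else:
            row.append(int(tok))
    return grid


if __name__ == "__main__":
    grid = [[0, 1, 2], [3, 4, 5]]
    tokens = grid_to_tokens(grid)
    print("".join(tokens))
    # The round trip reproduces the original grid exactly.
    assert tokens_to_grid(tokens) == grid
```

Because every row ends with a separator, the grid's shape can be recovered exactly, so the round trip is lossless. What a flat sequence does not make explicit is 2D adjacency: vertically neighboring cells end up one full row plus a separator apart, which is the gap the speaker suggests could be closed by rewiring a transformer for native grids.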
7 Mentions
14.4M Views

you've kind of got the perception half.
This U box is now called perception. Perception is a better name for it because what does it do? This perception process takes in the data that's happening, the actions and the observations, and it forms a sense of where the agent is now.
your perception right now is always your prediction
The interface itself was worked on by a company called Perception, which also does interfaces for superhero movies.
Arcmira tracks 7 indexed media appearances or mentions for Perception, tied to source videos, channels, and transcript-derived context.
Arcmira uses indexed YouTube videos and transcripts. Representative source evidence on this page includes "Francois Chollet + Mike Knoop | ARC Prize @ MIT" with transcript-derived context and links when available.
Perception is connected to MIT, Twitter, and Honda in Arcmira's media graph.