Google’s New Perception Test Benchmark
A team of Google researchers, including Viorica Pătrăucean, Lucas Smaira, and others, has introduced the Perception Test, a new benchmark for evaluating multimodal artificial intelligence (AI) systems. It is designed to probe the perception capabilities of AI models using real-world videos.
The Role of Benchmarks in AI Research
Benchmarks have played a significant role in shaping AI research by helping to define research goals and measure progress. Past breakthroughs in AI, such as in computer vision and protein folding, have been closely linked to the use of benchmark datasets.
The Significance of Perception in AI
Perception, which involves experiencing the world through senses, is a significant part of intelligence. Developing AI agents with human-level perceptual understanding has become increasingly important in various fields, such as robotics, self-driving cars, and medical imaging.
Creating the Perception Test
The team created a dataset of real-world videos, each labeled according to six types of perception tasks: object tracking, point tracking, temporal action localization, temporal sound localization, multiple-choice video question-answering, and grounded video question-answering.
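The six task types above can be thought of as a simple annotation schema, where each video carries labels whose shape depends on the task. The sketch below is illustrative only: the identifiers and record layout are assumptions for exposition, not the benchmark's actual format.

```python
from dataclasses import dataclass, field

# The six perception task types described above (names are illustrative,
# not the benchmark's official identifiers).
TASK_TYPES = [
    "object_tracking",
    "point_tracking",
    "temporal_action_localization",
    "temporal_sound_localization",
    "multiple_choice_video_qa",
    "grounded_video_qa",
]

@dataclass
class VideoAnnotation:
    """A hypothetical per-video annotation record."""
    video_id: str
    task_type: str
    # The label payload varies by task: object/point tracks for the
    # tracking tasks, (start, end) intervals for the localization tasks,
    # and question-answer pairs for the QA tasks.
    labels: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type}")

# Example: a temporal action localization annotation with one interval.
ann = VideoAnnotation(
    video_id="video_0001",
    task_type="temporal_action_localization",
    labels={"actions": [{"label": "opening a door",
                         "start_s": 1.2, "end_s": 3.4}]},
)
```

A dataclass with a free-form `labels` dict keeps the example short; a real loader would likely use per-task schemas instead.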
The Benchmark Evaluation
The evaluation setup takes video and audio sequences as input, together with a task specification. Performance is then reported across several dimensions, giving a profile of the model's perceptual abilities rather than a single score.
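The setup described above can be sketched as a loop that feeds each (video, audio, task specification) example to a model, scores the prediction with a per-task metric, and averages scores by task. Everything here is a hedged illustration: the function names, example layout, and metric interface are assumptions, not the benchmark's actual API.

```python
from typing import Callable, Dict, List

def evaluate(model: Callable[[dict], dict],
             examples: List[dict],
             metrics: Dict[str, Callable[[dict, dict], float]]) -> Dict[str, float]:
    """Average each task's metric over its examples (illustrative only)."""
    totals: Dict[str, List[float]] = {}
    for ex in examples:
        # Each example bundles the raw inputs with a task specification.
        prediction = model({"video": ex["video"],
                            "audio": ex["audio"],
                            "task_spec": ex["task_spec"]})
        task = ex["task_spec"]["type"]
        score = metrics[task](prediction, ex["ground_truth"])
        totals.setdefault(task, []).append(score)
    # One averaged score per task dimension, rather than a single number.
    return {task: sum(scores) / len(scores) for task, scores in totals.items()}
```

For instance, a multiple-choice QA metric could simply compare the predicted answer index against the ground truth, while tracking tasks would plug in an overlap-based metric under the same interface.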
The Perception Test benchmark is now publicly available, and further details can be found in the research paper. A workshop on general perception models will also be hosted at the European Conference on Computer Vision in Tel Aviv. The team hopes to collaborate with the AI research community to introduce additional annotations, tasks, metrics, and even new languages to the benchmark.
For those interested in contributing, you can email firstname.lastname@example.org for more information.