Title: Introducing a New Benchmark for Evaluating Multimodal Systems
Artificial intelligence (AI) has made significant strides in recent years, thanks in part to benchmarks that help researchers set goals and measure progress. Benchmarks have played a vital role in shaping the field, enabling breakthroughs in areas such as computer vision and protein folding. As work turns toward artificial general intelligence (AGI), it is crucial to build robust benchmarks that evaluate the perceptual capabilities of AI models. To address this need, we are introducing the Perception Test, a multimodal benchmark that uses real-world videos to assess a model’s perception abilities.
Developing the Perception Benchmark:
Existing perception benchmarks in AI research have limitations: they often focus on isolated aspects of perception, or exclude audio and temporal dimensions altogether. To overcome these limitations, we created a dataset of purposefully designed videos covering six different perception tasks. The videos depict a range of real-world activities and are labeled with spatial and temporal annotations by crowd-sourced participants. This dataset allows researchers to evaluate a model’s knowledge of semantics, understanding of physics, temporal reasoning, and abstraction abilities.
Evaluating Multimodal Systems with the Perception Test:
To evaluate multimodal systems using the Perception Test, models are pre-trained on external datasets and then given a small fine-tuning set to familiarize them with the tasks. Evaluation is conducted on a public validation split and on a held-out test split, which can only be scored through our dedicated server. The results measure a model’s performance across multiple dimensions and tasks, providing a detailed assessment of its skills and areas for improvement.
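To make the shape of such a multi-dimensional evaluation concrete, here is a minimal, hypothetical sketch in Python of aggregating per-task scores into an overall report. The task names, metric values, and the `aggregate_results` helper are illustrative assumptions, not part of the actual Perception Test tooling or its real metrics.

```python
def aggregate_results(per_task_scores: dict) -> dict:
    """Combine per-task metric values into a single report.

    Returns the per-task scores unchanged, plus an unweighted
    mean under the key "overall". (Illustrative only: the real
    benchmark defines its own tasks and aggregation.)
    """
    overall = sum(per_task_scores.values()) / len(per_task_scores)
    return {**per_task_scores, "overall": overall}

# Illustrative values, not real benchmark results.
report = aggregate_results({
    "object_tracking": 0.62,
    "action_localisation": 0.48,
    "video_qa": 0.55,
})
print(report["overall"])
```

A real submission pipeline would additionally break scores down by skill area (semantics, physics, temporal reasoning, abstraction), which is what lets the benchmark highlight specific areas for improvement rather than a single leaderboard number.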
Promoting Diversity and Further Research:
Diversity was a crucial consideration in developing the Perception Test benchmark. We ensured diverse representation among the crowd-sourced participants who filmed the videos, selecting participants across different countries, ethnicities, and genders. The benchmark is publicly available, and we encourage collaboration with the multimodal research community to introduce new annotations, tasks, metrics, and languages that broaden its scope.
Learn More and Join the Workshop:
To learn more about the Perception Test, including accessing the benchmark and reading the research paper, visit our website. We will soon launch a leaderboard and challenge server for the benchmark. Additionally, we are hosting a workshop on general perception models at the European Conference on Computer Vision (ECCV 2022), where experts in the field can discuss our approach and collaborate on designing and evaluating general perception models. If you’re interested in contributing, please reach out to us at firstname.lastname@example.org.
The Perception Test benchmark offers a valuable tool for evaluating multimodal AI systems’ perceptual capabilities. By providing a comprehensive assessment of these systems, we can drive further research and improvement in the field of artificial intelligence. We look forward to the contributions and collaborations from the research community to enhance the benchmark and advance the development of general perception models.