Spawrious: A New Benchmark Suite for Strengthening AI Models against Spurious Correlations

AI Benchmarks: Introducing the Spawrious Dataset

The field of Artificial Intelligence (AI) is evolving rapidly, with new models released frequently, each with its own features and problem-solving abilities. One significant challenge in AI research is making models robust to unseen test distributions and reducing their reliance on irrelevant, or spurious, features. For instance, self-driving cars and autonomous kitchen robots have not yet been widely deployed because they struggle in scenarios that differ significantly from their training data.

Researchers have studied spurious correlations (SCs) extensively and proposed methods to mitigate their impact on model performance. Classifiers trained on popular datasets such as ImageNet, for example, have been found to rely on background features that are unrelated to the class labels. Although progress has been made on the SC problem, existing benchmarks such as Waterbirds and CelebA hair color have limitations: they mostly focus on simple one-to-one (O2O) spurious correlations, in which each class is tied to a single spurious attribute, whereas many-to-many (M2M) spurious correlations, in which groups of classes are tied to groups of spurious attributes, are more common in real-world scenarios.

To address these limitations, a team of researchers from University College London developed Spawrious, an image classification benchmark suite that includes both O2O and M2M spurious correlations. The benchmark is divided into three difficulty levels: Easy, Medium, and Hard. It comprises approximately 152,000 high-quality, photo-realistic images generated with a text-to-image model, and an image captioning model was used to filter out unsuitable images to keep the data relevant and of high quality.
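The article does not say exactly how the captioning model's output was used to decide which images to discard. One plausible approach, shown here purely as an assumption and not as the Spawrious authors' criterion, is to caption each generated image and keep it only if the caption mentions both the class and the background that the text-to-image prompt asked for. The `captioner` callable, the keyword-matching rule, and the toy captions are all hypothetical; in a real pipeline the captioner would be an actual image-captioning model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for an image-captioning model: image ID -> caption.
Captioner = Callable[[str], Optional[str]]


def keep_image(caption: Optional[str], expected_class: str, expected_background: str) -> bool:
    """Assumed rule: keep an image only if its caption names both the class and the background."""
    if not caption:
        return False
    text = caption.lower()
    return expected_class.lower() in text and expected_background.lower() in text


def filter_images(prompts: Dict[str, Tuple[str, str]], captioner: Captioner) -> List[str]:
    """Return IDs of generated images whose captions match their (class, background) prompt."""
    return [
        image_id
        for image_id, (cls, bg) in prompts.items()
        if keep_image(captioner(image_id), cls, bg)
    ]


# Toy captions standing in for real captioning-model output.
fake_captions = {
    "img_0": "a corgi running on a sandy beach",
    "img_1": "an empty beach with waves",  # no dog mentioned, so this image is dropped
    "img_2": "a dachshund standing in the snow",
}
# The (class, background) each image was generated for (hypothetical prompts).
prompts = {
    "img_0": ("corgi", "beach"),
    "img_1": ("corgi", "beach"),
    "img_2": ("dachshund", "snow"),
}
print(filter_images(prompts, fake_captions.get))  # ['img_0', 'img_2']
```

Substring matching is deliberately crude (it would accept "snowy" for "snow", for instance); a stricter filter might compare caption and prompt embeddings instead.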

Evaluations on Spawrious show that it is a demanding benchmark. Existing state-of-the-art (SOTA) group robustness methods struggled, most notably on the Hard splits, where none of them exceeded 70% accuracy when using a ResNet50 model pretrained on ImageNet. Analyzing the models' misclassifications, the researchers found that reliance on spurious backgrounds was a contributing factor to their poor performance. Spawrious thus successfully exposes the weaknesses of classifiers when they face spurious correlations.
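One simple way to quantify background reliance in such an error analysis, offered as an illustrative sketch rather than the researchers' actual procedure, is to check how many misclassified test images were assigned the class that co-occurred with their background during training. The function name, the `train_class_for_background` mapping, and the toy labels and predictions below are all hypothetical.

```python
from typing import Dict, List, Tuple


def background_error_analysis(
    labels: List[str],
    preds: List[str],
    backgrounds: List[str],
    train_class_for_background: Dict[str, str],
) -> Tuple[float, float]:
    """Return (accuracy, share of errors that predict the class a background co-occurred with in training)."""
    correct = sum(y == p for y, p in zip(labels, preds))
    # Keep (prediction, background) for every misclassified example.
    errors = [(p, b) for y, p, b in zip(labels, preds, backgrounds) if y != p]
    # An error counts as background-driven if the model predicted the training-time partner class.
    shortcut_errors = sum(train_class_for_background.get(b) == p for p, b in errors)
    accuracy = correct / len(labels)
    shortcut_share = shortcut_errors / len(errors) if errors else 0.0
    return accuracy, shortcut_share


# Toy test set in which the training-time correlation has been reversed (hypothetical data).
train_class_for_background = {"desert": "corgi", "beach": "dachshund"}
labels = ["corgi", "corgi", "dachshund", "dachshund"]
backgrounds = ["beach", "beach", "desert", "desert"]
preds = ["dachshund", "corgi", "corgi", "labrador"]

acc, share = background_error_analysis(labels, preds, backgrounds, train_class_for_background)
print(f"accuracy={acc:.2f}, background-driven errors={share:.2f}")
# accuracy=0.25, background-driven errors=0.67
```

A high background-driven share suggests the classifier is predicting from the background rather than the animal, which is the failure mode Spawrious is designed to surface.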

To clarify the difference between the O2O and M2M benchmarks, the researchers give an example in which training data is collected during the summer, when two groups of animal species from different locations are each associated with a group of backgrounds. As the seasons change and the animals migrate, the spurious correlations between the animal groups and the background groups reverse. Because each group of species is tied to a group of backgrounds rather than to a single background, the relationship cannot be reduced to a one-to-one mapping between classes and backgrounds. This highlights the importance of capturing the complex relationships and interdependencies in M2M spurious correlations.
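The summer/winter example can be made concrete by building the (class, background) combinations seen at training and test time. The class names, background names, and two-group structure below are hypothetical placeholders, and the sketch assumes perfectly correlated training data for simplicity; it only illustrates how an M2M correlation differs from an O2O one and how reversing the group assignment leaves no training combination in the test set.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical class and background groups, for illustration only.
CLASS_GROUPS = [["corgi", "dachshund"], ["bulldog", "labrador"]]
BACKGROUND_GROUPS = [["desert", "jungle"], ["snow", "beach"]]

Pair = Tuple[str, str]


def o2o_pairs() -> List[Pair]:
    """O2O: each class is paired with exactly one background."""
    classes = [c for group in CLASS_GROUPS for c in group]
    backgrounds = [b for group in BACKGROUND_GROUPS for b in group]
    return list(zip(classes, backgrounds))


def m2m_pairs(swap: bool) -> List[Pair]:
    """M2M: every class in class group i co-occurs with every background in one background group."""
    pairs: List[Pair] = []
    for i, classes in enumerate(CLASS_GROUPS):
        # "Summer" pairs class group i with background group i; "winter" reverses the groups.
        backgrounds = BACKGROUND_GROUPS[1 - i] if swap else BACKGROUND_GROUPS[i]
        pairs.extend(product(classes, backgrounds))
    return pairs


def backgrounds_per_class(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Collect the set of backgrounds each class appears with."""
    result: Dict[str, Set[str]] = {}
    for cls, bg in pairs:
        result.setdefault(cls, set()).add(bg)
    return result


summer = m2m_pairs(swap=False)  # training distribution
winter = m2m_pairs(swap=True)   # test distribution with the correlation reversed

print(sorted(backgrounds_per_class(o2o_pairs())["corgi"]))  # ['desert']
print(sorted(backgrounds_per_class(summer)["corgi"]))       # ['desert', 'jungle']
print(sorted(backgrounds_per_class(winter)["corgi"]))       # ['beach', 'snow']
print(set(summer) & set(winter))                            # set()
```

In the O2O case each class maps to a single background, while in the M2M case each class spans a whole background group, so no single class-to-background rule describes the shortcut a model might learn.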

Overall, Spawrious is a promising benchmark suite for out-of-distribution (OOD) generalization and domain generalization algorithms, and for evaluating and improving model robustness in the presence of spurious features.

Check out the paper and GitHub repository for more information.


(Note: This article was written by Tanya Malhotra, a final-year undergrad specializing in Artificial Intelligence and Machine Learning at the University of Petroleum & Energy Studies, Dehradun.)
