Research towards AI models that can generalize, scale, and accelerate science
Next week, the 11th International Conference on Learning Representations (ICLR) kicks off in Kigali, Rwanda. It is a significant event: the first major AI conference hosted in Africa, and the first held in person since the start of the pandemic.
Researchers from around the world will gather to share cutting-edge work in deep learning, spanning fields such as statistics and data science and applications including machine vision, gaming, and robotics. DeepMind is proud to support the conference as a Diamond sponsor and DEI champion.
DeepMind teams will present 23 papers at the conference, and here are a few noteworthy highlights:
Open questions on the path to AGI
Recent progress in AI has demonstrated impressive performance in text and image tasks. However, there is still a need for further research to enable systems to generalize across different domains and scales. This is a crucial step towards achieving artificial general intelligence (AGI) and its transformative potential in our daily lives.
We have developed a new approach in which models learn by solving two problems in one: training a model to examine a problem from two perspectives teaches it to reason about and solve tasks that require tackling similar problems, which improves generalization. We have also probed the generalization capability of neural networks by comparing them to the Chomsky hierarchy of formal languages. Rigorous testing of 2,200 models across 16 tasks revealed that certain architectures struggle to generalize beyond their training distribution, but that augmenting them with external memory improves their performance.
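To see, loosely, why external memory matters for the Chomsky hierarchy, consider recognizing balanced brackets, a context-free language: a finite-state controller alone cannot track arbitrary nesting depth, but the same controller augmented with a stack can. The sketch below is hand-written for illustration only; the study itself trains neural controllers with learned, differentiable memory.

```python
def recognize_balanced(s):
    """Recognize the context-free language of balanced brackets using
    a finite-state controller plus an external stack -- the kind of
    memory that moves a machine up the Chomsky hierarchy.
    (Illustrative sketch, not the paper's learned architecture.)"""
    stack = []  # the external memory
    for ch in s:
        if ch == "(":
            stack.append(ch)        # remember an open bracket
        elif ch == ")":
            if not stack:           # close with nothing open: reject
                return False
            stack.pop()
        else:
            return False            # only brackets are in the language
    return not stack                # accept iff every bracket is closed
```

Without the stack, any fixed number of controller states can be exhausted by deep enough nesting, which is exactly the kind of out-of-distribution failure the 2,200-model study measures.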
Another challenge we address is how to make progress on longer-term tasks that offer few rewards. We have developed a new approach and training dataset that helps models learn to explore in ways that resemble human behavior over extended time periods.
As AI capabilities continue to advance, it is essential to ensure that current methods work effectively in real-world scenarios. For instance, while language models can generate impressive answers, many cannot explain their responses. We have introduced a method that utilizes language models to solve multi-step reasoning problems by leveraging their underlying logical structure. This approach provides explanations that humans can understand and verify.
Adversarial attacks, inputs crafted to push AI models toward incorrect or harmful outputs, are a way to probe the limits of these models. Training on adversarial examples makes models more resilient to attack, but can degrade their performance on "regular" inputs. We have shown that adding adapters yields models in which the tradeoff between robustness and performance can be controlled.
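One way to picture such a controllable tradeoff is to keep two small adapters, one trained on clean data and one trained adversarially, and blend their outputs with a single knob at inference time. The residual-adapter shape and the linear interpolation below are illustrative assumptions on my part, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, W):
    # A tiny residual adapter: down-project, nonlinearity, up-project,
    # added back onto the frozen backbone's features x.
    return x + np.tanh(x @ W[0]) @ W[1]

d, r = 8, 4  # feature width and adapter bottleneck (assumed sizes)
W_clean = [rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1]
W_robust = [rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1]

def forward(x, alpha):
    """Blend a clean-trained and an adversarially trained adapter.
    alpha=0 recovers clean behaviour, alpha=1 robust behaviour;
    intermediate values trade accuracy against robustness."""
    return (1 - alpha) * adapter(x, W_clean) + alpha * adapter(x, W_robust)

x = rng.normal(size=(d,))  # stand-in for backbone features
```

The appeal of this design is that a single deployed model exposes the robustness/accuracy dial without retraining.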
Reinforcement learning (RL) has proven successful at tackling real-world challenges. However, RL algorithms are typically designed to excel at one specific task and struggle to generalize to new ones. We propose algorithm distillation, a method that enables a single model to generalize efficiently to new tasks by imitating the learning histories of RL algorithms across diverse tasks. RL models also learn through trial and error, which can be time-consuming and data-intensive. We have developed a new approach that requires 200 times less experience to train a model to human-level performance across multiple Atari games, significantly reducing computational and energy costs.
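The key ingredient in algorithm distillation is the training data: not final policies, but whole learning histories, i.e. the across-episode stream of actions and rewards produced as a source RL algorithm improves. The toy generator below records such a history from an epsilon-greedy bandit learner; the bandit setup and decaying exploration schedule are assumptions for illustration, not the paper's environments.

```python
import random

def generate_learning_history(n_arms=3, episodes=50, seed=0):
    """Roll out a simple epsilon-greedy bandit learner and record the
    across-episode (action, reward) stream. Algorithm distillation
    trains a sequence model on many such histories, so the model
    imitates the source algorithm's *improvement over time*, not just
    its final policy. (Toy sketch under assumed settings.)"""
    rng = random.Random(seed)
    probs = [rng.random() for _ in range(n_arms)]  # hidden arm payoffs
    counts = [0] * n_arms
    values = [0.0] * n_arms
    history = []
    for t in range(episodes):
        eps = max(0.05, 1.0 - t / episodes)  # exploration decays as learning proceeds
        if rng.random() < eps:
            a = rng.randrange(n_arms)        # explore
        else:
            a = max(range(n_arms), key=lambda i: values[i])  # exploit
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        history.append((a, r))
    return history  # one training sequence for the distilled model
```

Because early entries in each sequence reflect exploration and later ones exploitation, a sequence model fit to many such histories can reproduce that in-context improvement on a new task.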
AI for science
AI is a powerful tool for researchers to analyze complex data and gain a deeper understanding of the world. Several papers presented at the conference highlight how AI is accelerating scientific progress while science itself is advancing AI.
Predicting properties of molecules based on their 3D structure is crucial for drug discovery. We have developed a denoising method that achieves a new state-of-the-art in molecular property prediction. This method enables large-scale pre-training and generalization across different biological datasets. Additionally, we have introduced a new transformer that can perform more accurate quantum chemistry calculations using atomic positions alone.
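The denoising objective mentioned above can be caricatured in a few lines: perturb a molecule's 3D coordinates with Gaussian noise and train a network to predict the noise that was added, a self-supervised task that needs no property labels. The noise scale and the shape of the toy "molecule" below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def denoising_pair(coords, sigma=0.1):
    """Build one self-supervised training example for coordinate
    denoising: the input is a noised 3D structure, the regression
    target is the added noise. (Sketch of the pre-training objective;
    sigma is an assumed noise scale.)"""
    noise = rng.normal(scale=sigma, size=coords.shape)
    return coords + noise, noise

# A toy 'molecule': 5 atoms with x, y, z coordinates.
coords = rng.normal(size=(5, 3))
noisy, target = denoising_pair(coords)
```

Because such pairs can be manufactured from any 3D structure, the objective supports large-scale pre-training before fine-tuning on labeled property-prediction datasets.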
Lastly, we have created FIGnet, a physics-inspired model that simulates collisions between complex shapes such as teapots or doughnuts. This simulation technology has applications in robotics, graphics, and mechanical design.
For a full list of DeepMind papers and the schedule of events at ICLR 2023, you can visit our website.