Artificial intelligence (AI) agents have made significant advances in complex game environments. For example, AlphaZero defeated world-champion programs at chess, shogi, and Go by learning entirely through self-play, a process of trial and error. However, each of these agents was trained for a specific game or task and could not transfer its knowledge to a new game without starting from scratch.
To overcome this limitation, DeepMind has developed a new approach called Open-Ended Learning. In a preprint titled “Open-Ended Learning Leads to Generally Capable Agents,” DeepMind details its efforts to train an AI agent capable of playing multiple games without the need for human interaction data.
To create this versatile agent, DeepMind designed a game environment called XLand, which comprises a vast number of multiplayer games set in procedurally generated 3D worlds. XLand makes it possible to develop new learning algorithms that dynamically control both the agent’s training and the games it trains on. As the agent masters the challenges it is given, new ones are generated, so it keeps refining its skills and never stops learning.
The result is an AI agent that can succeed at a wide range of tasks, from simple object-finding problems to complex games like hide and seek and capture the flag. This agent exhibits general, heuristic behaviors that can be applied to many different tasks, rather than being specialized to a single task. This breakthrough brings us closer to creating AI agents that can quickly adapt to ever-changing environments.
XLand sidesteps the usual shortage of training data because the game space is specified programmatically, so new tasks can be generated automatically and at essentially unlimited scale. The multiplayer aspect of XLand adds further richness: the behaviour of other players makes the same game unfold differently each time, producing more varied training experience.
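To make the idea concrete, here is a minimal sketch of what programmatic task generation can look like: a task is just data (a sampled world plus a goal for each player), so endless variations can be produced automatically. The predicates, objects, and function names below are hypothetical illustrations, not DeepMind’s actual XLand API.

```python
import random

# Hypothetical building blocks for illustration only (not the real XLand API).
PREDICATES = ["near", "on", "hold", "see"]
OBJECTS = ["purple sphere", "yellow cube", "black pyramid"]

def sample_world(rng):
    """Sample a simple world description: terrain size and object positions."""
    size = rng.choice([8, 12, 16])
    objects = {obj: (rng.uniform(0, size), rng.uniform(0, size)) for obj in OBJECTS}
    return {"size": size, "objects": objects}

def sample_goal(rng, num_options=2, num_conjuncts=2):
    """Sample a goal as a disjunction of conjunctions of atomic predicates,
    e.g. (hold(sphere) AND near(cube, pyramid)) OR (...)."""
    atom = lambda: (rng.choice(PREDICATES), rng.choice(OBJECTS), rng.choice(OBJECTS))
    return [[atom() for _ in range(num_conjuncts)] for _ in range(num_options)]

def sample_task(rng, num_players=2):
    """A task = one world plus one goal per player; goals may cooperate or conflict."""
    return {"world": sample_world(rng),
            "goals": [sample_goal(rng) for _ in range(num_players)]}

print(sample_task(random.Random(0)))
```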
Deep reinforcement learning (RL) is used to train the agents’ neural networks. The agent’s architecture includes an attention mechanism that steers its internal focus toward the subgoals of the game it is currently playing. DeepMind found this goal attentive agent (GOAT) to be more capable and adaptable than architectures without it.
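As a rough illustration of that idea, the sketch below shows one way a state representation can attend over subgoal embeddings so that the most relevant subgoal dominates the agent’s internal representation. It is a deliberately simplified, NumPy-only analogue, not the actual GOAT architecture described in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def goal_attention(state_emb, subgoal_embs):
    """state_emb: (d,), subgoal_embs: (n_subgoals, d).
    Returns a goal-conditioned state representation."""
    d = state_emb.shape[0]
    scores = subgoal_embs @ state_emb / np.sqrt(d)    # how relevant is each subgoal now
    weights = softmax(scores)                         # attention weights over subgoals
    goal_context = weights @ subgoal_embs             # weighted mix of subgoal embeddings
    return np.concatenate([state_emb, goal_context])  # input to the policy/value heads

rng = np.random.default_rng(0)
state = rng.normal(size=16)
subgoals = rng.normal(size=(3, 16))  # e.g. the atomic options of the agent's goal
print(goal_attention(state, subgoals).shape)  # (32,)
```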
To optimize the training process, DeepMind explores which distribution of training tasks will produce the best agent. Dynamic task generation lets this distribution be adjusted continually, keeping every task challenging but solvable for the current agent. Population-based training (PBT) then tunes the task-generation parameters based on the agents’ performance.
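A minimal sketch of that PBT step might look like the following: weaker members of the population inherit the weights and task-generation parameters of stronger ones, with a small perturbation applied to the copied parameters. The fitness function and the specific parameters are placeholders; the paper’s actual PBT setup uses its own capability metrics.

```python
import random

rng = random.Random(0)

def pbt_step(population, fitness, perturb=0.2):
    """population: list of dicts with 'weights' and 'task_params'.
    The bottom quarter copies (exploits) a top-quarter member, then explores
    by perturbing the copied task-generation parameters."""
    ranked = sorted(range(len(population)), key=lambda i: fitness(population[i]))
    quarter = max(1, len(population) // 4)
    bottom, top = ranked[:quarter], ranked[-quarter:]
    for i in bottom:
        j = rng.choice(top)
        population[i]["weights"] = population[j]["weights"]
        population[i]["task_params"] = {
            k: v * rng.choice([1 - perturb, 1 + perturb])  # explore nearby settings
            for k, v in population[j]["task_params"].items()
        }
    return population

# Toy usage: fitness is a stand-in for a measured capability score.
population = [{"weights": i, "task_params": {"difficulty": 1.0}} for i in range(8)]
print(pbt_step(population, fitness=lambda member: member["weights"]))
```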
During training, deep RL continually updates the agents’ neural networks in response to their experience. The process is highly iterative, gradually increasing task complexity and adapting to the agents’ learning progress. This open-ended learning process has no predefined endpoint; how far it can go is limited only by the expressivity of the environment and of the agent’s neural network.
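Put together, the outer loop looks roughly like the sketch below: measure the agent, generate tasks matched to that measurement, collect experience, update with RL, and repeat. The callables passed in (generate_tasks, collect_episodes, rl_update, estimate_ability) are placeholders for the components described above, not real APIs.

```python
def open_ended_training(agent, generate_tasks, collect_episodes, rl_update,
                        estimate_ability, num_generations=3, steps_per_gen=5):
    """Outer loop of open-ended learning: task difficulty keeps pace with the
    agent, so there is no fixed finish line."""
    for _ in range(num_generations):
        ability = estimate_ability(agent)              # where is the agent right now?
        for _ in range(steps_per_gen):
            tasks = generate_tasks(ability)            # neither trivial nor impossible
            episodes = collect_episodes(agent, tasks)  # play the sampled tasks
            agent = rl_update(agent, episodes)         # deep RL update of the network
    return agent

# Toy usage with stand-in components, just to show the control flow.
final = open_ended_training(
    agent=0.0,
    generate_tasks=lambda ability: [ability + 1.0],
    collect_episodes=lambda agent, tasks: tasks,
    rl_update=lambda agent, episodes: agent + 0.1 * sum(episodes),
    estimate_ability=lambda agent: agent,
)
print(final)
```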
To evaluate the agents, DeepMind builds a separate set of evaluation tasks from held-out games and worlds never seen during training. Agents’ scores are normalized per task, and the evaluation considers the entire distribution of normalized scores as well as the percentage of tasks in which the agent achieves any reward at all.
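A simplified version of that scoring scheme is sketched below: each task’s raw score is normalized (here against the best score observed on that task across reference agents, which is an assumption made for illustration), and two summary statistics are computed: the fraction of tasks with any reward, and a low percentile of the normalized scores, which rewards broad competence rather than excellence on a handful of tasks.

```python
import numpy as np

def normalized_scores(raw_scores, baseline_best):
    """raw_scores, baseline_best: (num_tasks,) arrays.
    Normalize each task's score by the best score observed on that task
    (an illustrative choice of baseline, not the paper's exact procedure)."""
    return raw_scores / np.maximum(baseline_best, 1e-8)

def summarize(norm_scores, percentile=10):
    participation = float(np.mean(norm_scores > 0))          # % of tasks with any reward
    low_pct = float(np.percentile(norm_scores, percentile))  # score on the weakest tasks
    return {"participation": participation, f"p{percentile}": low_pct}

raw = np.array([0.0, 3.0, 5.0, 1.0])
best = np.array([2.0, 3.0, 10.0, 4.0])
print(summarize(normalized_scores(raw, best)))
```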
DeepMind’s Open-Ended Learning approach is a significant step towards creating more versatile AI agents. These agents can adapt to new games and tasks without the need for retraining from scratch. With the ability to learn and generalize from a wide range of environments, AI agents are becoming more capable and flexible, bringing us closer to realizing the full potential of artificial intelligence.