Gato: A Multi-Modal, Multi-Task Generalist Agent for Text, Images, and Robotics

Jimmy W.

July 4, 2023

Introducing Gato: A Versatile AI Agent

Gato is an AI agent whose outputs are not limited to text. It is a multi-modal, multi-task, multi-embodiment generalist policy. With the same network and the same weights, Gato can play Atari games, caption images, chat, control a robot arm, and more. Based on its context, it decides whether to output text, joint torques, button presses, or other tokens.

Training Phase of Gato

During the training phase, data from different tasks and modalities is serialized into a flat sequence of tokens. These tokens are then batched and processed by a transformer neural network, similar to a large language model. The loss is masked so that Gato is trained to predict only action and text targets, not observation tokens.
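The serialization and masked loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Gato's actual implementation: the helper names `serialize_episode` and `masked_cross_entropy`, the toy token ids, and the 10-entry vocabulary are all hypothetical, and `logits[i]` is simply taken to be the model's prediction for `tokens[i]`.

```python
import math


def serialize_episode(steps):
    """Flatten (observation_tokens, action_tokens) pairs into one sequence.

    Returns the flat token list and a 0/1 mask of the same length, where
    1 marks an action token (a training target) and 0 marks an observation.
    """
    tokens, mask = [], []
    for obs_tokens, action_tokens in steps:
        tokens.extend(obs_tokens)
        mask.extend([0] * len(obs_tokens))
        tokens.extend(action_tokens)
        mask.extend([1] * len(action_tokens))
    return tokens, mask


def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over the positions where mask is 1."""
    total, count = 0.0, 0
    for scores, target, keep in zip(logits, targets, mask):
        if not keep:
            continue  # observation tokens contribute nothing to the loss
        # Numerically stable log-sum-exp for this position.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
        count += 1
    return total / count if count else 0.0


# Toy episode (hypothetical ids): two timesteps, each with two
# observation tokens followed by one action token.
steps = [([7, 8], [1]), ([9, 3], [2])]
tokens, mask = serialize_episode(steps)

# Stand-in for model outputs: uniform scores over a 10-token vocabulary,
# except at one observation position, which the mask causes to be ignored.
logits = [[0.0] * 10 for _ in tokens]
logits[0] = [50.0] + [0.0] * 9
loss = masked_cross_entropy(logits, tokens, mask)
```

Because only the masked-in positions are averaged, the badly wrong prediction at the first observation token has no effect on `loss`. In a real training setup the same effect is usually obtained by multiplying per-token losses by the mask before reduction.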

Deploying Gato

When Gato is deployed, a prompt, such as a demonstration, is tokenized to form the initial sequence. The environment then provides the first observation, which is also tokenized and appended to the sequence. Gato samples the action vector autoregressively, one token at a time, conditioned on its context. Once every token of the action vector has been sampled, the action is decoded and sent to the environment, which steps and returns a new observation, and the procedure repeats. The model always sees the previous observations and actions that fit within its context window of 1024 tokens.
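The control loop described above can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: `run_episode`, `ToyEnv`, and `toy_policy` are hypothetical names, the environment interface (`reset` and `step` returning token lists) is invented for the example, and the stand-in policy is deterministic where the real model would sample from a transformer's output distribution.

```python
CONTEXT_WINDOW = 1024  # context length stated in the text


def run_episode(sample_next_token, env, prompt_tokens, action_len, max_steps):
    """Roll out one episode, building the action one token at a time."""
    sequence = list(prompt_tokens)       # tokenized prompt or demonstration
    observation = env.reset()            # first (already tokenized) observation
    actions = []
    for _ in range(max_steps):
        sequence.extend(observation)
        action = []
        for _ in range(action_len):
            # The model only sees the most recent CONTEXT_WINDOW tokens.
            context = sequence[-CONTEXT_WINDOW:]
            token = sample_next_token(context)
            action.append(token)
            sequence.append(token)       # each sampled token extends the context
        actions.append(action)
        # The completed action goes to the environment, which returns
        # the next observation.
        observation, done = env.step(action)
        if done:
            break
    return actions, sequence


class ToyEnv:
    """Hypothetical environment that ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [10]

    def step(self, action):
        self.t += 1
        return [10 + self.t], self.t >= self.horizon


def toy_policy(context):
    """Deterministic stand-in for the model's token sampler."""
    return len(context) % 4


actions, sequence = run_episode(
    toy_policy, ToyEnv(horizon=2), prompt_tokens=[100, 101],
    action_len=2, max_steps=3,
)
```

The slice `sequence[-CONTEXT_WINDOW:]` is one simple way to enforce the window. It drops the oldest tokens first, so a long prompt would eventually fall out of context in this sketch.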

Gato’s Training Datasets

Gato is trained on a diverse range of datasets, including agent experience in simulated and real-world environments, as well as natural language and image datasets. The bar plot in the original post (not reproduced here) reports, for each domain, the number of tasks on which the pretrained Gato model scores above a given percentage of the expert score.

The pretrained Gato model with the same weights can perform various tasks, such as image captioning, interactive dialogue, and robot arm control. These tasks demonstrate the versatility and capabilities of Gato as an AI agent.
