Two key questions need to be addressed in artificial intelligence research: What are the desired capabilities of AI systems? And how do we determine whether we are making progress towards those goals? According to Alan Turing, the answers to these questions may be intertwined: if an AI system's behavior during interactions with people resembles intelligent human behavior, then it can be deemed intelligent. To behave this way, AI systems must be capable of interacting with humans and assisting them in various tasks.
Creating AI agents that can effectively interact with humans and the world is a challenging task. We need ways both to teach artificial agents these abilities and to evaluate their performance when tasks are specified in ambiguous, abstract language. To tackle this, we have developed a simulated environment called the Playroom. In the Playroom, virtual robots can engage in a range of interactions, such as moving around, manipulating objects, and communicating with each other. This environment enables us to study joint intentionality, cooperation, and other social aspects of interaction.
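As a rough illustration only, the sketch below shows what an agent's interaction loop with such an environment might look like in Python; the PlayroomEnv class, the observation keys, and the action fields are hypothetical placeholders, not the actual Playroom interface.

```python
# Hypothetical sketch of how an agent might interact with a Playroom-style
# environment; the PlayroomEnv class, observation keys, and action fields
# are illustrative assumptions, not the actual API.
from dataclasses import dataclass, field
import numpy as np


@dataclass
class Action:
    """A joint physical-and-linguistic action taken on each timestep."""
    move: np.ndarray = field(default_factory=lambda: np.zeros(2))  # x/y velocity
    grab: bool = False                                             # grasp toggle
    utterance: str = ""                                            # emitted language


class PlayroomEnv:
    """Toy stand-in for the simulated Playroom (hypothetical interface)."""

    def reset(self) -> dict:
        # Observations pair first-person vision with any language heard so far.
        return {"image": np.zeros((96, 96, 3), dtype=np.uint8), "text": ""}

    def step(self, action: Action) -> tuple[dict, bool]:
        # A real environment would simulate physics and the other participant;
        # here we simply return a blank observation and never terminate.
        obs = {"image": np.zeros((96, 96, 3), dtype=np.uint8), "text": ""}
        return obs, False


if __name__ == "__main__":
    env = PlayroomEnv()
    obs, done = env.reset(), False
    for _ in range(10):                       # short illustrative episode
        act = Action(move=np.array([0.1, 0.0]), utterance="")
        obs, done = env.step(act)
```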
We employ various learning techniques, including imitation learning, reinforcement learning, supervised learning, and unsupervised learning, to build agents capable of interacting with humans. By collecting data on human interactions and imitating the behavior they contain, we can train AI agents to use language and perform tasks. However, since no readily available data source of grounded language interactions exists, we created a system to elicit such interactions from human participants.
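The sketch below illustrates the simplest form of imitation learning on such data, behavioral cloning of logged human demonstrations; the network sizes, the discrete action set, and the demonstration format are assumptions for illustration, not the models used in our experiments.

```python
# Minimal behavioral-cloning sketch: fit a policy to logged human
# demonstrations (observation -> action pairs). Feature/action shapes and
# the demonstration format are illustrative assumptions.
import torch
import torch.nn as nn


class BCPolicy(nn.Module):
    def __init__(self, obs_dim: int = 128, n_actions: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),           # logits over a discrete action set
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def train_on_demonstrations(policy, demos, epochs: int = 5, lr: float = 1e-4):
    """demos: iterable of (obs, action) tensor pairs recorded from human players."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, action in demos:
            loss = loss_fn(policy(obs), action)  # match the human's chosen action
            opt.zero_grad()
            loss.backward()
            opt.step()


if __name__ == "__main__":
    # Synthetic stand-in for a batch of human demonstrations.
    demos = [(torch.randn(32, 128), torch.randint(0, 16, (32,))) for _ in range(4)]
    policy = BCPolicy()
    train_on_demonstrations(policy, demos, epochs=1)
```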
Our agents receive images and language as input and produce physical actions and language as output. We have combined imitation learning, reinforcement learning, and auxiliary learning methods to train our agents through interactive self-play. This allows our agents to follow commands and answer questions. They can also issue commands and pose questions to other agents, which in turn improves those agents' performance.
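To make this input/output interface concrete, here is a minimal, hypothetical sketch of an agent that maps an image and a language instruction to physical-action logits and next-word logits; the encoder sizes, tokenization, and recurrent core are assumptions for illustration and do not describe our actual architecture.

```python
# Illustrative multimodal agent: encodes an image and a language instruction,
# then emits both physical-action logits and a distribution over the next word.
# Architecture details (encoder sizes, tokenization) are assumptions.
import torch
import torch.nn as nn


class InteractiveAgent(nn.Module):
    def __init__(self, vocab_size: int = 1000, n_actions: int = 16, d: int = 128):
        super().__init__()
        self.vision = nn.Sequential(                      # tiny CNN image encoder
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d),
        )
        self.text = nn.Embedding(vocab_size, d)           # bag-of-words text encoder
        self.core = nn.LSTMCell(2 * d, d)                 # recurrent memory
        self.action_head = nn.Linear(d, n_actions)        # physical-action logits
        self.language_head = nn.Linear(d, vocab_size)     # next-word logits

    def forward(self, image, tokens, state):
        v = self.vision(image)
        t = self.text(tokens).mean(dim=1)                 # average token embeddings
        h, c = self.core(torch.cat([v, t], dim=-1), state)
        return self.action_head(h), self.language_head(h), (h, c)


if __name__ == "__main__":
    agent = InteractiveAgent()
    img = torch.randn(1, 3, 96, 96)                       # first-person view
    toks = torch.randint(0, 1000, (1, 8))                 # tokenized instruction
    state = (torch.zeros(1, 128), torch.zeros(1, 128))
    action_logits, word_logits, state = agent(img, toks, state)
```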
Evaluating the performance of our agents is not straightforward: because language is abstract and ambiguous and behavior is grounded in a shared physical environment, there is often no simple programmatic criterion for success. We have therefore developed a range of evaluation methods, including large-scale trials with human interactions, to diagnose problems and assess agent performance.
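As one example of how such evaluations can be summarized, the short sketch below aggregates binary human success judgements into per-task success rates; the task names and judgement format are placeholders rather than our actual evaluation protocol.

```python
# Hedged sketch of one evaluation route: aggregate binary success judgements
# from human evaluators into per-task success rates. Task names and the
# judgement format are illustrative placeholders.
from collections import defaultdict


def success_rates(judgements):
    """judgements: iterable of (task_name, succeeded: bool) pairs."""
    counts = defaultdict(lambda: [0, 0])          # task -> [successes, trials]
    for task, ok in judgements:
        counts[task][0] += int(ok)
        counts[task][1] += 1
    return {task: s / n for task, (s, n) in counts.items()}


if __name__ == "__main__":
    logs = [("lift the red object", True), ("lift the red object", False),
            ("what colour is the bed?", True)]
    print(success_rates(logs))
```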
Human evaluators have tested both agents and other humans in completing instructions and answering questions in the Playroom. Initially, randomly initialized agents had a success rate of approximately 0%. With supervised learning and semi-supervised auxiliary tasks, the success rate improved to around 10-20%. However, agents trained with a combination of supervised, semi-supervised, and reinforcement learning using interactive self-play achieved the best performance.
Our setting offers the advantage of virtually unlimited tasks that can be set via language, allowing us to assess the competencies of our agents quickly. While there are still tasks they struggle with, our approach provides a clear path for enhancing AI competencies in complex environments and interactions with people. These methods are widely applicable wherever there is a need for agents that can interact with humans in complex environments.