Simple and Effective Interaction with Quadrupedal Robots: SayTap
In the world of AI, one of the key challenges in human-robot interaction is enabling robots to understand and respond to natural language commands. While large language models (LLMs) have shown promise in high-level planning, translating language into low-level commands such as joint angles or motor torques remains difficult, especially for legged robots that require high-frequency control signals.
To address this challenge, we introduce SayTap: Language to Quadrupedal Locomotion, an approach that uses foot contact patterns as an interface between natural language commands and a locomotion controller. With foot contact patterns as the bridge, users can command the robot to perform a variety of locomotion behaviors, such as walking, running, and jumping, using simple language.
The SayTap method employs a contact pattern template: a matrix whose rows correspond to the robot's four feet and whose entries indicate, at each time step, whether a foot should touch the ground. This template serves as the input to the locomotion controller, which generates the low-level commands that realize the desired foot contact patterns. The locomotion controller is a deep neural network trained with deep reinforcement learning.
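To make the template concrete, here is a minimal sketch of what such a matrix could look like for a trotting gait, in which diagonal feet touch the ground together. The foot ordering, cycle length, and helper names are our own illustrative assumptions, not SayTap's exact representation.

```python
import numpy as np

# Illustrative foot ordering; the actual convention may differ.
FEET = ["front_left", "rear_left", "front_right", "rear_right"]

def trot_pattern(cycle_len: int = 20) -> np.ndarray:
    """One cycle of a trot: 1 = foot on the ground, 0 = foot in the air."""
    half = cycle_len // 2
    pattern = np.zeros((4, cycle_len), dtype=int)
    pattern[0, :half] = 1  # front-left on the ground for the first half-cycle
    pattern[3, :half] = 1  # rear-right moves with it (diagonal pair)
    pattern[1, half:] = 1  # rear-left on the ground for the second half-cycle
    pattern[2, half:] = 1  # front-right moves with it
    return pattern

print(trot_pattern(10))
```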
A remarkable aspect of SayTap is its ability to accurately translate user commands into foot contact patterns, even when the commands are unstructured or vague. We achieve this with carefully designed prompts that guide the language model toward the desired output. These prompts consist of four components: a general instruction, gait definitions, an output format definition, and examples that provide context.
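As a rough illustration, the snippet below sketches how these four components might be assembled into a single prompt. All of the wording, gait descriptions, and the output format here are our own assumptions for illustration; they are not the exact prompt used by SayTap.

```python
# A sketch of assembling the four prompt components; all text is illustrative.

GENERAL_INSTRUCTION = (
    "You are the gait planner of a quadrupedal robot. Translate the user's "
    "command into a desired foot contact pattern."
)

GAIT_DEFINITIONS = (
    "Trotting: diagonal feet (front-left with rear-right, front-right with "
    "rear-left) alternate ground contact. Bounding: the two front feet and "
    "the two rear feet alternate ground contact. Jumping: all four feet "
    "leave the ground at the same time."
)

OUTPUT_FORMAT = (
    "Answer with four lines of 0s and 1s, one line per foot, where 1 means "
    "the foot is on the ground and 0 means it is in the air."
)

EXAMPLES = (
    "Command: trot forward.\n"
    "Pattern:\n"
    "1111100000\n"
    "0000011111\n"
    "0000011111\n"
    "1111100000"
)

def build_prompt(user_command: str) -> str:
    """Concatenate the prompt components and append the user's command."""
    return "\n\n".join([
        GENERAL_INSTRUCTION,
        GAIT_DEFINITIONS,
        OUTPUT_FORMAT,
        EXAMPLES,
        f"Command: {user_command}\nPattern:",
    ])
```

Note how the output format definition pins the LLM to a machine-parseable reply, which is what allows the downstream controller to consume the answer directly.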
With just three in-context examples and a little guidance, an LLM can accurately map human commands into contact patterns and even generalize to commands that do not explicitly specify how the robot should react. Our method allows robots to follow both simple, direct instructions and vague commands based on emotions or general context.
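Continuing the sketch, a reply produced under the assumed output format can be parsed back into the matrix the controller consumes. The parsing function and the example reply below are again illustrative, not part of SayTap itself.

```python
import numpy as np

def parse_pattern(reply: str) -> np.ndarray:
    """Parse four equal-length binary lines into a 4 x T contact matrix."""
    rows = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if len(rows) != 4 or len({len(r) for r in rows}) != 1:
        raise ValueError("expected four equal-length binary lines")
    return np.array([[int(c) for c in row] for row in rows], dtype=int)

# Example: an LLM reply describing one trot cycle.
reply = "1111100000\n0000011111\n0000011111\n1111100000"
print(parse_pattern(reply))  # 4 x 10 matrix of 0s and 1s
```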
In demonstration videos, the SayTap system successfully performs tasks given clear and direct commands. It also handles unstructured and vague instructions, with the robot reacting based on hints provided in the prompt.
SayTap is an exciting advancement in the field of human-robot interaction, enabling robots to understand and respond to natural language commands with ease. By using foot contact patterns as an interface, we open up new possibilities for the capabilities and applications of intelligent helper robots. Whether it’s walking, running, or jumping, robots can now learn to move in a way that enhances our lives and brings us closer to a future where technology knows no bounds.