Empowering users to teach robots new tasks is essential for integrating them into real-world applications. Whether it’s teaching a robot dog a new trick or instructing a manipulator robot to organize a lunch box, language models trained on internet data offer a promising solution. These models have been used to facilitate step-by-step planning, goal-oriented dialogue, and even robot-code-writing agents. However, they struggle to generate low-level robot commands due to the lack of relevant training data.
In our article “Language to Rewards for Robotic Skill Synthesis,” we introduce an approach that enables users to teach robots new actions using natural language input. We utilize reward functions as an interface to bridge the gap between language and low-level robot actions. Reward functions offer rich semantics, modularity, and interpretability, making them an ideal choice for this task. They also connect directly to low-level policies through optimization or reinforcement learning.
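To make the interface concrete, a reward function of this kind can be expressed as ordinary Python over named observation quantities. The sketch below is illustrative only — the observation keys (`torso_height`, `forward_velocity`, `torso_pitch`) and target values are hypothetical, not the paper's actual API:

```python
def reward(obs: dict) -> float:
    """Illustrative reward for 'trot forward while keeping the torso level'.

    Each term penalizes deviation from a semantically meaningful target,
    which is what makes reward functions readable and modular. All names
    and targets here are assumptions for the sake of the example.
    """
    height_term = -abs(obs["torso_height"] - 0.3)        # keep torso near 0.3 m
    velocity_term = -abs(obs["forward_velocity"] - 0.5)  # move forward at ~0.5 m/s
    upright_term = -abs(obs["torso_pitch"])              # keep torso level
    return height_term + velocity_term + upright_term
```

Because each term maps to one clause of the instruction, a user can inspect or adjust individual targets without touching the underlying controller.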
Our language-to-reward system consists of two main components: the Reward Translator and the Motion Controller. The Reward Translator converts user instructions into reward functions represented as Python code. This module is divided into the Motion Descriptor and the Reward Coder. The Motion Descriptor interprets user input and expands it into a more specific description of the desired robot motion. This helps to stabilize the reward coding task and makes it more interpretable for users. The Reward Coder then translates the generated motion description into a reward function using the same language model. We pre-define reward terms and guide the model to generate the correct reward function for the task.
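The two-stage structure above can be sketched as a pair of prompted calls to the same language model. The prompt wording, the field names, and the predefined reward-term signatures below are hypothetical stand-ins, not the prompts used in the paper:

```python
from typing import Callable

# `llm` is any callable mapping a prompt string to a completion string,
# e.g. a wrapper around an LLM API. Its signature here is an assumption.
LLM = Callable[[str], str]

def motion_descriptor(llm: LLM, instruction: str) -> str:
    """Stage 1: expand a user instruction into a structured motion description."""
    prompt = (
        "Describe the desired robot motion for the instruction below using "
        "these fields: torso height, torso pitch, foot contact pattern.\n"
        f"Instruction: {instruction}\nDescription:"
    )
    return llm(prompt)

def reward_coder(llm: LLM, description: str) -> str:
    """Stage 2: translate the motion description into reward-setting code,
    restricted to a predefined set of reward terms."""
    prompt = (
        "Translate the motion description into Python code that calls only "
        "the predefined reward terms set_torso_height(h), set_torso_pitch(p), "
        "and set_foot_contact(pattern).\n"
        f"Description: {description}\nCode:"
    )
    return llm(prompt)

def reward_translator(llm: LLM, instruction: str) -> str:
    """Full pipeline: instruction -> motion description -> reward code."""
    return reward_coder(llm, motion_descriptor(llm, instruction))
```

Restricting the Reward Coder to a fixed vocabulary of reward terms is what keeps the generated code within the space of rewards the controller can actually optimize.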
The Motion Controller takes the reward function generated by the Reward Translator and synthesizes a controller that maps robot observations to low-level actions. We formulate this problem as a Markov decision process and solve it using MuJoCo MPC, an open-source tool that has been used to create diverse behaviors and supports various planning algorithms.
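Conceptually, the controller solves a receding-horizon optimization: at each step it searches for an action sequence that maximizes the reward over a short horizon, executes only the first action, and replans. The toy sketch below uses random-shooting over a one-dimensional system to illustrate the loop; MuJoCo MPC itself employs far more capable planners, and the dynamics and reward here are invented for illustration:

```python
import random

def rollout_return(dynamics, reward, state, actions):
    """Simulate an action sequence and accumulate reward along the rollout."""
    total = 0.0
    for a in actions:
        state = dynamics(state, a)
        total += reward(state)
    return total

def mpc_step(dynamics, reward, state, horizon=10, samples=64):
    """One receding-horizon step via random shooting: sample candidate
    action sequences, score each by its rollout return, and execute only
    the first action of the best sequence."""
    best_seq, best_ret = None, float("-inf")
    for _ in range(samples):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = rollout_return(dynamics, reward, state, seq)
        if ret > best_ret:
            best_ret, best_seq = ret, seq
    return best_seq[0]
```

Because the reward is re-evaluated inside the rollout at every step, swapping in a new reward function from the Reward Translator immediately changes the synthesized behavior, with no retraining.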
We provide examples of applying our language-to-reward system to a simulated quadruped robot and a dexterous manipulator robot, where it successfully elicits new locomotion skills and manipulation behaviors. We also validate our method on a real-world manipulation robot.
In conclusion, our approach leverages the power of language models and reward functions to enable users to teach robots new actions. By bridging the gap between language and low-level actions, we create a more intuitive and flexible system for robotic skill synthesis. With further improvements, such as structured motion description templates, we can tap into the internal knowledge of language models and enhance the performance of our system.