Automating Reward Function Generation with Vision-Language Models
Researchers at Google DeepMind and Mila, McGill University have developed a new method for training reinforcement learning (RL) agents more efficiently by automating the generation of reward functions. Reward functions are central to reinforcement learning because they specify the behaviors an agent is trained to pursue.
Automating Manual Processes with VLM-CaR
Traditionally, designing reward functions for RL agents has been a manual, time-consuming task that requires domain expertise. The new framework, Code as Reward (VLM-CaR), leverages pre-trained Vision-Language Models (VLMs) to automatically generate dense reward functions for RL agents. Because the rewards are produced as executable code rather than by querying the VLM at every step, the approach keeps the computational burden low while providing accurate, interpretable rewards derived from visual inputs, making the training process more efficient.
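To make the idea concrete, here is a minimal sketch of what a VLM-generated reward program could look like. The sub-task (a goal marker becoming visible in a fixed region of the frame), the pixel coordinates, and the brightness-based scoring are illustrative assumptions, not code produced by the actual system.

```python
import numpy as np

# Hypothetical example of the kind of program VLM-CaR might emit for a
# simple "reach the goal marker" sub-task. The region and scoring rule
# below are illustrative assumptions, not output from the paper.
def subtask_reward(frame: np.ndarray) -> float:
    """Return a dense reward in [0, 1] from a single image observation."""
    goal_region = frame[40:60, 40:60]        # assumed goal location in pixels
    brightness = goal_region.mean() / 255.0  # proxy for "marker is visible"
    return float(min(max(brightness, 0.0), 1.0))

# Because the reward is ordinary code, it can be evaluated at every
# environment step without any further calls to the VLM.
```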
How VLM-CaR Works
VLM-CaR operates in three stages: program generation, program verification, and RL training. In the first stage, a VLM is prompted with initial and goal images of an environment to describe the task and its sub-tasks, and from these descriptions it produces executable computer programs. The generated programs are then verified for correctness against expert and random trajectories: a sound reward program should score expert behavior clearly higher than random behavior. Once verified, the programs act as dense reward functions for training RL agents, enabling efficient training in environments where rewards are sparse or unavailable.
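The verification and training stages can be sketched as follows. This is a hedged illustration under assumed interfaces: `verify_program`, `train_with_program`, and the `env`/`agent` methods are hypothetical stand-ins rather than the paper's implementation, and the acceptance rule (expert trajectories must score a fixed margin above random ones) is one plausible reading of the verification step.

```python
from typing import Callable, Sequence
import numpy as np

Trajectory = Sequence[np.ndarray]           # a sequence of image observations
RewardFn = Callable[[np.ndarray], float]    # a VLM-generated reward program

def verify_program(program: RewardFn,
                   expert_trajs: Sequence[Trajectory],
                   random_trajs: Sequence[Trajectory],
                   margin: float = 0.1) -> bool:
    """Accept a candidate program only if it scores expert behavior
    clearly above random behavior (the verification stage)."""
    expert_score = np.mean([program(obs) for traj in expert_trajs for obs in traj])
    random_score = np.mean([program(obs) for traj in random_trajs for obs in traj])
    return expert_score > random_score + margin

def train_with_program(env, agent, program: RewardFn, steps: int = 10_000):
    """Use a verified program as a dense reward during RL training
    (assumes a Gym-style env and an agent with act/update methods)."""
    obs = env.reset()
    for _ in range(steps):
        action = agent.act(obs)
        next_obs, _, done, _ = env.step(action)  # ignore the env's sparse reward
        reward = program(next_obs)               # dense reward from the program
        agent.update(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
```

In this sketch the environment's native reward is discarded and the verified program supplies a shaping signal at every step, which is what makes training feasible when the built-in reward is sparse or unavailable.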
In conclusion, the VLM-CaR framework replaces the manual process of defining reward functions for RL agents with a systematic, automated pipeline. By automating this step, the researchers hope to improve the training efficiency and performance of RL agents across a variety of environments.