MotionLM: Predicting the Behavior of Road Agents using Autoregressive Language Models
Autoregressive language models predict the next subword in a sequence without relying on predefined grammar rules. The same approach has been applied successfully to other domains, such as audio and image generation, where data is represented as discrete tokens. Because sequence models handle complex and dynamic contexts well, they have also attracted attention for predicting the behavior of road agents.
The Challenge of Predicting Road Agent Behavior
Road users can be thought of as participants in a continuous conversation, exchanging actions and replies. Just as language models capture complex distributions over conversations, sequence models can be used to forecast the behavior of road agents. However, current methods that decompose the distribution of agent behavior into independent per-agent marginal distributions have a key limitation: they do not account for interactions between agents, so the forecasts for individual agents can be inconsistent with one another.
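The limitation of independent marginals can be seen in a toy example. The sketch below assumes two hypothetical agents at a merge, each choosing between "go" and "yield"; the probabilities are invented for illustration and do not come from any dataset. Multiplying the per-agent marginals puts probability on a conflict that the joint distribution rules out.

```python
from itertools import product

# Hypothetical joint distribution over (agent_a, agent_b) actions at a merge.
# The numbers are illustrative only.
joint = {
    ("go", "yield"): 0.5,
    ("yield", "go"): 0.5,
    ("go", "go"): 0.0,
    ("yield", "yield"): 0.0,
}

def marginal(joint_dist, agent_index):
    """Sum the joint over the other agent to get one agent's marginal."""
    out = {}
    for actions, p in joint_dist.items():
        key = actions[agent_index]
        out[key] = out.get(key, 0.0) + p
    return out

marg_a = marginal(joint, 0)
marg_b = marginal(joint, 1)

# Treating the agents as independent multiplies the marginals together.
independent = {
    (a, b): marg_a[a] * marg_b[b] for a, b in product(marg_a, marg_b)
}

# The joint assigns zero mass to both agents going at once,
# but the product of marginals does not.
print(joint[("go", "go")], independent[("go", "go")])
```

Both marginals are perfectly reasonable on their own; the inconsistency only appears when they are combined without modeling the interaction.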
MotionLM: A Language Modeling Approach
To address the limitations of existing methods, researchers from Waymo have developed MotionLM, an approach for predicting the future behavior of road agents. MotionLM treats multi-agent motion prediction as a language modeling problem. The goal is to generate sequences in a “language” whose tokens are the actions of road agents.
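To make the idea of an action "language" concrete, here is a minimal sketch of how continuous motion could be turned into discrete tokens. The article does not describe the tokenizer, so everything here is an assumption for illustration: the uniform binning scheme, the one-dimensional positions, and the hypothetical parameters `max_delta` and `bins`.

```python
def tokenize_deltas(positions, max_delta=4.0, bins=8):
    """Map consecutive 1-D position differences to integer tokens.

    Assumed scheme: clip each displacement to [-max_delta, max_delta]
    and assign it to one of `bins` equal-width intervals.
    """
    width = 2 * max_delta / bins
    tokens = []
    for prev, cur in zip(positions, positions[1:]):
        delta = max(-max_delta, min(max_delta, cur - prev))
        index = int((delta + max_delta) / width)
        tokens.append(min(index, bins - 1))  # keep the top edge in range
    return tokens

def detokenize(tokens, start, max_delta=4.0, bins=8):
    """Rebuild approximate positions using each bin's center."""
    width = 2 * max_delta / bins
    positions = [start]
    for token in tokens:
        positions.append(positions[-1] - max_delta + (token + 0.5) * width)
    return positions

track = [0.0, 1.5, 3.0, 3.0]
tokens = tokenize_deltas(track)
print(tokens)
print(detokenize(tokens, start=track[0]))
```

The round trip is lossy: each reconstructed step can be off by up to half a bin width, which is the usual trade-off between vocabulary size and precision.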
Unlike other methods that rely on anchors or complicated optimization procedures, MotionLM uses a standard language modeling objective: it maximizes the average log probability that the model assigns to the observed sequence of motion tokens. This simplicity should make the model easier to implement and train.
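The objective described above can be sketched in a few lines. This is an illustration of an average log-probability objective in general, not MotionLM's training code; the function name and the toy probabilities are hypothetical.

```python
import math

def average_log_prob(step_probs, targets):
    """Mean log-probability the model assigns to the observed tokens.

    step_probs[t] is the predicted distribution over the vocabulary at
    step t, and targets[t] is the index of the token actually observed.
    """
    total = 0.0
    for probs, target in zip(step_probs, targets):
        total += math.log(probs[target])
    return total / len(targets)

# Toy predictions over a two-token vocabulary for two steps.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]
score = average_log_prob(step_probs, targets)
print(score)
```

Training would adjust the model so that this value increases, which is equivalent to minimizing the usual cross-entropy loss.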
Other methods often use a two-step procedure, first generating trajectories for each agent individually and then scoring how those trajectories interact. MotionLM instead constructs joint distributions over the future actions of multiple agents directly, using autoregressive decoding, so interaction modeling is part of generation rather than a separate step. Its sequential factorization also allows temporally causal conditional rollouts, in which each predicted action depends only on what came before, and this is intended to yield more realistic predictions of future agent behavior.
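The joint, temporally causal decoding described above can be sketched as a sampling loop. This is a toy illustration under stated assumptions, not the model itself: `VOCAB` and `toy_next_token_probs` are hypothetical stand-ins for a learned decoder, and agents within a timestep are assumed to be sampled independently given all agents' earlier tokens.

```python
import random

VOCAB = ["brake", "keep", "accelerate"]  # hypothetical motion tokens

def toy_next_token_probs(history, agent):
    """Stand-in for a learned decoder (assumption, not the real model).

    history is a list of tuples, one per past timestep, holding every
    agent's token. The agent is more likely to brake if any other agent
    accelerated at the previous step.
    """
    if history:
        others = [tok for i, tok in enumerate(history[-1]) if i != agent]
        if "accelerate" in others:
            return [0.7, 0.2, 0.1]
    return [0.1, 0.4, 0.5]

def rollout(num_agents, num_steps, rng):
    """Sample one joint future, one timestep at a time."""
    history = []
    for _ in range(num_steps):
        # Tokens at this step depend only on earlier steps (causal).
        step = tuple(
            rng.choices(VOCAB, weights=toy_next_token_probs(history, agent))[0]
            for agent in range(num_agents)
        )
        history.append(step)
    return history

sample = rollout(num_agents=2, num_steps=5, rng=random.Random(0))
for t, step in enumerate(sample):
    print(t, step)
```

Because every agent's next token is conditioned on the shared history, each sampled rollout is a single coherent scene rather than a set of separately generated trajectories.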
Evaluation and Conclusions
MotionLM has been evaluated on the Waymo Open Motion Dataset, where it predicted the actions of road agents well in challenging interactive scenarios. It ranked first on the dataset's interactive challenge leaderboard, outperforming other approaches to multi-agent motion prediction. Because safe planning in autonomous vehicles depends on reliable forecasts of how other road agents will behave, MotionLM is a meaningful advance for the field.