The Compositional Nature of Intelligence
Imagine having to relearn how to chop, peel, and stir every time you wanted to try a new recipe. That is effectively what many machine learning agents do: they start from scratch whenever they face a new challenge. Humans, on the other hand, combine skills they already have to tackle new tasks more efficiently, repurposing and recombining abilities they’ve already learned much as words can be reassembled into different sentences. This ability to build on previous knowledge is something we want to bring to our AI agents.
Reinforcement learning (RL) is a framework, inspired by how animals learn, in which an agent explores and interacts with its environment to obtain rewards. Combining RL with deep learning has produced impressive results, such as AI agents that master complex board games and video games. However, RL has a major limitation: it requires extensive training experience. For example, an RL agent may need weeks of uninterrupted play to learn a single Atari game, whereas a human can reach the same level of performance in about fifteen minutes.
One possible reason for this difference is that RL agents usually learn from scratch, whereas humans can build on previous knowledge. We want our agents to be able to use the knowledge they already have to learn new tasks faster. Building on this idea, we’ve developed a framework described in a recent article in the Proceedings of the National Academy of Sciences (PNAS).
To illustrate our approach, consider a daily commute to work. The agent has two options: a longer route that passes a café with great coffee, and a shorter route with only decent coffee. Traditionally, RL agents fall into two categories: model-based and model-free. A model-based agent builds a comprehensive representation of the environment, capturing factors like coffee quality and commute distance. A model-free agent, in contrast, keeps a far more compact representation: a single value associated with each possible route.
However, preferences can change from day to day. A model-free agent would have to relearn the best route for every new combination of preferences, which is time-consuming, and it is impossible to cover every combination in advance. A model-based agent can in principle adapt to any set of preferences without further learning, but only by building a model of the entire world and planning over it, which is computationally demanding.
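To make the contrast concrete, here is a minimal sketch in Python. The route names, numbers, and the helper function plan are invented for illustration and are not taken from the paper; they simply show that the model-free agent caches one value per route, while the model-based agent must simulate each route anew whenever the preferences change.

```python
# A minimal, hypothetical sketch of the two representations for the
# commute example (route names and numbers are invented for illustration).

# Model-free: a single cached value per route, learned under one fixed set
# of preferences. If the preferences change, these numbers go stale and
# must be relearned from experience.
model_free_values = {"scenic_route": 4.2, "direct_route": 3.1}

# Model-based: a step-by-step description of what happens along each route.
# The agent can evaluate any preferences without relearning, but only by
# simulating every route from start to finish each time it plans.
world_model = {
    "scenic_route": [
        {"coffee_quality": 0.9, "minutes": 10},
        {"coffee_quality": 0.0, "minutes": 20},
    ],
    "direct_route": [
        {"coffee_quality": 0.4, "minutes": 5},
        {"coffee_quality": 0.0, "minutes": 10},
    ],
}

def plan(prefers_coffee, dislikes_minutes):
    """Model-based planning: simulate each route under today's preferences."""
    def simulate(route):
        return sum(prefers_coffee * step["coffee_quality"]
                   - dislikes_minutes * step["minutes"]
                   for step in world_model[route])
    return max(world_model, key=simulate)

print(plan(prefers_coffee=10.0, dislikes_minutes=0.1))  # coffee day -> scenic_route
print(plan(prefers_coffee=1.0, dislikes_minutes=1.0))   # rushed morning -> direct_route
```

With only two short routes the simulation is trivially cheap; in a realistic environment, planning over a full world model at every decision becomes the bottleneck.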
We propose an intermediate solution called successor features. Rather than modeling the whole world or collapsing everything into a single number, the agent summarizes the quantities describing the world that it cares about, known as “features.” In our commute example, an RL agent with successor features would keep, for each route, numbers representing coffee quality, commute distance, and even other factors, such as food quality, that might matter in the future.
Successor features thus provide a middle ground between the model-free and model-based representations: they summarize multiple quantities at once, allowing the agent to adapt to different preferences without extensive relearning or heavy computation. The idea is also supported by recent studies in behavioral science and neuroscience suggesting that humans make decisions using a similar algorithmic model.
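In code, the idea boils down to a dot product. The sketch below assumes hand-written feature totals for two hypothetical routes; in practice the successor features would be learned as the expected discounted sum of the features encountered while following each route.

```python
import numpy as np

# A minimal sketch of successor features for the commute example.
# Feature order: [coffee quality, commute minutes, food quality].
# Route names and numbers are illustrative, not taken from the paper.
# In general, a route's successor features are the expected discounted
# sum of the features encountered while following it; here they are
# simply written down for two fixed routes.
successor_features = {
    "scenic_route": np.array([0.9, 30.0, 0.6]),
    "direct_route": np.array([0.4, 15.0, 0.2]),
}

def evaluate(route, preferences):
    """Value of a route: dot product of its successor features with the
    preference weights. Changing tasks changes only the weights, so no
    relearning or simulation is needed."""
    return float(successor_features[route] @ preferences)

def best_route(preferences):
    return max(successor_features, key=lambda r: evaluate(r, preferences))

# A day when coffee matters a lot and time barely matters
# (a negative weight on minutes means time is a cost):
print(best_route(np.array([10.0, -0.1, 1.0])))  # -> scenic_route
# A rushed morning when time dominates:
print(best_route(np.array([1.0, -1.0, 0.0])))   # -> direct_route
```

Switching tasks now amounts to changing the preference weights; the stored successor features are reused unchanged.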
By incorporating successor features into our RL agents, we aim to enhance their ability to learn new tasks quickly and efficiently. This framework brings us closer to replicating the compositional nature of human intelligence in AI systems.