Investigating the Role of Reward in Reinforcement Learning: Expressivity and Task Capture


Reward: The Driving Force for Reinforcement Learning Agents

Reward is the driving force for reinforcement learning (RL) agents. It is often assumed to be a general and expressive way to specify goals and purposes in RL. In our work, we study this hypothesis systematically and explore its limitations.

To begin our study, let’s consider a thought experiment involving two characters: Alice, a designer, and Bob, a learning agent. Alice thinks of a task that she wants Bob to learn to solve. This task could be described in natural language, imagined as a state of affairs, or represented as a reward or value function. Alice then translates this task into a generator that provides a learning signal (such as reward) to Bob throughout his lifetime. Our main question is, given Alice’s chosen task, can we always find a reward function that conveys this task to Bob?

To make our study more concrete, we focus on three types of tasks: a set of acceptable policies (SOAP), a policy order (PO), and a trajectory order (TO). These task types are concrete instances of the kinds of task we might want an agent to learn to solve. We investigate whether reward can capture each of these task types in finite environments. Specifically, we ask whether the task can be captured by a Markov reward function, meaning one that is defined over the environment's existing state space and cannot depend on the agent's history.
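As a rough illustration of the three task types, the sketch below represents them as plain Python data structures for a toy environment with two states and two actions. The class names, the tiered representation of an order, and the example policies and trajectories are all assumptions made for illustration; they are not taken from the original work, which may define these objects more generally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Generic, Hashable, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)

Policy = Tuple[str, ...]                   # policy[s] = action chosen in state s
Trajectory = Tuple[Tuple[int, str], ...]   # sequence of (state, action) pairs


@dataclass(frozen=True)
class SOAP:
    """Set of acceptable policies: any member counts as solving the task."""
    acceptable: FrozenSet[Policy]

    def accepts(self, policy: Policy) -> bool:
        return policy in self.acceptable


@dataclass(frozen=True)
class Order(Generic[T]):
    """Preference order as tiers, most preferred first; items in one tier tie.

    Used here for both a policy order (PO) and a trajectory order (TO).
    Tiers are a simplification chosen for this sketch.
    """
    tiers: Tuple[FrozenSet[T], ...]

    def rank(self, item: T) -> int:
        for i, tier in enumerate(self.tiers):
            if item in tier:
                return i
        raise KeyError(item)

    def prefers(self, a: T, b: T) -> bool:
        """True if `a` is strictly preferred to `b`."""
        return self.rank(a) < self.rank(b)


# Hypothetical toy environment: states 0 and 1, actions "L" and "R".
soap = SOAP(frozenset({("L", "R"), ("R", "L")}))

policy_order = Order((
    frozenset({("L", "L")}),
    frozenset({("L", "R"), ("R", "L")}),
    frozenset({("R", "R")}),
))

trajectory_order = Order((
    frozenset({((0, "L"), (1, "L"))}),
    frozenset({((0, "R"), (1, "L"))}),
))
```

A SOAP only separates acceptable from unacceptable policies, while the two orders additionally rank their items against each other.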

Our first main result shows that, for each of the three task types, there are environment-task pairs for which no Markov reward function can capture the task. For example, in a typical grid world, the task of “going all the way around the grid clockwise or counterclockwise” cannot be captured by a Markov reward function. Whether a single “move clockwise” action is optimal depends on which direction the agent was already moving, and a Markov reward function, which has no access to that history, cannot convey this.

Moving on to our second main result, we investigate whether there is an efficient procedure for determining whether a given task can be captured by reward in a specific environment. If such a reward function exists, we also aim to output it. Our second result shows that for any finite environment-task pair, there is a procedure that can decide whether the task can be captured by Markov reward and generate the desired reward function if it exists.
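To give a feel for what "deciding whether a task can be captured" means, here is a minimal brute-force sketch in Python. It is not the efficient procedure described above: it simply tries every reward function drawn from a small set of candidate values. The environment (a four-cell ring with deterministic moves), the discount factor, the reward defined over (state, action) pairs, and the reading of "captures a SOAP" as "every acceptable policy has strictly higher start-state value than every other deterministic policy" are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

N_STATES = 4              # cells on a small ring (hypothetical environment)
ACTIONS = (+1, -1)        # +1 = step clockwise, -1 = step counterclockwise
GAMMA = Fraction(9, 10)   # discount factor, chosen for illustration
START = 0                 # start state


def start_value(policy, reward):
    """Exact discounted return of a deterministic policy from START.

    Dynamics are deterministic, so the trajectory is a prefix followed by
    a cycle; the cycle is summed in closed form as a geometric series.
    """
    seen = {}      # state -> time step of first visit
    rewards = []   # reward collected at each step
    s, t = START, 0
    while s not in seen:
        seen[s] = t
        a = policy[s]
        rewards.append(reward[(s, a)])
        s = (s + a) % N_STATES
        t += 1
    k = seen[s]    # step at which the cycle begins
    length = t - k
    prefix = sum(GAMMA**i * r for i, r in enumerate(rewards[:k]))
    cycle = sum(GAMMA**i * r for i, r in enumerate(rewards[k:]))
    return prefix + GAMMA**k * cycle / (1 - GAMMA**length)


POLICIES = list(product(ACTIONS, repeat=N_STATES))
CLOCKWISE = (+1,) * N_STATES
COUNTERCLOCKWISE = (-1,) * N_STATES


def realizes(reward, soap):
    """True if every policy in `soap` strictly beats every policy outside it."""
    values = {p: start_value(p, reward) for p in POLICIES}
    worst_good = min(values[p] for p in soap)
    best_bad = max(values[p] for p in POLICIES if p not in soap)
    return worst_good > best_bad


def find_reward(soap, levels=(0, 1)):
    """Return a reward over (state, action) pairs that realizes `soap`, or None.

    Only rewards whose entries come from `levels` are tried.
    """
    keys = [(s, a) for s in range(N_STATES) for a in ACTIONS]
    for vals in product(levels, repeat=len(keys)):
        reward = dict(zip(keys, vals))
        if realizes(reward, soap):
            return reward
    return None


# "Always go clockwise" can be expressed with these candidate rewards.
print(find_reward({CLOCKWISE}) is not None)
# "Clockwise or counterclockwise" cannot: the search returns None.
print(find_reward({CLOCKWISE, COUNTERCLOCKWISE}))
```

A `None` from this search only says that no candidate built from `levels` works; on its own it is not a proof that no reward exists. For the clockwise-or-counterclockwise task, though, the outcome matches the first result above. An efficient procedure would avoid this enumeration, which grows exponentially with the number of state-action pairs.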

While this work provides initial insights into the reward hypothesis, there is still much to explore. Generalizing these results beyond finite environments, Markov rewards, and simple task definitions is an important next step. We hope that this study offers new perspectives on the role of reward in reinforcement learning and contributes to the advancement of AI.
