New Definition of Agency: Modelling Incentives and Behavior of AI Agents
Research on artificial general intelligence (AGI) aims to build systems that safely pursue their intended goals. To better understand agent incentives, causal influence diagrams (CIDs) are used to model decision-making situations. CIDs let us reason about agent behaviour by relating training setups to the incentives, and potential risks, they create.
In our recent paper, “Discovering Agents,” we introduce new approaches to address these issues, including:
1. The First Formal Causal Definition of Agents: We define agents as systems that would change their behavior if their actions had a different impact on the world.
2. An Algorithm for Discovering Agents from Data: We present an algorithm that uses empirical data to identify agents within a system.
3. A Translation Between Causal Models and CIDs: We establish a method to translate causal models into CIDs, enabling a better understanding of agent incentives.
4. Resolving Confusions in Causal Modelling: We address previous mistakes in causal modelling of agents, providing a more accurate analysis of incentives and safety properties.
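The definition in point 1 can be made concrete with a toy counterfactual test. This is my own simplified sketch, not the paper's formalism: the policies and utility maps below are hypothetical stand-ins for "behaviour" and "how actions impact the world".

```python
# Toy illustration of the agency criterion: a system counts as an agent
# if it would adapt its policy when the consequences of its actions change.

def best_response(utility):
    """An adaptive system: picks the action maximizing the given utility map."""
    return max(utility, key=utility.get)

def fixed_policy(utility):
    """A non-agent: always goes 'left', regardless of consequences."""
    return "left"

def would_adapt(policy):
    # Intervene on how actions map to outcomes and check if behaviour changes.
    original = {"left": 1.0, "right": 0.0}   # reward for going left
    altered  = {"left": 0.0, "right": 1.0}   # reward moved to the right
    return policy(original) != policy(altered)

print(would_adapt(best_response))  # True  -> behaves like an agent
print(would_adapt(fixed_policy))   # False -> does not
```

The adaptive policy changes its choice when the action-outcome mapping is intervened on, while the fixed policy does not; only the former satisfies the definition.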
To illustrate our method, consider modelling a mouse as an agent. In a world of three squares, the mouse chooses to go left or right, moves to its next position, and may find cheese there. The floor is icy, so the mouse may slip, and the cheese could be on either side. This scenario can be represented as a CID.
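A minimal encoding of that CID as a graph might look as follows. The node labels (D, C, X, U) are my own shorthand for the decision, cheese position, next position, and utility, and are not necessarily the paper's notation:

```python
# Sketch of the mouse CID: a decision node, two chance nodes, and a utility node.
cid = {
    "nodes": {
        "D": "decision",   # mouse chooses left or right
        "C": "chance",     # which side the cheese is on
        "X": "chance",     # next position (icy floor may cause a slip)
        "U": "utility",    # whether the mouse reaches the cheese
    },
    "edges": [
        ("D", "X"),  # the chosen direction influences the next position
        ("C", "U"),  # the cheese's location determines where reward is
        ("X", "U"),  # the final position determines whether cheese is found
    ],
}

# The utility node depends on the cheese location and the mouse's position:
parents_of_U = [a for (a, b) in cid["edges"] if b == "U"]
print(parents_of_U)  # ['C', 'X']
```

Separating decision, chance, and utility nodes is what lets a CID expose which variables an agent has an incentive to influence.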
We also introduce three algorithms that enhance our understanding and analysis of AI agents:
1. Causal Discovery of Agents: This algorithm uses interventions to infer the causal relationships within a system.
2. Transformation to Game Graph: This algorithm maps the mechanized causal graph (which augments each variable with a mechanism node governing its behaviour) into a game graph, marking which nodes are decisions and which are utilities.
3. Translation Between Game Graph and Mechanized Causal Graph: This algorithm allows us to convert the game graph back into a mechanized causal graph.
These algorithms enable us to discover and analyze agents within causal experiments, using CIDs as representations.
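The second algorithm can be sketched as a simple labelling rule. This is a simplified reading under my own encoding, not the paper's exact procedure: roughly, a node whose mechanism responds to other mechanisms behaves as a decision (its policy adapts), and a node whose mechanism others respond to behaves as a utility.

```python
# Sketch: mechanized causal graph -> game graph labelling.
object_nodes = ["D", "X", "U"]

# Mechanism-level edges: ("mech(A)", "mech(B)") means interventions on
# A's mechanism change B's mechanism. Here, changing how utility is
# assigned changes the mouse's policy.
mechanism_edges = [("mech(U)", "mech(D)")]

def label(node):
    incoming = any(dst == f"mech({node})" for _, dst in mechanism_edges)
    outgoing = any(src == f"mech({node})" for src, _ in mechanism_edges)
    if incoming:
        return "decision"  # its policy adapts to other mechanisms
    if outgoing:
        return "utility"   # other mechanisms adapt to it
    return "chance"

game_graph = {n: label(n) for n in object_nodes}
print(game_graph)  # {'D': 'decision', 'X': 'chance', 'U': 'utility'}
```

Because the labels are derived from interventional (mechanism-level) structure rather than assumed upfront, the resulting game graph reflects what the experiments actually reveal about where the agency sits.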
Our research provides better safety tools for modelling AI agents by introducing the first formal causal definition of agency. Because our approach is grounded in causal discovery, it lets us assess from experimental data whether a system contains an agent, which is crucial for evaluating the risks associated with AGI.
As interest in causal modelling of AI systems grows, our research demonstrates how this approach can improve safety analysis and clarify when agents are present within AI systems.
Excited to learn more? Check out our paper. We welcome feedback and comments on our work.