Agents Improve Cooperation through Communication and Negotiation
Communication and cooperation have played crucial roles in the advancement of societies throughout history. The board game Diplomacy provides a valuable model for studying these interactions and learning from them. In our recently published paper in Nature Communications, we explore how artificial agents can benefit from communication in Diplomacy, a game known for its focus on alliance building.
Diplomacy is a challenging game: its rules are simple, but its dynamics are complex because of the interdependencies between players and the wide range of possible actions. To address this challenge, we developed negotiation algorithms that allow agents to communicate and agree on joint plans, enabling them to outperform agents that cannot communicate.
Cooperation becomes even more challenging when we can’t trust our peers to keep their promises. Diplomacy allows us to study the effects of broken agreements. Our research highlights the risks associated with agents misrepresenting their intentions and misleading others. It also raises the question of how we can promote trustworthy communication and teamwork.
We discovered that sanctioning agents who break contracts significantly reduces the advantages they gain from abandoning their commitments. This fosters more honest communication and cooperation among agents.
What is Diplomacy and why is it important?
Diplomacy is a negotiation-based game played by seven players on a map of Europe divided into provinces. It involves strategic decision-making and alliance formation. The negotiation phase, where players discuss their next moves, is at the heart of the game. Computational approaches to Diplomacy have been studied since the 1980s, with various protocols proposed to facilitate negotiation between computer agents.
What did we study?
We used Diplomacy as a testbed for real-world negotiation scenarios and developed methods that let AI agents coordinate their moves through communication. We took agents that play Diplomacy without communicating and augmented them with the ability to negotiate. These augmented agents, called Baseline Negotiators, act according to the contracts they negotiate.
We explored two negotiation protocols: the Mutual Proposal Protocol and the Propose-Choose Protocol. Our agents used algorithms that identify mutually beneficial deals by simulating how the game would unfold under different contracts. We used the Nash Bargaining Solution from game theory as the basis for selecting high-quality agreements, and Monte Carlo simulations to predict future game states under an agreed contract.
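To make the idea concrete, here is a minimal Python sketch of choosing a contract with the Nash Bargaining Solution, using Monte Carlo rollouts to estimate each player's value. Everything in it is a hypothetical stand-in: the contract names, the payoff numbers and the `simulate` function replace a real Diplomacy simulator and value estimates, and the choice is restricted to a finite list of candidate contracts. The disagreement point is taken, as an assumption, to be the estimated value of playing on with no deal.

```python
import random
from statistics import mean

# Hypothetical toy model: mean payoff (player A, player B) if the game is
# played out under each contract. A real agent would roll out a game simulator.
TRUE_MEANS = {
    "no_deal": (0.30, 0.30),
    "A_takes_Belgium": (0.45, 0.28),
    "share_Balkans": (0.38, 0.37),
    "B_takes_Norway": (0.29, 0.44),
}

def simulate(contract, rng):
    """One noisy rollout of the game under `contract` (toy stand-in)."""
    mu_a, mu_b = TRUE_MEANS[contract]
    return mu_a + rng.gauss(0, 0.05), mu_b + rng.gauss(0, 0.05)

def monte_carlo_value(contract, n_rollouts, rng):
    """Estimate each player's expected payoff by averaging many rollouts."""
    rollouts = [simulate(contract, rng) for _ in range(n_rollouts)]
    return mean(r[0] for r in rollouts), mean(r[1] for r in rollouts)

def nash_bargaining_choice(contracts, disagreement, n_rollouts=2000, seed=0):
    """Pick the contract maximising the product of both players' gains over no deal."""
    rng = random.Random(seed)
    d_a, d_b = monte_carlo_value(disagreement, n_rollouts, rng)
    best, best_product = None, 0.0
    for c in contracts:
        v_a, v_b = monte_carlo_value(c, n_rollouts, rng)
        gain_a, gain_b = v_a - d_a, v_b - d_b
        # Only deals that leave both players better off than no deal qualify.
        if gain_a > 0 and gain_b > 0 and gain_a * gain_b > best_product:
            best, best_product = c, gain_a * gain_b
    return best  # None means no mutually beneficial contract was found
```

Calling `nash_bargaining_choice(["A_takes_Belgium", "share_Balkans", "B_takes_Norway"], "no_deal")` selects `"share_Balkans"`: each of the other two deals leaves one player worse off than no deal, so the product of gains rules them out even though each is very attractive to one side.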
Our experiments demonstrated that the negotiation mechanism significantly improved the performance of Baseline Negotiators compared to non-communicating agents.
Agents breaking agreements
In Diplomacy, agreements made during negotiation are not binding. We examined the consequences of agents deviating from agreed contracts in subsequent turns. This mirrors real-life scenarios where people fail to meet their commitments. To enable cooperation between AI agents or between agents and humans, we must address the problem of strategic agreement-breaking.
We introduced Deviator Agents, which outperformed honest Baseline Negotiators by breaking agreements. Simple Deviators “forgot” that they had agreed to a contract and acted independently of it. Conditional Deviators chose whatever actions were best for themselves while assuming that the other parties would still adhere to the contract.
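A toy one-shot game illustrates how the two deviator types differ from an honest Baseline Negotiator. All of it is hypothetical: the payoffs, the three actions, the no-communication prior over B's moves, and the modelling of a contract as a set of allowed actions for each player.

```python
# Hypothetical one-shot payoffs for player A, indexed by (A's action, B's action).
PAYOFF_A = {
    ("hold", "hold"): 3, ("hold", "attack"): 0,
    ("attack", "hold"): 5, ("attack", "attack"): 1,
    ("defend", "hold"): 4, ("defend", "attack"): 2.5,
}
ACTIONS = ["hold", "attack", "defend"]

# Hypothetical non-aggression contract: each player is restricted to "hold".
CONTRACT = {"A": {"hold"}, "B": {"hold"}}

# A's belief about B when A ignores the contract (its no-communication prior).
PRIOR_B = {"hold": 0.2, "attack": 0.8}

def expected_payoff(a_action, belief_b):
    return sum(p * PAYOFF_A[(a_action, b)] for b, p in belief_b.items())

def best_response(allowed_a, belief_b):
    return max(allowed_a, key=lambda a: expected_payoff(a, belief_b))

def uniform_over(actions):
    return {b: 1 / len(actions) for b in actions}

def baseline_negotiator():
    # Honest: stays within its side of the contract and assumes B does too.
    return best_response(sorted(CONTRACT["A"]), uniform_over(CONTRACT["B"]))

def simple_deviator():
    # "Forgets" the contract: plays as a non-communicating agent would.
    return best_response(ACTIONS, PRIOR_B)

def conditional_deviator():
    # Assumes B honours the contract, but optimises over all of A's actions.
    return best_response(ACTIONS, uniform_over(CONTRACT["B"]))
```

If B honours the contract and holds, A scores 3 as a Baseline Negotiator, 4 as a Simple Deviator (which plays "defend") and 5 as a Conditional Deviator (which plays "attack"), reproducing in miniature the ordering reported below. The Conditional Deviator gains most because it exploits exactly the belief that the contract creates.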
We found that both Simple and Conditional Deviators had a significant advantage over Baseline Negotiators, with Conditional Deviators having the larger advantage.
Encouraging honest behavior
To address the deviation problem, we introduced Defensive Agents, which respond negatively to deviations. We studied two kinds: Binary Negotiators, which simply stop communicating with agents that have broken an agreement, and Sanctioning Agents, which actively try to lower the value achieved by deviators. Both types of Defensive Agents reduced the advantage of deviating, with Sanctioning Agents having the stronger effect.
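The two defensive responses can be contrasted in a short Python sketch. The class names mirror the agent types above, but their internals are our assumptions: the Binary Negotiator is modelled as a block list, and the Sanctioning Agent as subtracting a hypothetical `sanction_weight` times the deviator's estimated value from its own value, so that it will give up some of its own value to hurt the deviator. The value estimates passed in are placeholders for whatever an agent would compute, for example from rollouts.

```python
from dataclasses import dataclass, field

@dataclass
class BinaryNegotiator:
    """Stops negotiating with any player it has caught breaking a contract."""
    blocked: set = field(default_factory=set)

    def observe(self, player, honoured):
        if not honoured:
            self.blocked.add(player)

    def willing_to_negotiate(self, player):
        return player not in self.blocked

@dataclass
class SanctioningAgent:
    """After a deviation, trades some of its own value for lowering the deviator's."""
    sanction_weight: float = 0.5  # hypothetical trade-off parameter
    deviators: set = field(default_factory=set)

    def observe(self, player, honoured):
        if not honoured:
            self.deviators.add(player)

    def choose(self, actions, own_value, target_value, target):
        """own_value / target_value map each action to estimated values."""
        if target not in self.deviators:
            return max(actions, key=lambda a: own_value[a])
        # Penalise actions that leave the deviator well off.
        return max(actions, key=lambda a: own_value[a] - self.sanction_weight * target_value[a])
```

For example, with hypothetical values where "expand_east" is worth 0.40 to the agent and 0.35 to Italy, and "attack_deviator" is worth 0.36 and 0.20, the agent expands east while Italy is honest, but switches to "attack_deviator" once it has observed Italy breaking a contract (0.36 - 0.5 × 0.20 = 0.26 beats 0.40 - 0.5 × 0.35 = 0.225).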
Finally, we introduced Learned Deviators, which adapted their behavior over multiple games to make the above defenses less effective. Learned Deviators occasionally broke contracts late in the game, yet they still honored more than 99.7% of their agreements. Even so, issues such as the erosion of trust and the cost of deviation may call for additional mechanisms, such as reputation systems.
Future research
Our study raises several questions for future research. Can we design more sophisticated protocols to encourage even more honest behavior? How can we combine communication techniques with imperfect information? What other mechanisms can deter the breaking of agreements? Building fair, transparent, and trustworthy AI systems is a vital part of our mission at DeepMind. Studying sandboxes like Diplomacy helps us address these important questions.