DeepNash Masters Stratego with Game Theory and Model-Free Deep RL
Game-playing artificial intelligence (AI) has reached a new milestone with DeepNash, an agent developed by DeepMind that has learned the board game Stratego to the level of a human expert. Described in a paper published in Science, DeepNash takes a novel approach that combines game theory with model-free deep reinforcement learning. Its play converges toward a Nash equilibrium, which makes it very hard for opponents to exploit. DeepNash has also reached a top-three ranking among human experts on Gravon, a well-known online Stratego platform.
Board games like Stratego provide an opportunity to study strategic interactions between humans and machines in a controlled environment. Unlike chess and Go, Stratego is a game of imperfect information, where players cannot directly see their opponent’s pieces. This has posed a challenge for AI-based Stratego systems. DeepNash goes beyond traditional AI techniques like game tree search to master this game of imperfect information.
Mastering Stratego has implications beyond gaming. As researchers strive to build advanced AI systems that can operate in real-world situations with limited information, DeepNash demonstrates how AI can be applied to complex problems under uncertainty.
A Closer Look at Stratego
Stratego is a turn-based game in which the objective is to capture the opponent’s flag. It requires bluffing, tactics, information gathering, and careful maneuvering. Stratego is also zero-sum, meaning that any gain by one player is matched by an equal loss for the opponent.
One of the main challenges in developing AI for Stratego is that it is a game of imperfect information. Each player arranges 40 pieces in a starting formation that is hidden from the opponent, and a piece’s identity is typically revealed only when it comes into contact with an enemy piece on the battlefield. This makes Stratego more similar to poker than to chess or Go. AI techniques that work for perfect-information games, such as DeepMind’s AlphaZero, do not easily transfer to Stratego.
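The hidden-information mechanic can be pictured with a small data-structure sketch. The `Piece` class, the numeric ranks, and the simplified combat rule (higher rank wins, equal ranks are both removed) are assumptions made for illustration; real Stratego has special cases, such as bombs and the spy, that are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Piece:
    owner: int              # 0 or 1
    rank: int               # hypothetical numeric rank; higher wins in this sketch
    revealed: bool = False  # becomes True once the piece has been in combat


def observed_rank(piece: Piece, viewer: int) -> Optional[int]:
    """Rank as seen by `viewer`: own and revealed pieces are visible, others are hidden."""
    if piece.owner == viewer or piece.revealed:
        return piece.rank
    return None


def resolve_attack(attacker: Piece, defender: Piece) -> Optional[Piece]:
    """Simplified combat: both ranks are revealed, then the higher rank survives."""
    attacker.revealed = True
    defender.revealed = True
    if attacker.rank == defender.rank:
        return None  # equal ranks: both pieces are removed
    return attacker if attacker.rank > defender.rank else defender


# Player 0 attacks an unknown piece with a low-ranked one.
scout = Piece(owner=0, rank=2)
general = Piece(owner=1, rank=9)
before = observed_rank(general, viewer=0)  # hidden before contact
survivor = resolve_attack(scout, general)
after = observed_rank(general, viewer=0)   # visible after contact
```

In this toy exchange player 0 loses the weaker piece but learns the identity of a strong enemy piece, which is the kind of trade-off between material and information that the game forces on both players.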
Several characteristics of Stratego, including the length of its games, the need to reason over long sequences of actions, and the vast number of possible game states, have made it a decades-long challenge for the AI community.
DeepNash’s Novel Approach
DeepNash combines game theory with model-free deep reinforcement learning. “Model-free” means that DeepNash does not attempt to explicitly model its opponent’s private game state, which would be of little use in the early stages of a game, when almost nothing is known about the opponent’s pieces. Stratego’s game tree is also far too large for search-based techniques such as Monte Carlo tree search.
Instead, DeepNash uses a game-theoretic algorithm called Regularised Nash Dynamics (R-NaD) to steer its learning toward a Nash equilibrium. Playing according to a Nash equilibrium makes DeepNash hard to exploit over time. DeepNash won more than 97% of its matches against existing Stratego bots and 84% of its matches against expert human players on Gravon.
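As a rough illustration of the idea behind R-NaD, the sketch below runs a tabular toy version on rock-paper-scissors rather than Stratego. It follows the three stages described for R-NaD (reward transformation, dynamics, update), but everything concrete is an assumption made for illustration: the game, the function names, the parameters `eta` and `lr`, and the use of an exponentiated-weights update as a simple discrete stand-in for the dynamics stage. DeepNash itself trains a deep neural network, which this sketch does not attempt to reproduce.

```python
import math

# Rock-paper-scissors payoffs for player 1 (rows) against player 2 (columns).
# The game is zero-sum, so player 2 receives the negation of each entry.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N = len(PAYOFF)


def values_p1(y):
    """Expected payoff of each player-1 action against player-2 policy y."""
    return [sum(PAYOFF[a][b] * y[b] for b in range(N)) for a in range(N)]


def values_p2(x):
    """Expected payoff of each player-2 action against player-1 policy x."""
    return [-sum(PAYOFF[a][b] * x[a] for a in range(N)) for b in range(N)]


def exploitability(x, y):
    """Sum of both players' best-response payoffs; 0 at a Nash equilibrium."""
    return max(values_p1(y)) + max(values_p2(x))


def dynamics_step(policy, ref, values, eta, lr):
    """One exponentiated-weights step on the transformed (regularised) rewards."""
    # Reward transformation: penalise drifting away from the reference policy.
    # The matching opponent term is identical for all of this player's actions,
    # so it cancels in the normalisation below and is omitted.
    logits = [math.log(p) + lr * (v - eta * math.log(p / r))
              for p, r, v in zip(policy, ref, values)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def r_nad_sketch(x, y, eta=0.2, lr=0.05, outer=20, inner=1000):
    for _ in range(outer):
        # Update: the last (approximate) fixed point becomes the new reference.
        ref_x, ref_y = x, y
        for _ in range(inner):
            # Dynamics: both players update simultaneously from the same state.
            vx, vy = values_p1(y), values_p2(x)
            x = dynamics_step(x, ref_x, vx, eta, lr)
            y = dynamics_step(y, ref_y, vy, eta, lr)
    return x, y


start_x, start_y = [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]
final_x, final_y = r_nad_sketch(start_x, start_y)
print(exploitability(start_x, start_y), exploitability(final_x, final_y))
```

The first printed number measures how much a best-responding opponent could gain against the lopsided starting policies; the second should be far smaller, since the policies drift toward the uniform mix that is the equilibrium of rock-paper-scissors.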
Unexpected Strategies of DeepNash
DeepNash developed unexpected strategies to become hard to exploit. It varied its initial deployments to prevent opponents from detecting patterns. During gameplay, DeepNash randomized between seemingly equivalent actions to avoid exploitable tendencies.
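Why randomisation protects against exploitation can be seen in a much smaller zero-sum game. The sketch below uses matching pennies, not Stratego; the payoff table and the helper names `best_response_value` and `sample_action` are assumptions chosen for illustration. It measures how much an opponent who has learned a player's tendencies could win on average.

```python
import random

# Matching pennies from the guesser's point of view (rows): +1 for matching
# the hider's choice (columns), -1 otherwise. The hider gets the negation.
GUESSER_PAYOFF = [[1, -1],
                  [-1, 1]]


def best_response_value(hider_policy):
    """Best expected payoff for a guesser who knows the hider's policy."""
    return max(sum(p * row[b] for b, p in enumerate(hider_policy))
               for row in GUESSER_PAYOFF)


def sample_action(policy, rng=random):
    """Draw an action index according to the policy's probabilities."""
    return rng.choices(range(len(policy)), weights=policy)[0]


predictable = best_response_value([1.0, 0.0])  # always the same choice
biased = best_response_value([0.7, 0.3])       # a detectable tendency
balanced = best_response_value([0.5, 0.5])     # evenly randomised
```

A fully predictable hider can be beaten every time, a biased one loses on average, and an evenly randomising one gives the guesser no edge at all. Drawing each move with `sample_action` from a balanced policy, rather than always picking a single favourite, is what keeps the pattern from being detectable.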
DeepNash also showed how much value it placed on information by sacrificing pieces to learn the identity of its opponent’s most powerful ones. It bluffed as well, playing weak pieces as though they were strong, much as a poker player would. These strategies showed DeepNash’s ability to adapt and make decisions with imperfect information.
Conclusion
DeepNash’s mastery of Stratego showcases the power of combining game theory and model-free deep reinforcement learning. This achievement has implications not only for gaming but also for solving complex real-world problems with limited information. By playing unpredictably while remaining hard to exploit, DeepNash opens up new possibilities for AI systems that must act in uncertain situations.