Using Deep Reinforcement Learning to Align AI Systems with Human Values: A Proof-of-Concept Study
A recent article published in Nature Human Behaviour presents a compelling demonstration of how deep reinforcement learning (RL) can be used to find economic policies that people actually prefer. The study addresses a central challenge in AI research, often called value alignment: ensuring that AI systems behave in harmony with human values and preferences.
The Challenge of Resource Redistribution
Imagine a scenario in which a group of individuals pools their funds to make an investment. The investment turns out to be profitable, producing a financial gain. The question then arises of how the proceeds should be distributed among the participants. One simple approach is to divide the returns equally, but this may seem unfair if some individuals contributed more than others. Distributing the funds in proportion to the initial investment may also be seen as unfair, since people start with different levels of assets. How to redistribute resources in our economies and societies has long been a subject of debate among philosophers, economists, and political scientists.
Exploring Solutions with Deep RL
To address this challenge, the researchers designed a simple investment game involving four players. Each game lasted ten rounds; in every round, each player received an endowment of funds and chose either to keep it or to invest it in a common pool. Investing offered the potential for growth, but the players did not know in advance how the proceeds would be redistributed. They were told that one referee (A) would make the redistribution decisions for a first game of ten rounds, while a different referee (B) would take over for a second game of ten rounds. Afterwards, the players voted for either Referee A or Referee B, and a final game was played with the winning referee. The players were motivated to report their preferences accurately, since they kept the proceeds of this final game.
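The round structure described above can be sketched in a few lines of Python. This is an illustrative simplification: the multiplier, the endowment values, and all function names here are assumptions for demonstration, not the study's actual parameters.

```python
def play_round(endowments, contributions, referee, multiplier=1.6):
    """One round: invested funds grow, then a referee splits the grown pool."""
    pool = sum(contributions) * multiplier                 # invested funds grow
    kept = [e - c for e, c in zip(endowments, contributions)]
    payouts = referee(endowments, contributions, pool)     # referee's decision
    return [k + p for k, p in zip(kept, payouts)]

def equal_split(endowments, contributions, pool):
    """A simple illustrative referee: divide the pool equally among players."""
    return [pool / len(contributions)] * len(contributions)

# Four players with unequal endowments each invest half of what they received.
final = play_round([10, 10, 4, 4], [5, 5, 2, 2], equal_split)
print(final)
```

Because contributions are multiplied before redistribution, the group as a whole gains from investing; the open question the study tackles is how the referee should divide that gain.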
Training the AI Agent
One of the referees applied a pre-defined redistribution policy, while the other was designed by a deep RL agent. The researchers trained the agent by first gathering gameplay data from numerous human groups and teaching a neural network to imitate their behaviour. This imitation model provided a simulated population of virtual players, generating as much data as needed to apply data-intensive machine learning techniques to the RL agent, whose objective was to maximize the votes of these virtual players. Once the AI agent was trained, new human players participated, and the AI-designed mechanism was compared against well-known baselines, including a libertarian policy that redistributed funds in proportion to contributions.
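The baseline policies the AI was compared against can be sketched as simple redistribution functions. The two below are illustrative assumptions based on the description above (the libertarian baseline is named in the text; the equal-split baseline and both function names are hypothetical):

```python
def strict_egalitarian(endowments, contributions, pool):
    """Split the grown pool equally, regardless of who contributed what."""
    n = len(contributions)
    return [pool / n] * n

def libertarian(endowments, contributions, pool):
    """Return the pool in proportion to each player's absolute contribution."""
    total = sum(contributions)
    if total == 0:
        return [0.0] * len(contributions)
    return [pool * c / total for c in contributions]

# Player 0 contributed twice as much as player 1,
# so the libertarian referee pays player 0 twice player 1's share.
print(libertarian([10, 10], [6, 3], 14.4))
```

Neither baseline considers a player's initial means: the egalitarian split ignores contributions entirely, while the libertarian split rewards absolute contributions even when a small contribution represents a large sacrifice for a poorer player.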
Results and Implications
Analysis of the new players' votes revealed that the policy created by deep RL was more popular than the baselines. Strikingly, even when a fifth human player took on the role of referee and was incentivized to attract votes, their redistribution decisions were still less favored than the AI agent's policy. This study demonstrates that AI can be trained to maximize the stated preferences of a group of people, reducing the risk of AI systems adopting unsafe or unfair policies.
Closer examination of the AI-designed policy revealed that it combined several ideas previously proposed by human thinkers and experts to address the redistribution problem. The AI system redistributed funds based on relative rather than absolute contributions, taking into account each player's initial means as well as their willingness to contribute. Moreover, it rewarded players who made more generous relative contributions, possibly encouraging others to do the same. Importantly, this policy was discovered purely by learning to maximize human votes, underscoring the crucial role of human involvement in AI decision-making.
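The relative-contribution idea described above can be illustrated with a short sketch: redistribute the pool in proportion to the fraction of each player's endowment that was invested, rather than the absolute amount. This is a deliberate simplification of the learned policy, assumed here for illustration, not the paper's exact mechanism.

```python
def relative_contribution_split(endowments, contributions, pool):
    """Pay out in proportion to the fraction of one's endowment contributed."""
    shares = [c / e if e > 0 else 0.0 for c, e in zip(contributions, endowments)]
    total = sum(shares)
    if total == 0:
        return [0.0] * len(contributions)
    return [pool * s / total for s in shares]

# A poorer player investing all of a small endowment (4 of 4) earns a larger
# share than a richer player investing the same absolute amount (4 of 10).
print(relative_contribution_split([10, 4], [4, 4], 11.2))
```

Under a purely libertarian split, these two players would receive identical payouts; weighting by relative contribution instead rewards the player who sacrificed a larger share of their means, which is the behaviour the text attributes to the learned policy.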
By using votes, this study harnesses the principle of majoritarian democracy to determine what people truly want. While majoritarian democracy has broad appeal, it comes with a well-known drawback: the preferences of the majority can override those of the minority. In this study, the minority consisted of the players with larger initial endowments. Further research is needed to balance the preferences of majority and minority groups, and to design democratic systems that give every voice fair consideration.
This research marks an essential step toward creating AI systems that align with human values, offering exciting possibilities for the future of AI technology.