Natural language processing is an area where AI systems are rapidly advancing. To ensure their safety, it is crucial to rigorously test and guide these models. Previous evaluation metrics focused mainly on language comprehension and reasoning abilities. However, as models are now being used for actual interactive work, benchmarks need to be developed to assess their performance in social settings.
One proposed benchmark by the University of California, Center For AI Safety, Carnegie Mellon University, and Yale University is called MACHIAVELLI. This benchmark evaluates an agent’s competence and harmfulness in naturalistic social settings. It is inspired by text-based Choose Your Own Adventure games, which are developed by humans. These games require agents to have planning abilities and a grasp of natural language.
The benchmark includes mathematical formulas to measure certain behaviors, annotations of social notions in the games, and numerical scores for each behavior. The team has found that GPT-4, developed by OpenAI in 2023, is more effective than human annotators in collecting annotations.
AI agents, like humans, can exhibit immoral and power-seeking behaviors. Language models trained for next-token prediction may produce toxic text, while agents trained for goal optimization may engage in harmful activities. Moral training and behavioral regularization have been shown to reduce immoral and harmful behavior in language-model agents without significantly decreasing rewards. This work is important for the development of trustworthy AI decision-makers.
To control agents, techniques such as artificial conscience and ethics prompts are being explored. While progress has been made, there is still more research needed to address these trade-offs and expand the Pareto frontier.
Overall, the development of benchmarks like MACHIAVELLI and strategies to encourage moral behavior in AI agents are crucial for ensuring the safe and responsible use of AI technology.
Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 18k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
🚀 Check Out 100’s AI Tools in AI Tools Club