Training AI for Safer and More Helpful Communication: Introducing Sparrow
Large language models have made significant progress on tasks such as question answering and summarization. However, dialogue agents powered by these models can still give inaccurate or inappropriate answers, and those answers can cause real harm.
To address this issue, researchers developed Sparrow, a dialogue agent designed to be more helpful, correct, and harmless. Introduced in a recent paper as a research model and proof of concept, Sparrow is intended to advance our understanding of how to train dialogue agents, ultimately contributing to the development of safer and more useful artificial general intelligence.
Exploring the Functionality of Sparrow
Training a conversational AI model is difficult because success is subjective: there is no single objective measure of a good answer. To overcome this challenge, researchers used a form of reinforcement learning based on human feedback. Study participants were shown different model answers to the same question and asked which one they preferred. Answers were also presented with or without evidence retrieved from the internet, so the model could learn when an answer should be backed by evidence.
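As an illustration of this kind of preference training, here is a minimal sketch, in PyTorch, of how pairwise comparisons can train a reward model. It is not Sparrow's actual implementation; the RewardModel class, the embedding dimension, and the random toy data are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical reward model: maps a fixed-size embedding of a
# (question, answer) pair to a scalar preference score.
class RewardModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-ins for embeddings of the answer each participant preferred
# and the answer they rejected.
preferred = torch.randn(32, 128)
rejected = torch.randn(32, 128)

# Bradley-Terry style loss: push the preferred answer's score
# above the rejected answer's score.
loss = -F.logsigmoid(model(preferred) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

A reward model trained this way can then score new candidate answers, and reinforcement learning can optimize the dialogue agent to produce answers the reward model rates highly.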
To make the model safer, researchers established an initial set of rules constraining its behavior, such as prohibitions on threatening statements and hateful language; the rules also cover giving harmful advice and claiming to be a person. Study participants were then asked to engage the model in conversation and try to trick it into breaking these rules. The conversations collected this way were used to train a separate “rule model” that detects rule-breaking behavior.
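A rule model of this kind can be framed as a binary classifier over dialogues. The sketch below, again in PyTorch and again an assumption rather than the paper's implementation, shows the idea: embeddings of probe conversations are labeled by whether the participant elicited a rule break, and a small classifier is trained on those labels.

```python
import torch
import torch.nn as nn

# Hypothetical rule model: flags dialogues that violate at least one rule.
class RuleModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.classifier = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, dialogue_embedding: torch.Tensor) -> torch.Tensor:
        # Returns the logit that the dialogue breaks a rule.
        return self.classifier(dialogue_embedding).squeeze(-1)

model = RuleModel()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

dialogues = torch.randn(16, 128)                # toy embeddings of probe conversations
violated = torch.randint(0, 2, (16,)).float()   # 1 = participant elicited a rule break

loss = loss_fn(model(dialogues), violated)
opt.zero_grad()
loss.backward()
opt.step()
```

At inference time, such a classifier can veto or down-rank candidate responses that it scores as likely rule violations.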
Improving AI Accuracy and Ethical Guidelines
Evaluating whether Sparrow's answers are actually correct is difficult, even for experts. Instead, participants were asked to judge whether each answer is plausible and whether the evidence provided supports it. By this measure, Sparrow gives a plausible answer supported by evidence 78% of the time when asked a factual question, an improvement over baseline models. Sparrow is not perfect, however: it can still make mistakes, such as hallucinating facts or giving off-topic answers.
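To make the metric concrete, the headline number can be read as a simple aggregate over participant ratings. The snippet below is a hypothetical illustration of that arithmetic, not the paper's evaluation code.

```python
# Each entry is one participant rating of one factual answer (toy data).
ratings = [
    {"plausible": True,  "supported": True},
    {"plausible": True,  "supported": False},
    {"plausible": True,  "supported": True},
    {"plausible": False, "supported": False},
]

# An answer counts only if it is both plausible and supported by its evidence.
rate = sum(r["plausible"] and r["supported"] for r in ratings) / len(ratings)
print(f"plausible-and-supported rate: {rate:.0%}")  # 50% on this toy data
```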
While Sparrow follows the rules more reliably than previous models, there is room for improvement: even after training, participants were able to trick it into breaking the rules 8% of the time. Further research is needed to develop a more comprehensive set of rules, drawing on input from experts, policymakers, social scientists, ethicists, and users from diverse backgrounds.
Looking Towards the Future
Sparrow represents a significant step towards training dialogue agents that are not only safer but also more useful. Communication between humans and AI is only beneficial, however, if it aligns with human values and avoids harm, so ongoing research focuses on aligning language models with those values. There are also contexts in which AI agents should defer to humans or decline to answer in order to prevent harmful behavior. Finally, future work should verify that similar results hold across different languages and cultural contexts.
The ultimate goal is a better understanding of AI behavior that lets humans align and improve these complex systems, with machines themselves assisting in that process. By exploring safer and more useful communication, we can pave the way for a future where AI benefits humanity.
Interested in contributing to the development of safe AGI through conversation? Join our team as a research scientist.