Improving Multi-Turn Conversations: Dynamic Planning with Reinforcement Learning

Jimmy W.

June 28, 2023

Dynamic Planning: Enabling Engaging Conversations with Virtual Assistants

Virtual assistants have become a common part of our lives, helping us learn new things and providing recommendations. However, users now expect more than just simple, short dialogues with these assistants. They want deeper, multi-turn interactions that feel engaging and natural. That’s where dynamic planning comes in.

Dynamic planning is an assistant's ability to adjust its conversation plan as the dialogue unfolds. The assistant looks ahead and revises its original plan to keep the conversation interesting and relevant. Large language models (LLMs) have made significant progress in natural language processing, but they are not designed for dynamic planning. Reinforcement learning (RL) is.

RL has succeeded on problems that involve dynamic planning, such as game playing and protein folding, so we applied it to human-to-assistant conversations. Our approach allows the assistant to plan a multi-turn conversation towards a goal and adapt that plan in real time. This way, the assistant can deliver the engaging, open-ended interactions that users expect.

Improving Long Interactions with Reliable Information

To improve long interactions, we focused on using reliable information from reputable sources instead of relying solely on content generated by a language model. This approach ensures that the assistant can provide accurate and trustworthy answers. We used RL to compose answers based on the information extracted from these sources.

Dynamic Composition: Giving Assistants More Control

To tackle the challenge of conversational exploration, we developed a two-part approach called dynamic composition. The first part extracts relevant information from reputable sources, and the second part combines this information into assistant responses. Unlike traditional LLM methods, dynamic composition gives the assistant control over the source, correctness, and quality of the content it offers.
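The two parts of dynamic composition can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the `Snippet` type, the provider names, the keyword match used for extraction, and the "one fact, then one question" composition rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str  # hypothetical content-provider name
    text: str
    kind: str    # "fact" or "question"


# Hypothetical pool of vetted content (made-up example data).
CORPUS = [
    Snippet("encyclopedia", "Lions live in groups called prides.", "fact"),
    Snippet("quiz_provider", "Do you want to hear what lions sound like?", "question"),
    Snippet("encyclopedia", "Penguins cannot fly.", "fact"),
]


def extract_candidates(topic, corpus):
    """Part 1: collect snippets that mention the topic (naive keyword match)."""
    return [s for s in corpus if topic.lower() in s.text.lower()]


def compose_response(candidates, already_said):
    """Part 2: combine unused snippets into one utterance.

    Returns the utterance and the sources it drew on, so the assistant
    keeps track of where each piece of content came from.
    """
    unused = [s for s in candidates if s.text not in already_said]
    fact = next((s for s in unused if s.kind == "fact"), None)
    question = next((s for s in unused if s.kind == "question"), None)
    parts = [s for s in (fact, question) if s is not None]
    return " ".join(s.text for s in parts), [s.source for s in parts]


candidates = extract_candidates("lions", CORPUS)
response, sources = compose_response(candidates, already_said=set())
print(response)
print(sources)
```

Because every sentence in the response is traceable to a named source, the assistant, not a generative model, decides what content is offered. In the approach described here, an RL-trained dialogue manager would choose among such candidates instead of the fixed rule used in this sketch.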

Training the Dialogue Manager with RL

We trained a dialogue manager using off-policy RL, which evaluates and improves a policy different from the one the assistant used to generate the training conversations. The trained dialogue manager lets the assistant adapt and make decisions based on the conversation history. Two challenges stood out: a large state space and a vast action space. For the first, we used recurrent neural networks and transformers to represent the dialogue state. For the second, we limited the action space to reasonable candidate utterances or actions generated by content providers.
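To make the off-policy idea concrete, here is a minimal sketch that uses tabular Q-learning as a stand-in. The article does not name the specific algorithm, and the real system represents the dialogue state with neural networks rather than a lookup table, so the state tuples, action names, rewards, and hyperparameters below are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, transition, alpha=0.5, gamma=0.9):
    """One off-policy update from a logged transition.

    The target bootstraps from the best next candidate action, not from
    whatever the logging (behavior) policy actually did next.
    """
    state, action, reward, next_state, next_candidates = transition
    best_next = max((q[(next_state, a)] for a in next_candidates), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def greedy_action(q, state, candidates):
    """Pick the highest-valued action among the provided candidates only."""
    return max(candidates, key=lambda a: q[(state, a)])


# Hypothetical logged transitions:
# (state, action, reward, next_state, candidate actions available next).
# A state here is simply the tuple of dialogue acts so far.
LOGGED = [
    (("greet",), "tell_fact", 0.0, ("greet", "tell_fact"), ["ask_question", "end"]),
    (("greet", "tell_fact"), "ask_question", 1.0,
     ("greet", "tell_fact", "ask_question"), []),
    (("greet",), "end", 0.0, ("greet", "end"), []),
]

q = defaultdict(float)
for _ in range(50):  # replay the fixed log; no new conversations are collected
    for transition in LOGGED:
        q_learning_update(q, transition)

choice = greedy_action(q, ("greet",), ["tell_fact", "end"])
print(choice)
```

Restricting `greedy_action` and the `max` in the update to a short candidate list mirrors the idea of limiting the action space to utterances supplied by content providers, which keeps the search tractable even when the set of all possible utterances is effectively unbounded.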

Evaluation and Results

We compared our RL-based dialogue manager with a supervised transformer model in an experiment using Google Assistant. The RL model conducted longer and more engaging conversations, increasing conversation length by 30% and improving user engagement metrics. Users responded more cooperatively to the assistant's questions: explicit positive feedback increased by 32%, while negative feedback fell by 18%.

The Future of Dynamic Planning

We believe that combining LLMs and RL in multi-turn dialogues will further enhance the capabilities of virtual assistants. By enabling dynamic planning, assistants can provide engaging and natural conversations that meet the expectations of users. Our research demonstrates the potential of RL in dialogue management and opens up new possibilities for the future of virtual assistants.

In conclusion, dynamic planning is a crucial ingredient for creating engaging conversations with virtual assistants. By using reinforcement learning and reliable information from reputable sources, we can empower assistants to plan and adapt their conversations in real time. This approach improves the length and quality of interactions, ultimately enhancing the user experience.
