Improving Virtual Assistant Interactions: Dropping Trigger Phrases
Virtual assistants have become increasingly popular. They are typically activated when the user says a trigger phrase before issuing a command. A team of researchers is exploring whether the trigger phrase can be eliminated, making interactions with virtual assistants more natural.
Combining Signals for Natural Interactions
The researchers combined the decoder signals of an automatic speech recognition (ASR) system with acoustic and lexical representations, and fed the combination as input features to a large language model (LLM). The goal was a more seamless and natural interaction with virtual assistants.
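To make the idea of combining signals concrete, here is a minimal sketch of one common way to feed non-text features to an LLM: project each feature vector to the model's embedding width and prepend it to the embedded text tokens. The source does not describe the fusion mechanism in this detail, so the projection-and-prefix design, all dimensions, and the names `build_prefix`, `w_acoustic`, and `w_decoder` are assumptions for illustration only.

```python
import random

def linear(vec, weight):
    """Multiply a vector (length d_in) by a weight matrix (d_in x d_out)."""
    d_out = len(weight[0])
    return [sum(vec[i] * weight[i][j] for i in range(len(vec))) for j in range(d_out)]

def make_weight(d_in, d_out, rng):
    """Small random projection matrix; in a real system this would be learned."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]

def build_prefix(acoustic, decoder, text_embeddings, w_acoustic, w_decoder):
    """Return LLM-sized vectors: [acoustic token, decoder token, text tokens...]."""
    return [linear(acoustic, w_acoustic), linear(decoder, w_decoder)] + text_embeddings

rng = random.Random(0)
D_MODEL = 8  # hypothetical LLM embedding width

# Stand-ins for the three signal types named in the text (all values made up).
acoustic = [rng.random() for _ in range(16)]   # acoustic representation
decoder = [0.7, 0.2, 3.0, 0.9]                 # ASR decoder signals
text_embeddings = [[rng.random() for _ in range(D_MODEL)] for _ in range(5)]  # lexical tokens

w_acoustic = make_weight(16, D_MODEL, rng)
w_decoder = make_weight(4, D_MODEL, rng)

prefix = build_prefix(acoustic, decoder, text_embeddings, w_acoustic, w_decoder)
print(len(prefix), len(prefix[0]))  # 7 8
```

The resulting sequence has two extra "tokens" ahead of the five text tokens, all of the same width, so a model that consumes embeddings could process them together.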
Efficiency and Effectiveness
The team focused on developing a data- and resource-efficient system, with the goal of minimizing the amount of training data required and ensuring compatibility with devices such as smartphones. Their model was fine-tuned on a small amount of multimodal data using low-rank adaptation, which trains only a small set of added parameters and so keeps resource requirements low. The study found that the multimodal system outperformed unimodal baselines while using only a fraction of the training data.
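For readers unfamiliar with low-rank adaptation, the sketch below shows the general technique on a single linear layer: the original weight matrix W stays frozen, and only two small matrices A and B are trained, giving an effective weight of W + (alpha / r) * B * A. The layer size, rank, and alpha here are arbitrary illustrative values, not the settings used in the study, and `LoRALinear` is a hypothetical name.

```python
import random

def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen, d_out x d_in
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]  # trainable, rank x d_in
                  for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]          # trainable, zero-initialized
        self.scale = alpha / rank

    def forward(self, x):
        col = [[v] for v in x]                                 # column vector
        base = matmul(self.weight, col)
        low_rank = matmul(self.b, matmul(self.a, col))
        return [base[i][0] + self.scale * low_rank[i][0] for i in range(len(base))]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

rng = random.Random(0)
D = 64  # illustrative layer width
weight = [[rng.gauss(0.0, 0.1) for _ in range(D)] for _ in range(D)]
layer = LoRALinear(weight, rank=4, alpha=8, rng=rng)

full_params = D * D
print(layer.trainable_params(), full_params)  # 512 4096
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly, and fine-tuning only has to update 512 values instead of 4096 in this toy setting.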