Enhancing Device-Directed Speech Detection with Fusion and Non-Verbal Cues

The Importance of Device-Directed Speech Detection

Device-directed speech detection (DDSD) plays a crucial role in distinguishing between queries aimed at voice assistants and background conversation or noise. Cutting-edge DDSD systems rely on verbal cues such as acoustic, text, and automatic speech recognition (ASR) features to classify speech accurately. However, these systems often face the challenge of missing modalities when deployed in real-world scenarios.

Improving Robustness with Fusion Schemes

In this research paper, we examine fusion schemes that improve the robustness of DDSD systems when modalities are missing. We combine the scores and embeddings from prosody (non-verbal cues) with the corresponding verbal cues, exploring both score-level and embedding-level fusion. Our findings show that incorporating prosody features reduces the false acceptance (FA) rate by up to 8.5% at a fixed operating point, yielding a more reliable system.
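The two fusion strategies described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the fusion weight `w_prosody`, and the fixed embedding dimensions are all assumptions made for the example.

```python
def score_fusion(verbal_score, prosody_score, w_prosody=0.3):
    # Late (score-level) fusion: a weighted average of the per-modality
    # device-directedness scores. Falls back to whichever modality is
    # present when the other is missing (None).
    if prosody_score is None:
        return verbal_score
    if verbal_score is None:
        return prosody_score
    return (1 - w_prosody) * verbal_score + w_prosody * prosody_score


def embedding_fusion(verbal_emb, prosody_emb, verbal_dim=8, prosody_dim=4):
    # Early (embedding-level) fusion: concatenate the modality embeddings,
    # substituting zeros for a missing modality so the fused vector fed to
    # the downstream classifier always has the same dimensionality.
    verbal_emb = verbal_emb if verbal_emb is not None else [0.0] * verbal_dim
    prosody_emb = prosody_emb if prosody_emb is not None else [0.0] * prosody_dim
    return list(verbal_emb) + list(prosody_emb)
```

The zero-substitution in `embedding_fusion` is one simple way to keep the classifier's input shape fixed when a modality is absent; the fallback in `score_fusion` plays the same role at the score level.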

Enhancing Performance with Modality Dropout Techniques

Additionally, we investigate modality dropout techniques to further strengthen DDSD models. These techniques improve the models' ability to handle missing modalities at inference time. Our evaluation shows that applying modality dropout yields a 7.4% reduction in the FA rate, further confirming the effectiveness of the approach.
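The idea behind modality dropout is to randomly zero out entire modality embeddings during training, so the model learns to classify even when a modality is unavailable later. A minimal sketch, assuming a hypothetical `modality_dropout` helper and a dict-of-vectors representation (neither is from the paper):

```python
import random


def modality_dropout(embeddings, p_drop=0.3, rng=random):
    # embeddings: dict mapping modality name (e.g. "acoustic", "text",
    # "prosody") to its feature vector. During training, each modality is
    # independently zeroed out with probability p_drop, simulating a
    # missing modality. At least one modality is always kept.
    names = list(embeddings)
    dropped = [n for n in names if rng.random() < p_drop]
    if len(dropped) == len(names):  # never drop everything
        dropped.remove(rng.choice(names))
    return {n: ([0.0] * len(v) if n in dropped else list(v))
            for n, v in embeddings.items()}
```

Because the zeroed-out pattern matches how missing modalities are represented at inference (see the zero-substitution in embedding fusion), the classifier sees the same kind of input during training and deployment.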

Overall, our research highlights the significance of fusion schemes and modality dropout techniques in improving DDSD systems’ resilience to missing modalities. By incorporating prosody features and implementing modality dropout, we can achieve more accurate and reliable device-directed speech detection.
