Revolutionizing Voice Triggering: Multichannel Acoustic Models at ICASSP 2024

Voice Triggering at ICASSP 2024

A paper accepted at the HSCMA workshop at ICASSP 2024 addresses voice triggering (VT), which lets users activate a device by speaking a trigger phrase. A front-end system typically performs speech enhancement and/or separation and produces multiple enhanced and/or separated signals. Conventional VT systems, however, accept only single-channel audio as input, so potentially useful information in the unselected channels is discarded.

Multichannel Acoustic Models

The paper introduces multichannel acoustic models for VT, in which the multichannel output of the front-end is fed directly into the VT model. The models incorporate a transform-average-concatenate (TAC) block, modified to include channel information from conventional channel selection, so that the model can focus on a target speaker when multiple speakers are present. The paper reports a 30% reduction in the false rejection rate compared to the baseline channel selection method.
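
To make the TAC idea concrete, here is a minimal NumPy sketch of a generic transform-average-concatenate block. It is an illustration under stated assumptions, not the paper's implementation: the layer sizes, the ReLU activations, the residual connection, and the names `init_params` and `tac_block` are all hypothetical. In particular, the way the selected channel is injected here (concatenating its features onto every channel) is only one plausible reading of "including channel information from conventional channel selection"; the paper may do this differently.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng, feat, hidden, use_selected=False):
    """Random weights for one TAC block (hypothetical sizes, for illustration)."""
    k = 3 if use_selected else 2  # number of feature groups concatenated
    return {
        "w_t": rng.standard_normal((feat, hidden)) * 0.1,
        "b_t": np.zeros(hidden),
        "w_a": rng.standard_normal((hidden, hidden)) * 0.1,
        "b_a": np.zeros(hidden),
        "w_c": rng.standard_normal((k * hidden, feat)) * 0.1,
        "b_c": np.zeros(feat),
    }


def tac_block(x, params, selected=None):
    """Apply one TAC block to x of shape (channels, frames, features).

    selected: optional index of the channel picked by a conventional
    channel selector (assumption: it is supplied from outside the block).
    """
    # Transform: the same layer is applied to every channel independently.
    h = relu(x @ params["w_t"] + params["b_t"])                  # (C, T, H)
    # Average: pool across channels, then transform the pooled features.
    avg = relu(h.mean(axis=0) @ params["w_a"] + params["b_a"])   # (T, H)
    parts = [h, np.broadcast_to(avg, h.shape)]
    if selected is not None:
        # Assumed modification: also give every channel the selected
        # channel's transformed features.
        parts.append(np.broadcast_to(h[selected], h.shape))
    # Concatenate: join per-channel and shared features, project back.
    cat = np.concatenate(parts, axis=-1)                         # (C, T, k*H)
    out = relu(cat @ params["w_c"] + params["b_c"])              # (C, T, F)
    return x + out  # residual connection (assumption)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 8))  # 3 front-end channels, 5 frames, 8 features
plain = init_params(rng, feat=8, hidden=16)
with_sel = init_params(rng, feat=8, hidden=16, use_selected=True)
y_plain = tac_block(x, plain)
y_sel = tac_block(x, with_sel, selected=1)
```

Because the averaging step pools over channels, the plain block treats the channels as an unordered set: reordering the input channels only reorders the outputs. Passing a `selected` index breaks that symmetry on purpose, which is one way a model could be steered toward the channel that conventional selection associates with the target speaker.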
