
Unlocking ASR Uncertainty: Enhancing LLM Performance for Speech Understanding


Large Language Models (LLMs): How N-Best List Prompts Can Improve Speech Intent Classification

Large language models (LLMs) are powerful tools for natural language processing (NLP) tasks. However, when it comes to spoken language understanding (SLU) tasks, they can struggle. In this article, we explore how using n-best list prompts can help improve speech intent classification with LLMs.

### The Challenge of Spoken Language Understanding

When it comes to understanding speech, LLMs must rely on a speech-to-text transcription from an off-the-shelf automatic speech recognition (ASR) system. The accuracy of the LLM on SLU tasks is therefore constrained by the accuracy of the ASR system on the speech input: a high word error rate (WER) can mean that the LLM never sees the correct words and so cannot recover the spoken intent.
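As a quick refresher, WER is the word-level edit distance between the ASR hypothesis and the reference transcript, divided by the reference length. A minimal sketch (the example utterance is made up for illustration):

```python
# Sketch: word error rate (WER) as Levenshtein distance over words,
# normalized by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for word-level edit distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of six: WER ≈ 0.167
print(wer("turn on the living room lights",
          "turn on the living groom lights"))
```

Even a single misrecognized word like this can flip the predicted intent, which is exactly the failure mode the n-best approach targets.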

### Using N-Best List Prompts to Improve Speech Intent Classification

To address this problem, the authors propose prompting the LLM with the ASR system's n-best list of hypotheses rather than only its single best guess. They explore descriptive prompts that explain the concept of an n-best list to the LLM, and then fine-tune LoRA adapters on the intent classification task.
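To make the idea concrete, here is a minimal sketch of how such a prompt could be assembled. The wording, hypotheses, and label set are hypothetical illustrations, not the paper's actual template:

```python
# Hypothetical n-best list prompt for intent classification.
# The template text and labels below are assumptions for illustration,
# not the exact prompt used by the authors.
def build_nbest_prompt(nbest: list[str], labels: list[str]) -> str:
    # First, describe the n-best concept to the LLM.
    header = (
        "Below are the top ASR hypotheses for a single utterance, "
        "ordered from most to least likely. "
        "Any of them may contain recognition errors.\n"
    )
    # Then list the hypotheses, most likely first.
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    question = (
        f"\nClassify the speaker's intent as one of: {', '.join(labels)}."
        "\nIntent:"
    )
    return header + hyps + question

prompt = build_nbest_prompt(
    ["turn of the lights", "turn off the lights", "turn off the light"],
    ["device_control", "chit_chat"],
)
print(prompt)
```

The intuition is that even when the 1-best hypothesis is wrong ("turn of the lights"), the correct words often appear somewhere in the list, so the LLM can resolve the intent by reading across hypotheses.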

### The Efficacy of N-Best List Prompts

The authors demonstrate the effectiveness of their approach on a binary device-directed speech detection task and on a keyword spotting task using the Google Speech Commands dataset. Systems prompted with n-best lists outperform those given only the 1-best ASR output, pointing to a more effective way to exploit ASR uncertainty with LLMs in speech-based applications.

