Large Language Models (LLMs): How N-Best List Prompts Can Improve Speech Intent Classification
Large language models (LLMs) are powerful tools for natural language processing (NLP) tasks. However, when it comes to spoken language understanding (SLU) tasks, they can struggle. In this article, we explore how using n-best list prompts can help improve speech intent classification with LLMs.
### The Challenge of Spoken Language Understanding
To process speech, an LLM typically relies on a transcript produced by an off-the-shelf automatic speech recognition (ASR) system. The accuracy of the LLM on SLU tasks is therefore constrained by the accuracy of the ASR system on the speech input. When the word error rate (WER) is high, the transcript may not contain the information the LLM needs to identify the spoken intent.
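To make the WER metric concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; production toolkits usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of the four reference words are misrecognized.
    print(word_error_rate("turn on the lights", "turn off the light"))
```

In this example a single substituted word ("on" heard as "off") reverses the intent, which is why a seemingly moderate WER can still be harmful for intent classification.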
### Using N-Best List Prompts to Improve Speech Intent Classification
To address this problem, the authors propose prompting the LLM with the ASR system's n-best list, that is, its n highest-ranked transcription hypotheses, rather than with the single best hypothesis alone. They explore using descriptive prompts to explain the concept of n-best lists to the LLM, and then finetuning LoRA adapters on the intent classification task.
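As an illustration of what a descriptive n-best list prompt might look like, the sketch below assembles one from a ranked list of ASR hypotheses. The prompt wording, the function name `build_nbest_prompt`, and the label names are hypothetical and are not taken from the authors' work; the sketch only shows the general idea of exposing several hypotheses to the LLM at once.

```python
from typing import Sequence


def build_nbest_prompt(
    hypotheses: Sequence[str],
    labels: Sequence[str],
    n: int = 5,
) -> str:
    """Format the top-n ASR hypotheses into a classification prompt.

    `hypotheses` is assumed to be ordered from most to least likely.
    The instruction text below is illustrative, not the authors' prompt.
    """
    top = list(hypotheses)[:n]
    if not top:
        raise ValueError("at least one ASR hypothesis is required")

    lines = [
        "An n-best list contains the most likely transcriptions of one",
        "spoken utterance, ranked from most to least likely by a speech",
        "recognizer. Any of them may contain recognition errors.",
        "",
        "N-best list:",
    ]
    for rank, hypothesis in enumerate(top, start=1):
        lines.append(f"{rank}. {hypothesis}")
    lines.append("")
    lines.append("Possible intents: " + ", ".join(labels))
    lines.append("Intent:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_nbest_prompt(
        ["set a timer", "set the timer", "said a timer"],
        ["device_directed", "not_device_directed"],
        n=2,
    )
    print(prompt)
```

A 1-best baseline corresponds to calling the same function with `n=1`, which makes it straightforward to compare the two prompt styles on the same data.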
### The Efficacy of N-Best List Prompts
The authors demonstrate the effectiveness of their approach on two tasks: binary device-directed speech detection, and keyword spotting on the Google Speech Commands dataset. Their findings show that systems using n-best list prompts outperform those using 1-best ASR outputs, paving the way for a more efficient method to exploit ASR uncertainty with LLMs for speech-based applications.