Title: Enhancing Language Models with Directional Stimulus Prompting
In recent years, Natural Language Processing (NLP) has undergone a significant transformation with the introduction of Large Language Models (LLMs), which outperform smaller Language Models (LMs) such as GPT-2 and T5 (Raffel et al.) on a variety of NLP tasks. However, there is still room for improvement on specific downstream tasks. This has led researchers at the University of California, Santa Barbara, and Microsoft to propose a new framework called Directional Stimulus Prompting (DSP) that aims to improve LLM performance on these tasks.
What is Directional Stimulus Prompting (DSP)?
DSP pairs a small tunable Language Model (LM), called the policy LM, with a frozen black-box LLM. The policy LM generates a directional stimulus, such as keywords, that carries hints specific to the input sample. This stimulus is combined with the original input and passed to the LLM as part of the prompt, guiding its generation toward the desired outcome. The policy LM is first trained with supervised fine-tuning (SFT) on a small set of training samples and is then further optimized to maximize a reward based on the LLM's downstream performance measures.
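The data flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the function names (`build_dsp_prompt`, `dsp_generate`, `toy_policy_lm`, `toy_llm`) and the prompt template are hypothetical, and the two "models" are trivial stand-ins so that the example runs without any external service.

```python
from typing import Callable, List


def build_dsp_prompt(source_text: str, keywords: List[str]) -> str:
    """Combine the original input with the stimulus (here: keywords).

    The template is an assumption made for illustration only.
    """
    hint = "; ".join(keywords)
    return (
        f"Article: {source_text}\n"
        f"Keywords: {hint}\n"
        "Write a short summary of the article that covers the keywords.\n"
        "Summary:"
    )


def dsp_generate(
    source_text: str,
    policy_lm: Callable[[str], List[str]],
    black_box_llm: Callable[[str], str],
) -> str:
    """Policy LM proposes a stimulus; the frozen LLM only ever sees a prompt."""
    keywords = policy_lm(source_text)
    prompt = build_dsp_prompt(source_text, keywords)
    return black_box_llm(prompt)


def toy_policy_lm(text: str) -> List[str]:
    """Hypothetical stand-in: pick up to three distinct capitalized words."""
    seen: List[str] = []
    for word in text.split():
        token = word.strip(".,;:!?")
        if token[:1].isupper() and token not in seen:
            seen.append(token)
    return seen[:3]


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for the black-box LLM: echoes the hint line."""
    hint_line = next(
        line for line in prompt.splitlines() if line.startswith("Keywords: ")
    )
    return f"(summary guided by) {hint_line}"


if __name__ == "__main__":
    article = "Researchers at UCSB and Microsoft proposed DSP."
    print(dsp_generate(article, toy_policy_lm, toy_llm))
```

In a real setup, `policy_lm` would wrap the small tunable model and `black_box_llm` would wrap an API call to the frozen LLM; only the former would ever be trained.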
How does DSP work?
The stimulus generated by the policy LM acts as a hint that steers the LLM toward the desired output. While LLMs have excellent generation skills, they may exhibit unwanted behaviors without fine-grained guidance, and the role of the policy LM is to supply that guidance. Unlike prior studies that focus on prompt engineering or prompt optimization, DSP trains a separate model to produce the stimulus. Because the black-box LLM is frozen, DSP uses Reinforcement Learning (RL) to bridge the gap between the policy LM being optimized and the optimization objective, which is defined by the LLM's generations.
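To make the RL idea concrete, here is a toy sketch of a reward-driven update. It rests on several assumptions that do not come from the article: the RL algorithm is plain REINFORCE (the article does not name one), the "policy" is just a softmax over a fixed list of candidate stimuli rather than a language model, the reward is a simple unigram F1 standing in for a real downstream metric, and `toy_frozen_llm` is a hypothetical stand-in for the black-box LLM.

```python
import math
import random
from typing import Callable, List, Tuple


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy downstream metric: unigram-overlap F1 between two strings."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(
    logits: List[float],
    stimuli: List[str],
    frozen_llm: Callable[[str], str],
    reference: str,
    rng: random.Random,
    lr: float = 0.5,
    baseline: float = 0.0,
) -> Tuple[List[float], float]:
    """Sample a stimulus, score the frozen LLM's output, update the policy.

    Only the policy's logits change; the LLM is queried, never trained.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(stimuli)), weights=probs, k=1)[0]
    reward = unigram_f1(frozen_llm(stimuli[action]), reference)
    advantage = reward - baseline
    # Gradient of log softmax w.r.t. logit i is one_hot(action)[i] - probs[i].
    new_logits = [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, reward


def toy_frozen_llm(stimulus: str) -> str:
    """Hypothetical black-box LLM whose output depends on the stimulus."""
    return f"the summary mentions {stimulus}"


if __name__ == "__main__":
    rng = random.Random(0)
    stimuli = ["weather", "policy model", "football"]
    reference = "the summary mentions policy model"
    logits = [0.0, 0.0, 0.0]
    for _ in range(200):
        logits, _ = reinforce_step(logits, stimuli, toy_frozen_llm, reference, rng)
    print(dict(zip(stimuli, softmax(logits))))
```

The point of the sketch is the direction of information flow: the reward is computed from what the frozen LLM produces, but the gradient step touches only the policy that chose the stimulus.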
Evaluation and Results:
The researchers evaluated DSP on summarization and dialogue response generation, using the 750M Flan-T5-large as the policy LM and the 175B Codex as the LLM. DSP improved Codex’s performance on both tasks. For summarization, keywords served as the directional stimulus, and Codex’s performance increased by 7.2% when guided by the policy LM. For dialogue response generation, the policy LM produced dialogue acts, which raised Codex’s total score by 52.5%, a result reported as matching or exceeding earlier systems trained with complete training data.
The Directional Stimulus Prompting (DSP) framework offers a promising approach to improving the performance of Large Language Models (LLMs) on specific downstream tasks. By using a small tunable model as a policy LM and training it with Reinforcement Learning (RL), DSP provides fine-grained guidance to a frozen LLM, resulting in improved performance. These findings have implications for various NLP applications and highlight the potential of DSP for guiding LLMs.
Aneesh Tickoo, an undergraduate student at IIT Bhilai, contributed to this research as a consulting intern at MarktechPost.