Multitask Prompt Tuning: Efficient Transfer Learning with Pretrained Language Models

Pretrained language models (PLMs) have greatly improved performance on NLP tasks through finetuning. However, full task-specific finetuning is hard to scale to many tasks, because each task needs its own copy of the model's weights. To address this, researchers have been exploring “parameter-efficient” methods for model tuning. One such method is prompt tuning (PT), where tunable continuous prompt vectors are prepended to the input during training. PT learns a small number of prompt vectors for each task while keeping the PLM's parameters frozen. However, PT still falls short of full finetuning and often requires longer training times.
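To make the mechanics of prompt tuning concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's code: the names `init_prompt` and `prepend_prompt`, the toy sizes, and the random initialization are all hypothetical, and a real setup would use a deep learning framework with an actual frozen PLM.

```python
import random

def init_prompt(num_prompt_tokens, dim, seed=0):
    """Create the trainable soft prompt: one vector per prompt position.
    Uniform random init is an assumption made for this sketch."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(num_prompt_tokens)]

def prepend_prompt(prompt, token_embeddings):
    """Build the sequence a frozen PLM would receive:
    the soft prompt vectors followed by the ordinary token embeddings."""
    return [row[:] for row in prompt] + [row[:] for row in token_embeddings]

# Toy sizes (hypothetical): 4 prompt vectors, 5 input tokens, embedding dim 8.
prompt = init_prompt(num_prompt_tokens=4, dim=8)
tokens = [[0.0] * 8 for _ in range(5)]

model_input = prepend_prompt(prompt, tokens)

# Only the prompt entries would be updated by training; the PLM stays fixed.
trainable = len(prompt) * len(prompt[0])
print(len(model_input), trainable)  # 9 32
```

The point of the sketch is the parameter count: the number of trainable values depends only on the prompt length and the embedding width, not on the size of the language model.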

To overcome these challenges, recent studies propose reusing prompts already trained on other tasks. These strategies train soft prompts on a range of source tasks and use them as a starting point for tuning the prompt on a target task. Researchers from the Ohio State University, MIT-IBM Watson AI Lab, and Massachusetts Institute of Technology develop this idea further with multitask prompt tuning (MPT). MPT uses multitask data to learn a single prompt that can be efficiently transferred to target tasks.

While the concept of a shared prompt space is straightforward, implementing it can be difficult. The shared prompt has to capture what the source tasks have in common while limiting interference between them. The researchers found that decomposing the soft prompt of each source task into a shared matrix and a low-rank task-specific matrix is more effective than simply sharing one prompt matrix across all tasks. They use knowledge distillation to learn this decomposition from soft prompts obtained through regular prompt tuning.
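The decomposition described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the task-specific matrix is a rank-one outer product of two vectors and that it is combined with the shared matrix by elementwise multiplication, a form the article itself does not spell out. The function names and all numeric values are hypothetical.

```python
def outer(u, v):
    """Rank-one matrix u v^T, built from a column vector u and a row vector v."""
    return [[ui * vj for vj in v] for ui in u]

def hadamard(a, b):
    """Elementwise product of two matrices with the same shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def task_prompt(shared, u, v):
    """Assumed form: task prompt = shared matrix (elementwise *) rank-one task matrix."""
    return hadamard(shared, outer(u, v))

def per_task_params(prompt_len, dim, decomposed):
    """Parameters stored per task: a full prompt matrix, or just the two vectors."""
    return prompt_len + dim if decomposed else prompt_len * dim

# Toy shared prompt with 2 positions and embedding dim 3 (values are made up).
shared = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
u = [1.0, 0.5]        # task-specific vector, one entry per prompt position
v = [2.0, 1.0, 0.0]   # task-specific vector, one entry per embedding dimension

print(task_prompt(shared, u, v))  # [[2.0, 2.0, 0.0], [4.0, 2.5, 0.0]]

# Illustrative sizes only: prompt length 100, embedding dim 768.
print(per_task_params(100, 768, decomposed=False))  # 76800
print(per_task_params(100, 768, decomposed=True))   # 868
```

Under these assumptions, the shared matrix is stored once for all tasks, and each task adds only two short vectors, which is where the saving in task-specific parameters comes from.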

Experiments on 23 NLP datasets show that MPT outperforms state-of-the-art prompt transfer techniques. MPT with T5-Base achieves a 16.3% improvement over the vanilla prompt tuning baseline on the SuperGLUE benchmark while using only a fraction of the task-specific prompt parameters. MPT also performs well in few-shot settings with 4 to 32 labels per target task.

For more details, see the research paper [link to paper].
