Nvidia and researchers from the University of Illinois Urbana-Champaign have developed a new language model called Retro 48B. This model is considerably larger than previous retrieval-augmented models and has been pretrained with retrieval on a vast corpus, resulting in improved perplexity. The researchers also found that removing the encoder from InstructRetro has little effect on its question-answering performance, indicating that retrieval-augmented pretraining itself strengthens the decoder's ability to incorporate context.
Retrieval-augmented language models are well-known for their benefits in open-domain question answering, both during pre-training and inference. They reduce model perplexity, improve factuality, and enhance task performance after fine-tuning. However, existing retrieval-augmented models are limited in size compared to decoder-only models, which restricts their zero-shot generalization potential. Instruction tuning, which relies on high-quality datasets, has shown superior performance in chat and question-answering tasks.
Pretraining language models with retrieval, as in Retro, has shown promise in reducing perplexity and improving factual accuracy. However, existing retrieval-augmented models are far smaller than the largest decoder-only models, which limits how well they benefit from instruction tuning and other large-language-model capabilities. In this study, the researchers introduce Retro 48B, the largest retrieval-augmented model to date, obtained by continuing to pretrain a 43B GPT model on roughly 100 billion additional tokens with the Retro augmentation method. After instruction tuning, the resulting model, InstructRetro, significantly improves zero-shot question answering compared to traditional GPT models. Interestingly, the InstructRetro decoder performs comparably even when the encoder is removed, demonstrating that retrieval-augmented pretraining teaches the decoder itself to incorporate context for question answering.
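To make the architecture concrete, here is a minimal, hedged sketch in PyTorch of the Retro idea: a decoder layer whose hidden states can cross-attend to encoded retrieved neighbors, and which falls back to a plain GPT-style layer when that retrieval path (i.e., the encoder) is dropped. It simplifies Retro's actual chunked cross-attention, and all class names and dimensions are illustrative rather than taken from the paper's code.

```python
import torch
import torch.nn as nn

class RetroStyleBlock(nn.Module):
    """One decoder layer with an optional cross-attention path over retrieved text."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention over encoded retrieved-neighbor chunks (the retrieval path).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (
            nn.LayerNorm(d_model), nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        )

    def forward(self, x, neighbors=None):
        # x: (batch, seq_len, d_model) decoder hidden states.
        # neighbors: (batch, retrieved_len, d_model) encoded retrieved chunks,
        #            or None -- without them the block behaves like a plain GPT layer.
        causal = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), 1)
        h = self.norm1(x + self.self_attn(x, x, x, attn_mask=causal)[0])
        if neighbors is not None:
            h = self.norm2(h + self.cross_attn(h, neighbors, neighbors)[0])
        return self.norm3(h + self.ffn(h))

# Toy usage: the same block with and without retrieved neighbors.
block = RetroStyleBlock()
tokens = torch.randn(2, 16, 512)      # fake token embeddings
retrieved = torch.randn(2, 64, 512)   # fake encoder output for retrieved chunks
print(block(tokens, retrieved).shape)  # torch.Size([2, 16, 512])
print(block(tokens).shape)             # torch.Size([2, 16, 512])
```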
The Significance of InstructRetro 48B
This study walks through the full process of continuing to pretrain a GPT model with retrieval to create Retro 48B, instruction-tuning it to improve zero-shot question answering, and evaluating its performance across a range of tasks. The introduction of InstructRetro 48B, the largest retrieval-augmented language model to date, demonstrates the potential of larger retrieval-augmented models for natural language understanding.
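The instruction-tuning step can be pictured as ordinary supervised fine-tuning on instruction-response pairs, with the loss restricted to the response tokens. The sketch below uses "gpt2" from Hugging Face as a stand-in for the 48B backbone and an illustrative prompt template; it is not the paper's recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # stand-in for the 48B backbone
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

example = {"instruction": "Who wrote On the Origin of Species?",
           "response": "Charles Darwin."}

# Illustrative template; the template used in the paper may differ.
prompt = f"Question: {example['instruction']}\nAnswer:"
full_text = prompt + " " + example["response"]

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(full_text, return_tensors="pt").input_ids

# Standard trick: set prompt positions to -100 so only the response is supervised.
labels = full_ids.clone()
labels[:, : prompt_ids.size(1)] = -100

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"instruction-tuning loss on one example: {loss.item():.3f}")
```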
Improved Performance in Zero-Shot Question Answering
Retro 48B, a language model pretrained with retrieval, outperforms the original GPT model in perplexity. After instruction tuning, the resulting InstructRetro significantly improves zero-shot question answering, with an average gain of 7% on short-form and 10% on long-form tasks over its GPT counterpart. Surprisingly, the InstructRetro decoder achieves comparable results even with the retrieval encoder removed, highlighting how effectively retrieval-augmented pretraining teaches the decoder to incorporate context for question answering.
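The encoder-removed setting amounts to using the InstructRetro backbone as a plain decoder and supplying retrieved evidence through the prompt. Below is a hedged sketch of that workflow; the TF-IDF retriever, the "gpt2" model, and the prompt format are stand-ins for illustration, not the paper's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

corpus = [
    "Retro augments a decoder with passages retrieved from a large text corpus.",
    "Instruction tuning fine-tunes a model on instruction-response pairs.",
    "The Eiffel Tower is located in Paris, France.",
]
question = "Where is the Eiffel Tower located?"

# Toy retriever: rank passages by TF-IDF similarity to the question.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
top_passage = corpus[scores.argmax()]

# Decoder-only zero-shot QA: retrieved evidence goes straight into the prompt.
prompt = f"Context: {top_passage}\nQuestion: {question}\nAnswer:"
generator = pipeline("text-generation", model="gpt2")  # stand-in for the InstructRetro decoder
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```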
Overall, InstructRetro 48B, the largest retrieval-augmented language model, significantly enhances zero-shot accuracy in various open-ended question answering tasks. Pretraining with retrieval using the Retro augmentation method improves perplexity, and the study suggests that continued pre-training with retrieval before instruction tuning can enhance GPT decoders in question answering. The study also highlights the potential of retrieval-augmented pretraining for challenging tasks, especially in long-form question answering.
Check out the paper for more details. Credit for this research goes to the researchers involved in the project.