Introducing More Efficient and Compact Large Language Models (LLMs)
Large Language Models (LLMs) have become popular for their strong performance across a wide range of natural language tasks, but training them from scratch demands enormous computational resources. To reduce this cost, the community has turned to moderate-sized, openly available models such as LLaMA, MPT, and Falcon, which support efficient inference and fine-tuning for a variety of use cases.
The Effectiveness of Structured Pruning
In a recent study, researchers explored structured pruning as a way to derive smaller LLMs from larger pre-trained models. The approach combines two key techniques:
1. Targeted Structured Pruning: This technique systematically removes layers, attention heads, and intermediate and hidden dimensions from a larger language model until it matches a specified target configuration. Because whole structural units are removed, the pruned model remains a coherent, dense architecture and retains much of its language understanding ability. A minimal sketch of this idea appears after this list.
2. Dynamic Batch Loading: This method adjusts the composition of each training batch based on how the loss is evolving in different data domains. Domains where the model is underperforming relative to a reference receive a larger share of the batch, so training effort is spent where it is most needed, improving overall efficiency. A simplified sketch of this weighting scheme follows the pruning example below.
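To make the pruning idea concrete, here is a minimal, illustrative PyTorch sketch of structured pruning with learnable masks over attention heads and feed-forward dimensions, trained with a penalty that pushes the kept units toward a target shape. This is not the authors' implementation; the toy module sizes, mask parameterization, stand-in task loss, and penalty term are assumptions made purely for clarity.

```python
# Illustrative sketch only: learnable keep/drop masks over heads and FFN dims,
# plus a penalty steering the pruned model toward a target configuration.
import torch
import torch.nn as nn

class MaskedSelfAttention(nn.Module):
    """Toy multi-head self-attention with a learnable keep/drop mask per head."""
    def __init__(self, hidden=512, n_heads=8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden // n_heads
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.out = nn.Linear(hidden, hidden)
        self.head_mask_logits = nn.Parameter(torch.zeros(n_heads))  # one logit per head

    def forward(self, x):
        B, T, H = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v                                     # per-head outputs (B, n_heads, T, head_dim)
        mask = torch.sigmoid(self.head_mask_logits)          # soft keep-probability per head
        heads = heads * mask.view(1, -1, 1, 1)               # zero out pruned heads
        return self.out(heads.transpose(1, 2).reshape(B, T, H))

class MaskedBlock(nn.Module):
    """Attention + feed-forward block whose heads and intermediate dims can be pruned."""
    def __init__(self, hidden=512, n_heads=8, ffn=2048):
        super().__init__()
        self.attn = MaskedSelfAttention(hidden, n_heads)
        self.ffn_in, self.ffn_out = nn.Linear(hidden, ffn), nn.Linear(ffn, hidden)
        self.ffn_mask_logits = nn.Parameter(torch.zeros(ffn))  # one logit per FFN dimension

    def forward(self, x):
        x = x + self.attn(x)
        h = torch.relu(self.ffn_in(x)) * torch.sigmoid(self.ffn_mask_logits)
        return x + self.ffn_out(h)

def shape_penalty(block, target_heads=4, target_ffn=1024):
    """Push the expected number of kept heads / FFN dims toward the target configuration."""
    kept_heads = torch.sigmoid(block.attn.head_mask_logits).sum()
    kept_ffn = torch.sigmoid(block.ffn_mask_logits).sum()
    return (kept_heads - target_heads) ** 2 + ((kept_ffn - target_ffn) / 100.0) ** 2

block = MaskedBlock()
opt = torch.optim.Adam(block.parameters(), lr=1e-3)
x = torch.randn(2, 16, 512)                                   # dummy token embeddings
for _ in range(10):
    loss = block(x).pow(2).mean() + 0.1 * shape_penalty(block)  # stand-in task loss + shape constraint
    opt.zero_grad(); loss.backward(); opt.step()
# After this phase, heads and dimensions whose masks are near zero are physically removed,
# leaving a smaller dense model that is then further trained.
```

The point of the target configuration is that, once low-mask units are dropped, the result is an ordinary dense model of the desired size rather than a sparse one, so it keeps fast inference.

Below is a similarly simplified sketch of dynamic batch loading. The domain names, reference losses, and exponential re-weighting rule are illustrative assumptions, not the exact recipe from the paper; the idea shown is that domains whose loss sits furthest above a reference get sampled more heavily in the next batches.

```python
# Illustrative sketch only: loss-aware adjustment of per-domain sampling weights.
import numpy as np

rng = np.random.default_rng(0)
domains = ["web", "code", "books", "wiki"]                              # hypothetical domains
reference_loss = {"web": 2.1, "code": 1.5, "books": 2.0, "wiki": 1.8}   # assumed target losses
weights = np.full(len(domains), 1.0 / len(domains))                     # start with uniform sampling

def update_weights(weights, current_loss, temperature=1.0):
    """Upweight domains whose current loss is furthest above their reference loss."""
    gaps = np.array([max(current_loss[d] - reference_loss[d], 0.0) for d in domains])
    scores = weights * np.exp(gaps / temperature)
    return scores / scores.sum()

def sample_batch(weights, batch_size=8):
    """Compose the next batch by drawing examples per domain according to the weights."""
    counts = rng.multinomial(batch_size, weights)
    return dict(zip(domains, counts))

# Toy loop: pretend these losses were just measured on held-out data from each domain.
for step in range(3):
    current_loss = {d: reference_loss[d] + rng.uniform(0.0, 0.5) for d in domains}
    weights = update_weights(weights, current_loss)
    print(step, sample_batch(weights), np.round(weights, 3))
```

Because the weights are recomputed as training progresses, data is not wasted on domains the model has already mastered, which is what makes the limited token budget go further.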
The Success of Sheared-LLaMA Models
The researchers applied this approach to prune an LLaMA2-7B model down to two smaller LLMs, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B. Remarkably, producing these models required only about 5% of a typical pre-training budget, roughly 50 billion training tokens.
Despite their smaller size, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B outperformed other well-known LLMs of comparable scale, such as Pythia, INCITE, and OpenLLaMA, on a variety of downstream tasks, including open-ended generation, reading comprehension, commonsense reasoning, and world knowledge.
Future Implications and Generalizability
While the study focused on models of up to 7 billion parameters, the LLM-shearing technique could in principle be applied to language models of any size in future work. The results demonstrate that small but capable LLMs can be developed far more economically by pruning and continuing to train larger ones than by pre-training from scratch.
In Conclusion
LLM shearing, combining targeted structured pruning with dynamic batch loading, offers a practical approach to reducing the size of LLMs. The strong performance of the Sheared-LLaMA models across downstream tasks demonstrates the effectiveness of this method, and its applicability to different model sizes opens up possibilities for more efficient and compact LLM development. For more details, check out the Paper, GitHub, and Project. All credit goes to the researchers involved in this project.