MIT Researchers Develop Technique to Efficiently Train AI Models
OpenAI’s ChatGPT has impressive capabilities, including the ability to write poetry and debug code. However, training large models like ChatGPT is time-consuming and expensive. MIT researchers, led by assistant professor Yoon Kim, have developed a method that leverages smaller language models and “grows” them into larger models while retaining their knowledge. The technique cuts computational costs by about 50% compared to training models from scratch, and the resulting models perform just as well as those trained with existing methods. The researchers believe this approach can help democratize AI technologies by making training faster and less expensive.
The Significance of Efficient AI Model Training
Training large AI models like ChatGPT requires significant computational resources and time. The process involves exposing the model to billions of examples and running powerful computers for days or weeks, which incurs high costs and contributes to carbon emissions. By reducing the time and expense required for training, researchers can make advances more quickly and lessen the environmental impact. The approach also allows smaller research groups to work with these massive models and drive new innovations.
Using Smaller Models to “Grow” Larger Ones
Kim and his team use a transformer neural network architecture, known for its superior performance as models scale up. To accelerate training, they employ a technique called model growth: copying neurons or entire layers from a smaller model and stacking them to increase the size of the transformer. The researchers introduce a novel approach called a learned Linear Growth Operator (LiGO), which learns to expand the width and depth of the larger network using a linear mapping of the smaller model’s parameters.
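As a rough illustration of the basic model-growth idea, the sketch below deepens a smaller transformer by stacking copies of its layers and widens it by duplicating neurons in its weight matrices. This is a hedged, minimal sketch, not the researchers’ code; the function names, shapes, and PyTorch details are assumptions for illustration.

```python
# Hypothetical sketch of simple "model growth" by reusing a smaller model's weights.
import copy
import torch
import torch.nn as nn


def widen_linear(layer: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Widen a linear layer by tiling (copying) its existing neurons.

    A function-preserving scheme would also rescale the duplicated weights;
    this sketch only shows the copy-and-stack idea.
    """
    old_out, old_in = layer.weight.shape
    wide = nn.Linear(new_in, new_out)
    row_idx = torch.arange(new_out) % old_out   # repeat output neurons
    col_idx = torch.arange(new_in) % old_in     # repeat input connections
    with torch.no_grad():
        wide.weight.copy_(layer.weight[row_idx][:, col_idx])
        wide.bias.copy_(layer.bias[row_idx])
    return wide


def deepen(layers: nn.ModuleList, new_depth: int) -> nn.ModuleList:
    """Deepen a model by stacking copies of its existing layers."""
    return nn.ModuleList(
        copy.deepcopy(layers[i % len(layers)]) for i in range(new_depth)
    )
```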
LiGO breaks the linear map down into smaller pieces so that a machine learning algorithm can handle it efficiently. Unlike previous methods, LiGO expands both the width and depth of the model simultaneously, further enhancing efficiency. Comparative experiments show that the LiGO technique outperforms training from scratch and other model-growth methods while delivering up to a 50% reduction in computational costs for both language and vision models.
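To make this concrete, here is a highly simplified sketch in the spirit of LiGO, not the authors’ implementation: a learned pair of width-expansion matrices maps each small weight matrix to the larger hidden size, and a learned depth-mixing matrix combines the small model’s layers into the deeper stack. The class name, dimensions, and initialization are illustrative assumptions.

```python
# Hedged sketch of a learned linear growth operator factored into
# width-expansion and depth-expansion pieces.
import torch
import torch.nn as nn


class LinearGrowthOperator(nn.Module):
    def __init__(self, d_small: int, d_large: int,
                 depth_small: int, depth_large: int):
        super().__init__()
        # Width-expansion factors (one for output rows, one for input columns).
        self.A = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
        self.B = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
        # Depth-expansion factor: each new layer is a learned mix of old layers.
        self.D = nn.Parameter(torch.randn(depth_large, depth_small) * 0.02)

    def forward(self, small_weights: torch.Tensor) -> torch.Tensor:
        # small_weights: (depth_small, d_small, d_small), one matrix per layer.
        # Widen each small layer: A @ W @ B.T -> (depth_small, d_large, d_large).
        widened = torch.einsum("oi,lij,pj->lop", self.A, small_weights, self.B)
        # Mix the widened layers along depth -> (depth_large, d_large, d_large).
        return torch.einsum("dl,lop->dop", self.D, widened)


# Usage: initialize the larger model's weights from the smaller model's,
# then continue training the larger model as usual.
small = torch.randn(6, 256, 256)            # 6 layers, hidden size 256 (illustrative)
grow = LinearGrowthOperator(256, 512, 6, 12)
large_init = grow(small)                    # 12 layers, hidden size 512
print(large_init.shape)                     # torch.Size([12, 512, 512])
```

Because both factors are small relative to a single dense map over all parameters, the operator itself is cheap to learn, which is the point of the decomposition.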
Furthermore, the researchers found that LiGO can accelerate transformer training even without access to a pretrained model. This flexibility opens up opportunities for more diverse applications of their technique.
In the future, Kim and his team plan to apply LiGO to even larger models, further enhancing the efficiency of AI model training.