Introduction: The Scaling Law of Language Models
The scaling law of language models has revolutionized the field of artificial intelligence (AI). When trained on massive amounts of textual data, these models have shown remarkable performance and acquired new capabilities. However, despite their strength and rapid development, large-scale language models still have room for improvement before they are well suited to real-world applications. Fortunately, the open-source community has made significant contributions by providing robust and accessible language models such as BLOOM, LLaMA, FlanT5, and AlexaTM.
Open-source Language Models
Many large language models have been made available by the open-source community, such as Chinese-LLaMA, MOSS, Huatuo, Luotuo, and Phoenix, which researchers and developers can build on in their own work. While strong general-purpose language models, most of them decoder-only variants, are readily accessible, the encoder-decoder framework, which is highly effective for tasks such as language comprehension and question answering, remains relatively unexplored. To address this gap, researchers from Soochow University have developed OpenBA, an open-source 15B-parameter bilingual asymmetric seq2seq model.
The OpenBA Model
The OpenBA model is designed to strengthen generation capability through a shallow-encoder, deep-decoder structure. This differs from models such as Flan-T5 and AlexaTM, which use either a balanced or a deep-encoder, shallow-decoder structure. The training procedure of OpenBA consists of three stages: UL2 pre-training, length adaptation, and Flan training. The researchers also applied enhancement techniques to the model architecture and training process to improve its capacity, stability, and efficiency.
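The UL2 objective used in the first training stage frames pre-training as span-level denoising: contiguous spans of the input are replaced with sentinel tokens, and the model learns to reconstruct the removed spans. As a minimal sketch of the general T5/UL2-style corruption format (the function name, sentinel format, and fixed span positions below are illustrative simplifications, not OpenBA's actual code, which samples spans according to a mixture of denoisers):

```python
def span_corrupt(tokens, spans):
    """Replace each (start, length) span in `tokens` with a numbered
    sentinel token, and collect the removed spans, each prefixed by its
    sentinel, as the decoder target (T5/UL2-style denoising format)."""
    corrupted, target = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted.extend(tokens[cursor:start])  # keep text before the span
        corrupted.append(sentinel)              # mask the span itself
        target.append(sentinel)                 # target: sentinel + removed tokens
        target.extend(tokens[start:start + length])
        cursor = start + length
    corrupted.extend(tokens[cursor:])           # keep the tail after the last span
    return corrupted, target

# Example: mask two spans of a toy word-level "tokenization"
src = "the quick brown fox jumps over".split()
corrupted, target = span_corrupt(src, [(1, 2), (4, 1)])
# corrupted: ['the', '<extra_id_0>', 'fox', '<extra_id_1>', 'over']
# target:    ['<extra_id_0>', 'quick', 'brown', '<extra_id_1>', 'jumps']
```

In UL2, varying the span length and corruption rate across denoising modes is what distinguishes it from plain T5 span corruption; the encoder reads the corrupted sequence while the decoder generates the target, which is one reason a deep decoder can pay off for generation-heavy workloads.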
Across various tests and settings, including zero-shot, few-shot, held-in, and held-out evaluation, the OpenBA model has outperformed much larger models such as LLaMA-70B and BLOOM-176B, as well as ChatGLM-6B on the C-Eval benchmark. Notably, despite achieving superior performance, training the OpenBA-15B model consumed significantly less energy than training the LLaMA-7B model.
The OpenBA model developed by researchers at Soochow University is a significant contribution to the field of language modeling. It fills a gap in the encoder-decoder framework and has shown impressive performance across a range of tasks and benchmarks. The availability of implementation details and open access to the data and model checkpoints further enhance the model's accessibility and usability. The researchers encourage feedback and collaboration from the open-source community to continue improving and extending the OpenBA paradigm.