The Evolution of Natural Language Processing with Large Language Models
Large Language Models (LLMs) like GPT and LLaMA have revolutionized the field of natural language processing (NLP). These models are becoming increasingly important across a wide range of tasks, driving high demand for custom LLMs. However, training an LLM from scratch requires enormous compute and data, putting it out of reach for many individuals and organizations. To address this issue, researchers are exploring knowledge fusion as an alternative approach to building capable models while minimizing development costs. The idea is to combine multiple existing LLMs so that a single model inherits their complementary strengths across tasks.
The FUSELLM Method for Efficient Knowledge Fusion
Traditionally, integrating multiple models has meant either ensembling or directly merging their weights. Both approaches have drawbacks: an ensemble must keep every model in memory at inference time, while direct weight merging requires the source networks to share an identical architecture. FUSELLM introduces a different route to knowledge fusion: it uses the probability distribution matrices produced by multiple source LLMs as supervision, transferring their collective knowledge to a target LLM through lightweight continual training. This makes it possible to fuse pre-trained LLMs with different architectures into a single unified model.
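To make the idea concrete, below is a minimal PyTorch sketch of distribution fusion and the continual-training objective. It assumes the source and target models already share an aligned vocabulary (the original method handles token alignment across different tokenizers separately), and the helper names `fuse_distributions` and `fusion_loss`, the temperature, and the loss weighting `lam` are illustrative choices rather than the paper's exact implementation; the per-token selection of the best-fitting source distribution mirrors one of the fusion strategies described for FUSELLM.

```python
import torch
import torch.nn.functional as F

def fuse_distributions(source_logits, labels, temperature=1.0):
    """Fuse per-token distributions from several source LLMs into one
    supervision signal. Selection rule (an illustrative choice): for each
    token, keep the distribution of the source whose cross-entropy against
    the ground-truth label is lowest.

    source_logits: list of [batch, seq, vocab] tensors (aligned vocabularies).
    labels: [batch, seq] token ids, assumed already shifted to align with logits.
    """
    ce_per_source = []
    for logits in source_logits:
        ce = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
            reduction="none",
        ).reshape(labels.shape)                      # [batch, seq]
        ce_per_source.append(ce)
    best = torch.stack(ce_per_source).argmin(dim=0)  # best source per token

    probs = torch.stack([F.softmax(l / temperature, dim=-1) for l in source_logits])
    fused = probs.gather(
        0, best.unsqueeze(0).unsqueeze(-1).expand(1, *labels.shape, probs.size(-1))
    ).squeeze(0)                                     # [batch, seq, vocab]
    return fused

def fusion_loss(target_logits, fused_probs, labels, lam=0.9):
    """Continual-training objective: causal LM loss plus a divergence term
    pulling the target toward the fused distribution."""
    lm = F.cross_entropy(
        target_logits.reshape(-1, target_logits.size(-1)), labels.reshape(-1)
    )
    kl = F.kl_div(
        F.log_softmax(target_logits, dim=-1), fused_probs, reduction="batchmean"
    )
    return lam * kl + (1.0 - lam) * lm
```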
Introducing FUSECHAT for Chat LLMs
Building upon the principles of FUSELLM, researchers have developed FUSECHAT, specifically designed for fusing chat LLMs with varying architectures and scales. FUSECHAT operates in two main stages: it first performs knowledge fusion of the source LLMs, which may differ in structure and scale, to obtain target LLMs that share a single architecture; it then merges these target LLMs within the parameter space so that the final model incorporates the collective knowledge of all sources. For the merging step, the method introduces VARM (Variation Ratio Merge), a technique that determines the combining weights from the variation ratio of each parameter matrix before and after fine-tuning, enabling fine-grained merging without any additional training.
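As a rough illustration of the merging stage, the sketch below computes combining weights from how much each parameter matrix changed during fine-tuning and blends the fine-tuned models accordingly. The function name `varm_merge`, the squared-change statistic, and the normalization are assumptions made for illustration; the paper's VARM may use a different granularity and weighting scheme.

```python
import torch

def varm_merge(pretrained_state, finetuned_states, eps=1e-8):
    """Merge several fine-tuned models of identical architecture into one,
    weighting each parameter matrix by how much it changed relative to the
    shared pre-trained base (a sketch of the variation-ratio idea).

    pretrained_state: state dict of the common base model.
    finetuned_states: list of state dicts, one per fine-tuned target model.
    """
    merged = {}
    for name, base in pretrained_state.items():
        updates = [(ft[name] - base).float() for ft in finetuned_states]
        # Mean squared change of this matrix in each fine-tuned model.
        variations = torch.stack([u.pow(2).mean() for u in updates])
        weights = variations / (variations.sum() + eps)   # normalize across models
        delta = sum(w * u for w, u in zip(weights, updates))
        merged[name] = (base.float() + delta).to(base.dtype)
    return merged
```

Because the fused target models share one architecture, the merged state dict can simply be loaded back into that architecture, which is what allows the second stage to proceed without any further training.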
Empirical testing of FUSECHAT using representative open-source chat LLMs has shown promising results. Performance evaluations on MT-Bench, a benchmark for multi-turn dialogue ability, demonstrate that FUSECHAT surpasses individual source LLMs and fine-tuned baselines across various scales. The VARM merging method, in particular, achieves superior performance, highlighting the efficacy of merging weights based on variation ratios. With its scalability and adaptability, FUSECHAT presents a compelling solution for integrating chat models within the dynamic landscape of open-source LLM development.