Introducing LoRA-Fine-Tuning-aware Quantization (LoftQ)
Pre-trained Language Models (PLMs) have revolutionized Natural Language Processing by excelling in tasks like Natural Language Understanding (NLU) and Natural Language Generation (NLG). These models have millions or even billions of parameters, so they demand significant computational power and memory. The research community has recognized that these requirements are a real obstacle to deploying PLMs in practice.
In a recent paper, researchers present LoftQ, a new quantization framework designed specifically for pre-trained models that require both quantization and LoRA fine-tuning. LoftQ combines low-rank approximation with quantization so that the two together approximate the original high-precision pre-trained weights, which gives the subsequent LoRA fine-tuning a better starting point.
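The idea of combining quantization with a low-rank approximation can be illustrated with a minimal sketch. It assumes an alternating scheme: quantize the part of the weight matrix not yet captured by the low-rank factors, then refit the factors to the quantization residual with a truncated SVD. The names `toy_quantize` and `loftq_style_init`, the per-tensor uniform quantizer, and the default rank and step count are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def toy_quantize(w, bits=4):
    """Toy per-tensor uniform quantizer: quantize, then dequantize right away.
    A stand-in for a real quantizer such as NF4 (assumption for illustration)."""
    absmax = np.abs(w).max()
    if absmax == 0:
        return w.copy()
    levels = 2 ** bits - 1
    codes = np.round((w / absmax + 1) / 2 * levels)  # integer codes in 0..levels
    return (codes / levels * 2 - 1) * absmax         # back to the original range

def loftq_style_init(w, rank=8, bits=4, steps=5):
    """Alternate between quantization and a rank-`rank` SVD fit so that
    q + a @ b.T approximates w. Requires steps >= 1."""
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((w.shape[1], rank))
    for _ in range(steps):
        # Quantize what the low-rank factors do not explain yet.
        q = toy_quantize(w - a @ b.T, bits)
        # Fit the low-rank factors to the quantization residual.
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        root = np.sqrt(s[:rank])
        a = u[:, :rank] * root
        b = vt[:rank].T * root
    return q, a, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32))
    q, a, b = loftq_style_init(w, rank=8, bits=4, steps=5)
    print("quantization only :", np.linalg.norm(w - toy_quantize(w, 4)))
    print("quantized+low-rank:", np.linalg.norm(w - q - a @ b.T))
```

In this sketch `q` plays the role of the frozen quantized backbone, while `a` and `b` would serve as the initial values of the trainable LoRA adapters.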
The framework’s performance is demonstrated through experiments using two quantization methods: uniform quantization and NF4 with its 2-bit variant NF2. Uniform quantization divides a continuous interval uniformly into 2^N categories (for N-bit quantization) and stores a local maximum absolute value for dequantization. NF4 and NF2 assume that the high-precision values follow a Gaussian distribution and map them to discrete slots that each have equal probability.
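The two quantization schemes can be sketched as follows. This is a simplified illustration under stated assumptions: the block size of 64, the function names, and the codebook construction (midpoints of equal-probability bins of a standard normal, rescaled to [-1, 1]) are hypothetical, and the codebook is not the exact NF4 table used by real libraries. The weight tensor size is assumed to be divisible by the block size.

```python
import numpy as np
from statistics import NormalDist

def _blocks_and_absmax(w, block_size):
    """Split w into blocks and compute each block's maximum absolute value."""
    blocks = w.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0  # avoid division by zero for all-zero blocks
    return blocks, absmax

def uniform_quantize(w, bits=4, block_size=64):
    """Map each block's range [-absmax, absmax] onto 2**bits evenly spaced codes."""
    blocks, absmax = _blocks_and_absmax(w, block_size)
    levels = 2 ** bits - 1
    codes = np.round((blocks / absmax + 1) / 2 * levels).astype(np.uint8)
    return codes, absmax

def uniform_dequantize(codes, absmax, bits=4):
    """Recover approximate values from the codes and the stored absmax."""
    levels = 2 ** bits - 1
    return (codes.astype(np.float64) / levels * 2 - 1) * absmax

def normal_codebook(bits=4):
    """2**bits values at the centers of equal-probability bins of N(0, 1),
    rescaled to [-1, 1]. A simplification, not the exact NF4 table."""
    k = 2 ** bits
    probs = (np.arange(k) + 0.5) / k
    q = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return q / np.abs(q).max()

def normal_quantize(w, bits=4, block_size=64):
    """Store, per value, the index of the nearest codebook entry."""
    blocks, absmax = _blocks_and_absmax(w, block_size)
    cb = normal_codebook(bits)
    idx = np.abs((blocks / absmax)[..., None] - cb).argmin(axis=-1)
    return idx.astype(np.uint8), absmax, cb

def normal_dequantize(idx, absmax, cb):
    """Look up codebook values and rescale by the stored absmax."""
    return cb[idx] * absmax

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 64))
    codes, s = uniform_quantize(w)
    idx, s2, cb = normal_quantize(w)
    print("uniform error:", np.abs(w - uniform_dequantize(codes, s)).mean())
    print("normal  error:", np.abs(w - normal_dequantize(idx, s2, cb)).mean())
```

In both cases only the small integer codes and one scale per block need to be stored, which is where the memory savings come from.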
In the experiments, LoftQ achieves compression ratios of 25-30% and 15-20% at the 4-bit and 2-bit levels, respectively. All experiments were run on NVIDIA A100 GPUs. The evaluation covers a range of downstream tasks in NLU, question answering, summarization, and NLG, and LoftQ consistently outperforms QLoRA at all precision levels. For example, with 4-bit quantization, LoftQ achieves gains of 1.1 and 0.8 in ROUGE-1 on XSum and CNN/DailyMail, respectively.
As the field of NLP progresses, further innovations and optimizations are expected to bridge the gap between the immense potential of PLMs and their practical deployment. This will benefit a wide range of applications and users.
Please see the paper for more details. All credit goes to the researchers behind this project.
By Janhavi Lande, an Engineering Physics graduate from IIT Guwahati, class of 2023, and an upcoming data scientist with a keen interest in ML/AI research. She enjoys traveling, reading, and writing poems.