Fairness in Large Language Models (LLMs): The VTC Fair Scheduler
Fairness in Large Language Model (LLM) serving is a growing research concern: most current serving systems optimize for throughput and latency, while fair resource sharing among clients has received little attention. A team of researchers from UC Berkeley, Stanford University, and Duke University has introduced a fair scheduler for LLM serving called the Virtual Token Counter (VTC). Unlike conventional fairness methods that operate at the request level, VTC accounts for service at the level of individual tokens, giving a more precise and adaptable measure of each client's consumption. The scheduler defines fairness in terms of the service each client has received, measured as a weighted combination of input and output token counts that reflects actual GPU resource consumption.
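To make the token-level mechanism concrete, here is a minimal Python sketch of a virtual-token-counter scheduler. It is not the authors' implementation: the class name `VirtualTokenScheduler`, the weights `w_input`/`w_output`, and the method names are assumptions chosen for clarity. The core ideas follow the description above: each client carries a counter of weighted tokens served, the backlogged client with the least counter is served next, and a newly seen client starts at the current minimum counter so that joining late confers no advantage.

```python
from collections import defaultdict, deque


class VirtualTokenScheduler:
    """Illustrative token-level fair scheduler in the spirit of VTC.

    Each client carries a virtual counter of the service it has received,
    measured in weighted tokens; the backlogged client with the smallest
    counter is served next. This is a simplified sketch, not the authors'
    implementation.
    """

    def __init__(self, w_input=1.0, w_output=2.0):
        # Relative cost of input vs. output tokens. Output tokens are often
        # weighted higher because each one needs its own decoding step.
        self.w_input = w_input
        self.w_output = w_output
        self.counters = {}                # client_id -> virtual counter
        self.queues = defaultdict(deque)  # client_id -> pending requests

    def submit(self, client_id, request):
        if client_id not in self.counters:
            # Start a newly seen client at the minimum counter among
            # backlogged clients, so joining late confers no advantage.
            backlogged = [c for c, q in self.queues.items() if q]
            self.counters[client_id] = min(
                (self.counters[c] for c in backlogged), default=0.0
            )
        self.queues[client_id].append(request)

    def next_request(self):
        """Pop a request from the backlogged client with the least counter."""
        backlogged = [c for c, q in self.queues.items() if q]
        if not backlogged:
            return None
        client = min(backlogged, key=lambda c: self.counters[c])
        return client, self.queues[client].popleft()

    def record_service(self, client_id, n_input, n_output):
        # Charge the client for the GPU work it consumed, in weighted tokens.
        # A real continuous-batching server would update counters
        # incrementally as tokens are generated, not only on completion.
        self.counters[client_id] += (
            self.w_input * n_input + self.w_output * n_output
        )
```

Measuring service in weighted tokens rather than request counts is what lets the scheduler stay fair when clients send requests of very different sizes.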
Effectiveness of the VTC Fair Scheduler
The VTC fair scheduler has been evaluated rigorously under a variety of workloads, including real-world scenarios. By adjusting how service is weighted, the scheduler adapts to different fairness criteria while still ensuring equitable resource allocation. In comparisons against alternative scheduling methods, such as First Come, First Serve (FCFS), Request per Minute (RPM) rate limiting, and Least Counter First (LCF), the results consistently confirm its advantages; a usage sketch follows below.
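A hypothetical usage of the sketch above illustrates both points: the weights express a chosen fairness criterion, and the least-counter rule keeps a light client responsive even when another client floods the queue. All request IDs and token counts here are made up.

```python
# Two clients share one serving instance: "a" floods the queue while "b"
# sends a single request. The least-counter rule still serves "b" promptly.
sched = VirtualTokenScheduler(w_input=1.0, w_output=2.0)
for i in range(5):
    sched.submit("a", f"a-req-{i}")
sched.submit("b", "b-req-0")

while (item := sched.next_request()) is not None:
    client, req = item
    # Pretend each request consumed 100 input and 50 output tokens.
    sched.record_service(client, n_input=100, n_output=50)
    print(client, req, sched.counters)
```

After "a" is charged for its first request, its counter exceeds "b"'s, so "b-req-0" is served second rather than waiting behind all five of "a"'s requests.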
Flexibility and Conclusion
The scheduler's flexibility in accommodating various fairness criteria, together with its successful implementation and validation in realistic settings, makes it a viable and efficient solution for ensuring equitable distribution of resources among clients in LLM serving systems. A paper and a GitHub repository are available for further reference and validation.