Large Language Models (LLMs) such as GPT-3 and PaLM have shown impressive performance on natural language tasks, even with little or no task-specific training data. Using LLMs for text ranking, however, has proven challenging: existing approaches generally underperform trained baseline rankers, with the exception of a recent strategy that relies on the expensive, hard-to-access GPT-4 system. Researchers see value in exploring LLMs for ranking tasks but stress the need for more accessible solutions.

In this study, the researchers analyze why LLMs struggle with ranking and propose a new approach called pairwise ranking prompting (PRP). PRP simplifies the task for the LLM and sidesteps the calibration issue by prompting with a query and a pair of documents, asking only which of the two is more relevant. It achieves state-of-the-art ranking performance on benchmark datasets, outperforming previous LLM-based methods. The researchers also highlight practical advantages of PRP, such as support for both scoring and generation modes in LLM APIs. Overall, this work advances the understanding and practice of ranking with LLMs.
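The pairwise idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the prompt wording, the `llm_prefer` judge (a toy keyword-overlap stand-in for a real LLM call), and the all-pairs win-counting aggregation are all assumptions made to keep the example self-contained and runnable.

```python
from itertools import combinations

# Illustrative PRP-style prompt (assumed wording, not the paper's exact text).
PROMPT = (
    'Given the query "{query}", which of the following two passages is more '
    "relevant to the query?\n\n"
    "Passage A: {doc_a}\nPassage B: {doc_b}\n\n"
    "Output Passage A or Passage B:"
)

def llm_prefer(query, doc_a, doc_b):
    """Stand-in for the LLM judgment: returns 'A' or 'B'.
    A real system would send PROMPT to an LLM; here a toy
    keyword-overlap heuristic keeps the sketch runnable."""
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return "A" if overlap(doc_a) >= overlap(doc_b) else "B"

def prp_allpair(query, docs):
    """Rank docs by counting pairwise wins over all pairs.
    Each pair is judged in both orders to debias any positional
    preference of the model."""
    wins = {d: 0 for d in docs}
    for a, b in combinations(docs, 2):
        for x, y in ((a, b), (b, a)):  # ask in both orders
            winner = x if llm_prefer(query, x, y) == "A" else y
            wins[winner] += 1
    return sorted(docs, key=wins.get, reverse=True)
```

Because each comparison is a simple A-or-B choice, the LLM never has to emit a calibrated relevance score, which is exactly the difficulty PRP is designed to avoid.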
The study demonstrates the effectiveness of pairwise ranking prompting using moderate-sized, open-sourced LLMs, achieving state-of-the-art performance with a simpler setup than existing methods that depend on larger, commercial models. The researchers also discuss efficiency enhancements that reduce the number of LLM calls and report strong empirical results. For more information, check out the Paper.
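On the efficiency point: judging every pair costs O(N²) LLM calls, and one way to cut this down is a sliding-window, bubble-sort-style pass that only compares adjacent documents. The sketch below is an assumed illustration of that idea, not the paper's code; the `prefer` judge is again a toy keyword-overlap stand-in for a real LLM call.

```python
def prefer(query, doc_a, doc_b):
    """Toy stand-in for the LLM's pairwise judgment: True when doc_a is
    judged more relevant to the query than doc_b (keyword overlap here)."""
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return overlap(doc_a) > overlap(doc_b)

def prp_sliding(query, docs, k=1):
    """k bottom-to-top bubble passes: each pass compares adjacent pairs
    and swaps when the lower document wins, surfacing the top-k results
    with O(k * N) comparisons instead of O(N^2)."""
    docs = list(docs)
    for _ in range(k):
        for i in range(len(docs) - 1, 0, -1):
            if prefer(query, docs[i], docs[i - 1]):
                docs[i], docs[i - 1] = docs[i - 1], docs[i]
    return docs
```

A single pass (k=1) is enough to bubble the most relevant document to the top, which suits re-ranking scenarios where only the first few positions matter.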