Large language models (LLMs) such as GPT-3, PaLM, OPT, BLOOM, and GLM-130B have significantly advanced machine language understanding and generation. Question answering, one of the most important language applications, has benefited in particular. Studies show that LLMs can perform comparably to supervised models in closed-book QA and in-context learning QA, which demonstrates their capacity to memorize facts. However, that capacity is limited, and LLMs still struggle with questions that require extensive knowledge. To overcome this, researchers have focused on augmenting LLMs with external knowledge obtained through retrieval and online search.
WebGPT is a well-known web-enhanced QA system that browses the web, writes detailed answers, and cites helpful references. However, it has not been widely adopted because it relies on expert annotations, lengthy training, and expensive resources. WebGPT also has to interact with a web browser step by step, which makes it slow and inefficient. In this study, researchers from Tsinghua University, Beihang University, and Zhipu.AI introduce WebGLM, an affordable and effective web-enhanced QA system built on the GLM-10B model.
WebGLM introduces several new components. The first is an LLM-augmented retriever that works in two stages: a coarse-grained web search followed by fine-grained, LLM-driven retrieval. This design is inspired by the ability of LLMs like GPT-3 to pick out and incorporate the right references, an ability that can be used to improve smaller retrievers. The second component is a bootstrapped generator, which uses LLM in-context learning to produce high-quality answers. Instead of relying on human experts to write training answers, the researchers apply citation-based filtering to LLM-generated answers and train the generator on the samples that pass.
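The two ideas in the paragraph above, coarse-to-fine retrieval and citation-based filtering, can be illustrated with a minimal Python sketch. This is not WebGLM's implementation: the function names (`coarse_search`, `fine_retrieve`, `keep_sample`), the word-overlap and bag-of-words scoring, the `[n]` citation format, and the example pages are all assumptions made for illustration. In the real system, a web search engine and an LLM-improved retriever would take the place of the two scoring functions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_search(question, pages, top_pages=2):
    """Stage 1 (hypothetical stand-in for web search): rank whole pages."""
    q = set(tokenize(question))
    ranked = sorted(
        pages,
        key=lambda p: len(q & set(tokenize(p["text"]))),
        reverse=True,
    )
    return ranked[:top_pages]


def fine_retrieve(question, pages, top_k=2):
    """Stage 2 (hypothetical stand-in for the fine-grained retriever):
    score individual paragraphs of the pages kept by stage 1."""
    q = Counter(tokenize(question))
    scored = []
    for page in pages:
        for para in page["text"].split("\n"):
            para = para.strip()
            if para:
                sim = cosine(q, Counter(tokenize(para)))
                scored.append((sim, page["url"], para))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def keep_sample(answer, num_references, min_citations=1):
    """Assumed filtering rule: keep a generated answer only if it cites
    at least `min_citations` references and every cited index is valid."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    if len(cited) < min_citations:
        return False
    return all(1 <= n <= num_references for n in cited)


# Made-up pages standing in for web search results.
pages = [
    {"url": "https://example.com/a",
     "text": "The moon orbits the Earth.\nTides are caused by the gravity of the moon."},
    {"url": "https://example.com/b",
     "text": "Bread is made from flour and water."},
    {"url": "https://example.com/c",
     "text": "Ocean tides rise and fall twice a day.\nSailors track tides carefully."},
]
question = "What causes ocean tides?"
references = fine_retrieve(question, coarse_search(question, pages))
```

Splitting the work this way keeps the expensive paragraph-level scoring confined to the handful of pages that survive the cheap first stage, and the filter gives a purely automatic way to discard generated answers that cite nothing or cite a reference that does not exist.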
WebGLM also includes a scorer that learns from user preferences expressed on online QA forums. The scorer captures which of several candidate responses most users would prefer. Quantitative tests and human evaluation demonstrate WebGLM's efficiency and effectiveness. On Turing tests it outperforms both WebGPT (175B) and WebGPT (13B), and it surpasses Perplexity.ai, which places it among the best web-enhanced QA systems available.
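One common way to learn such a scorer is to train on pairs of answers to the same question and push the preferred answer's score above the rejected one's. The Python sketch below shows that idea with a pairwise logistic loss. It is an assumption for illustration, not WebGLM's actual training code: the two hand-made features, the function names, and the example pairs are hypothetical, and a real scorer would compute scores with a language model rather than a two-weight linear model. The pairs are assumed to come from forum answers, for example by treating the more upvoted answer as the preferred one.

```python
import math


def features(answer):
    """Hypothetical hand-made features: scaled length and presence of a citation."""
    words = answer.split()
    return [min(len(words), 50) / 50.0, float("[" in answer)]


def score(weights, answer):
    """Linear score of an answer; higher means more likely to be preferred."""
    return sum(w * x for w, x in zip(weights, features(answer)))


def train_scorer(pairs, epochs=200, lr=0.5):
    """Fit weights so that preferred answers score above rejected ones.

    Minimises the pairwise logistic loss -log(sigmoid(s_preferred - s_rejected))
    with plain stochastic gradient descent.
    """
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            fp, fr = features(preferred), features(rejected)
            margin = sum(w * (a - b) for w, a, b in zip(weights, fp, fr))
            # Derivative of the loss with respect to the margin: sigmoid(margin) - 1.
            grad = 1.0 / (1.0 + math.exp(-margin)) - 1.0
            weights = [w - lr * grad * (a - b) for w, a, b in zip(weights, fp, fr)]
    return weights


# Made-up (preferred, rejected) pairs standing in for forum preference data.
pairs = [
    ("Tides are caused mainly by the gravitational pull of the moon on the oceans [1].",
     "The moon."),
    ("Bread rises because yeast produces carbon dioxide gas during fermentation [2].",
     "Yeast."),
]
weights = train_scorer(pairs)
```

Once trained, `score(weights, answer)` can rank several candidate answers to the same question so that the highest-scoring one is returned.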
In conclusion, the researchers introduce WebGLM, a web-enhanced QA system that performs comparably to WebGPT (175B) while outperforming WebGPT (13B) and Perplexity.ai. They address the limitations of WebGPT and propose new designs and strategies that let WebGLM achieve high accuracy in a cost-effective manner. They also define human evaluation metrics for web-enhanced QA systems and report extensive human evaluations and experiments. The code implementation of WebGLM is available on GitHub.