The Impact of Retrieval on Language Models in Long-Form Question Answering
Long-Form Question Answering (LFQA) systems aim to provide comprehensive, paragraph-length answers to open-ended questions. These systems pair large language models (LLMs) with retrieved evidence documents to generate detailed responses. Recent research has shown that retrieval can enhance LLM performance on LFQA tasks, but the specific effects of retrieval augmentation on answer generation remain poorly understood.
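The core mechanic of retrieval augmentation is simple: retrieved evidence documents are placed in the model's context ahead of the question. A minimal sketch of this prompt assembly is below; the function name and prompt template are illustrative assumptions, not the exact format used in the study.

```python
def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Assemble a retrieval-augmented prompt: evidence documents first,
    then the question. This is a generic illustrative template, not the
    specific format from the paper."""
    parts = []
    for i, doc in enumerate(documents, start=1):
        parts.append(f"Document [{i}]: {doc}")
    parts.append(f"Question: {question}")
    parts.append("Answer the question using the documents above.")
    return "\n\n".join(parts)


prompt = build_rag_prompt(
    "Why is the sky blue?",
    ["Rayleigh scattering affects shorter wavelengths more strongly.",
     "Sunlight contains a mixture of wavelengths."],
)
```

The assembled prompt would then be passed to the LLM; swapping the document list (or the base model) while holding everything else fixed is exactly the kind of controlled manipulation the study performs.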
Researchers from the University of Texas at Austin conducted a study of how retrieval influences answer generation in LFQA. They designed two controlled settings, varying either the base LLM or the evidence documents supplied to it. To assess answer quality, they measured surface properties such as length and coherence, and they collected human annotations to judge how well generated answers could be attributed to the evidence documents.
The study found that retrieval augmentation significantly alters how LLMs generate answers. Even irrelevant retrieved documents can change the length of the generated responses. When important in-context evidence is provided, LLMs tend to produce more unexpected phrases. Different base LLMs can also respond to retrieval augmentation in contrasting ways, even when given the same set of evidence documents.
Additionally, the research showed that attribution quality in LFQA varies widely across base LLMs. The study also uncovered patterns of attribution in long-form generation: the generated text tends to follow the order of the evidence documents, and the final sentence of an answer is less attributable than earlier sentences. These findings contribute to a better understanding of how LLMs leverage contextual evidence to answer complex questions.
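The study measures attribution with human annotations, but the idea of mapping each generated sentence back to an evidence document can be sketched with a toy lexical-overlap heuristic. The function below is an illustrative assumption for exposition only, not the paper's annotation protocol.

```python
def attribute_sentences(answer_sentences: list[str],
                        documents: list[str]) -> list[int]:
    """Toy attribution heuristic: assign each generated sentence to the
    evidence document sharing the most word types with it. Real attribution
    judgments (as in the study) require human annotation, not word overlap."""
    doc_vocabs = [set(doc.lower().split()) for doc in documents]
    attributions = []
    for sentence in answer_sentences:
        words = set(sentence.lower().split())
        overlap_scores = [len(words & vocab) for vocab in doc_vocabs]
        # Index of the document with the largest overlap.
        attributions.append(overlap_scores.index(max(overlap_scores)))
    return attributions
```

Applying such a mapping sentence by sentence is what exposes ordering patterns, e.g. whether attributed documents appear in the same order as they were given in the context.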
The study highlights the significant impact of retrieval augmentation on answer generation in LFQA, and it offers insights into attribution patterns and how LLMs use contextual evidence. These findings can guide future work on improving the performance and reliability of LFQA systems.