Large language models (LLMs) have advanced rapidly in natural language processing (NLP). Recent research has shown that LLMs can complete a variety of tasks without fine-tuning when given prompts designed for those tasks. However, LLMs still struggle to generate factually accurate information and often lack domain-specific expertise.
To address this issue, researchers have proposed giving LLMs access to external knowledge sources such as structured data (databases and knowledge graphs). However, structured data is organized in formats that LLMs do not handle natively, so they need help interpreting it. One proposed remedy is to use specialized interfaces that manipulate the structured data, for example by extracting columns from tables, so that the LLM can locate the necessary information and work within a narrower search space.
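The idea of a table interface can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`extract_column_names`, `extract_columns`, `linearize`), the list-of-dicts table representation, and the sample data are all assumptions made for illustration. It only shows how narrowing a table to a few columns produces a smaller piece of text to place in a prompt.

```python
from typing import Dict, List

# Hypothetical table representation: one dict per row, keyed by column name.
Table = List[Dict[str, str]]

SAMPLE_TABLE: Table = [
    {"player": "Ana", "team": "Reds", "goals": "12"},
    {"player": "Ben", "team": "Blues", "goals": "7"},
]


def extract_column_names(table: Table) -> List[str]:
    """Return the column names so a model can decide which ones matter."""
    return list(table[0].keys()) if table else []


def extract_columns(table: Table, columns: List[str]) -> Table:
    """Keep only the requested columns, shrinking what the model must read."""
    return [{name: row[name] for name in columns} for row in table]


def linearize(table: Table) -> str:
    """Turn rows into plain text that can be pasted into a prompt."""
    lines = []
    for index, row in enumerate(table):
        cells = "; ".join(f"{name} = {value}" for name, value in row.items())
        lines.append(f"row {index}: {cells}")
    return "\n".join(lines)


# Usage: list the columns, keep two of them, and build the prompt text.
columns = extract_column_names(SAMPLE_TABLE)
sub_table = extract_columns(SAMPLE_TABLE, ["player", "goals"])
prompt_text = linearize(sub_table)
```

In this sketch the model would first see only the column names, pick the relevant ones, and then receive the linearized sub-table rather than the full table.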
In their study, researchers from Renmin University of China and the University of Electronic Science and Technology of China focus on designing appropriate interfaces and using them in conjunction with LLMs for reasoning tasks. They introduce an Iterative Reading-then-Reasoning (IRR) method called StructGPT, which allows LLMs to make decisions based on evidence gathered from the interfaces. This method separates the reading and reasoning processes for LLMs, using the interfaces for data access and filtering, and relying on reasoning to determine the next action.
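The separation of reading and reasoning described above can be sketched as a small loop over a toy knowledge graph. This is an illustrative simplification, not the StructGPT implementation: the triples, the interface names (`extract_relations`, `extract_tails`), the prompt wording, and the `scripted_llm` stand-in for a real model are all assumptions. The sketch also follows only one entity per hop, which a real system would not be limited to.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Toy knowledge graph used only for this illustration.
KG: List[Triple] = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "release_year", "2010"),
    ("Christopher Nolan", "born_in", "London"),
]


def extract_relations(kg: List[Triple], entity: str) -> List[str]:
    """Interface: read the relations that leave the given entity."""
    return sorted({rel for head, rel, _ in kg if head == entity})


def extract_tails(kg: List[Triple], entity: str, relation: str) -> List[str]:
    """Interface: read the entities reached from `entity` via `relation`."""
    return [tail for head, rel, tail in kg if head == entity and rel == relation]


def iterative_read_then_reason(
    kg: List[Triple],
    question: str,
    start_entity: str,
    choose: Callable[[str], str],
    max_hops: int = 3,
) -> str:
    """Alternate between reading evidence and letting `choose` pick a step."""
    entity = start_entity
    for _ in range(max_hops):
        relations = extract_relations(kg, entity)  # reading step
        if not relations:
            break
        prompt = (
            f"Question: {question}\n"
            f"Current entity: {entity}\n"
            f"Candidate relations: {', '.join(relations)}\n"
            "Reply with one relation, or STOP."
        )
        choice = choose(prompt)  # reasoning step (an LLM call in practice)
        if choice not in relations:
            break
        tails = extract_tails(kg, entity, choice)
        if not tails:
            break
        entity = tails[0]  # simplification: follow a single entity per hop
    return entity


def scripted_llm(plan: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model: replays a fixed list of choices, then STOP."""
    steps = iter(plan)

    def choose(prompt: str) -> str:
        return next(steps, "STOP")

    return choose


answer = iterative_read_then_reason(
    KG,
    "Where was the director of Inception born?",
    "Inception",
    scripted_llm(["directed_by", "born_in"]),
)
```

The design point the sketch tries to capture is that the model never sees the whole graph: each prompt contains only the evidence returned by an interface call, and the model's reply determines which interface call comes next.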
The researchers evaluated their approach on three kinds of tasks: knowledge graph question answering (KGQA), table question answering (TableQA), and database text-to-SQL. The results showed that the method significantly improved ChatGPT's reasoning performance on structured data, in some cases even surpassing supervised fine-tuning approaches.
For example, on the KGQA task the method increased Hits@1 by 11.4%, and on multi-hop KGQA datasets it improved performance by up to 62.9% and 37.0%. On the TableQA task, it increased denotation accuracy by 3% to 5% and improved table fact verification accuracy by 4.2%. On the text-to-SQL task, it increased execution accuracy by about 4%.
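For readers unfamiliar with the metrics named above, the following Python sketch shows one plausible way to compute two of them. Hits@1 is the fraction of questions whose top-ranked prediction is among the gold answers, and execution accuracy checks whether a predicted SQL query returns the same rows as the reference query. The function names, the sample schema, and the order-insensitive row comparison are assumptions; the official benchmark evaluation scripts handle more cases than this.

```python
import sqlite3
from typing import Sequence, Set


def hits_at_1(
    ranked_predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[Set[str]],
) -> float:
    """Fraction of questions whose first prediction is a gold answer."""
    if not gold_answers:
        return 0.0
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if preds and preds[0] in gold
    )
    return hits / len(gold_answers)


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows (row order ignored here)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


# Usage with made-up data: two questions, one answered correctly at rank 1.
score = hits_at_1([["London", "Paris"], ["2011"]], [{"London"}, {"2010"}])

# Usage with an in-memory database and a hypothetical `singer` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 35), ("Ben", 28)])
same_result = execution_match(
    conn,
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
)
broken_query = execution_match(conn, "SELEC name FROM singer", "SELECT name FROM singer")
```

Comparing executed results rather than query text is what lets two differently written but equivalent SQL queries both count as correct.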
The researchers have released the code for the Spider and TabFact experiments to illustrate the StructGPT framework, and they plan to release the entire codebase. More details are available in the paper and the accompanying GitHub repository.
Overall, the study shows that pairing LLMs with structured data interfaces can strengthen their reasoning over knowledge graphs, tables, and databases without task-specific fine-tuning. This opens up new possibilities for applying LLMs in real-world settings where the relevant information is stored in structured form.