Title: Symbol Tuning: Improving Language Models’ Ability to Reason and Learn
Introduction: The Significance of Symbol Tuning in Language Models
Language models are designed to understand and analyze natural language. However, they often struggle to reason over and learn from input-label mappings presented in context. To address this issue, the Google AI team developed a simple finetuning procedure called symbol tuning, which significantly improves a language model's ability to reason over and learn from in-context input-label mappings.
Significance of Symbol Tuning: Improved Performance on Unseen In-Context Tasks
Symbol tuning greatly enhances baseline models' performance on unseen in-context learning tasks. The procedure finetunes models on in-context exemplars in which natural language labels (e.g., "positive"/"negative") are replaced with semantically unrelated labels (e.g., "foo"/"bar"). Because a single exemplar no longer reveals the task, the model must reason over multiple in-context exemplars to define it. On average, symbol tuning improves the performance of Flan-cont-PaLM-62B by 11.1% across eleven evaluation tasks.
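The label-remapping idea can be illustrated with a minimal sketch. The function names (`symbolize`, `build_prompt`), the symbol pool, and the prompt format below are illustrative assumptions, not the paper's exact pipeline:

```python
import random

# Illustrative pool of semantically unrelated labels (hypothetical choice).
ARBITRARY_SYMBOLS = ["foo", "bar", "baz", "qux"]

def symbolize(examples, labels):
    """Replace each natural language label with a random arbitrary symbol."""
    mapping = dict(zip(labels, random.sample(ARBITRARY_SYMBOLS, len(labels))))
    return [(text, mapping[label]) for text, label in examples], mapping

def build_prompt(exemplars, query):
    """Concatenate multiple exemplars: a single one cannot define the task."""
    lines = [f"Input: {text}\nOutput: {label}" for text, label in exemplars]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

examples = [
    ("This movie was fantastic", "positive"),
    ("Terrible plot and acting", "negative"),
    ("An absolute delight", "positive"),
]
symbolized, mapping = symbolize(examples, ["positive", "negative"])
print(build_prompt(symbolized, "I loved every minute"))
```

Since the symbols carry no meaning, the model can only solve the task by inferring the mapping from the exemplars themselves, which is exactly the skill symbol tuning trains.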
Better Algorithmic Reasoning: Symbol Tuning and Natural Language Data
Although symbol tuning uses only natural language data, symbol-tuned models show improved performance on algorithmic reasoning tasks. In a set of list function tasks, the model must identify the transformation function between input and output lists containing non-negative integers. The researchers also evaluated a set of simple turing concepts tasks, where the model reasons over binary strings to map an input to an output. Symbol tuning led to an average performance improvement of 18.2% for Flan-PaLM-8B, 11.1% for Flan-PaLM-62B, 15.5% for Flan-cont-PaLM-62B, and 3.6% for Flan-PaLM-540B across all tasks.
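A toy example makes the list function setup concrete. The prompt format and the hidden rule below are illustrative assumptions, not the benchmark's exact wording:

```python
def make_list_function_prompt(pairs, query):
    """Format input/output list pairs; the model must infer the hidden rule."""
    lines = [f"[{', '.join(map(str, x))}] -> [{', '.join(map(str, y))}]"
             for x, y in pairs]
    lines.append(f"[{', '.join(map(str, query))}] ->")
    return "\n".join(lines)

# Hidden transformation in this toy example: drop the last element.
pairs = [([1, 2, 3], [1, 2]), ([5, 9], [5]), ([0, 4, 4, 7], [0, 4, 4])]
print(make_list_function_prompt(pairs, [8, 6, 2]))
```

A model that has learned to treat in-context mappings as the ground truth should complete the final line with `[8, 6]`.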
Enhanced Ability to Follow Flipped Labels: Symbol Tuning vs. Instruction Tuning
Symbol-tuned models outperform instruction-tuned models at following flipped labels presented in context. Instruction-tuned models struggle to flip their predictions to follow such labels. In contrast, symbol tuning forces models to treat the label presented in context as an arbitrary symbol, reducing their reliance on prior knowledge that contradicts the flipped labels. Symbol tuning results in an average improvement of 26.5% for Flan-PaLM-8B, 33.7% for Flan-PaLM-62B, and 34.0% for Flan-PaLM-540B across all datasets.
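The flipped-labels evaluation can be sketched as follows; the helper name `flip_labels` and the sentiment examples are hypothetical illustrations of the setup:

```python
# Invert every in-context label relative to its natural meaning.
FLIP = {"positive": "negative", "negative": "positive"}

def flip_labels(exemplars):
    """Return exemplars with each label replaced by its opposite."""
    return [(text, FLIP[label]) for text, label in exemplars]

exemplars = [("Great film", "positive"), ("Awful pacing", "negative")]
flipped = flip_labels(exemplars)
# A model that truly follows the context should now label a clearly
# positive query such as "A wonderful experience" as "negative",
# overriding its prior knowledge.
```

Instruction-tuned models tend to fall back on their priors here, while symbol-tuned models, trained to treat labels as arbitrary symbols, follow the in-context mapping.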
Effectiveness of Symbol Tuning and Data Proportions
Symbol tuning doesn't require extensive finetuning: for smaller models, performance jumps early in training and then remains relatively constant. Larger models, however, may require a more diverse or larger set of symbol-tuning data. Interestingly, the proportion of symbol-tuning data mixed into finetuning does not significantly affect performance. As long as a non-trivial amount of symbol-tuning data is used, the model can successfully generalize its ability to new tasks.
Symbol tuning is a powerful finetuning procedure that enhances language models' ability to reason over and learn from input-label mappings in context. By leveraging symbol tuning, models achieve better performance on unseen in-context tasks and algorithmic reasoning tasks, and a stronger ability to follow flipped labels. The effectiveness of symbol tuning depends not on the proportion of symbol-tuning data used, but on the diversity and relevance of that data.