Title: Symbol Tuning: Enhancing In-Context Learning for Language Models
The ability to learn new tasks from minimal examples is a key aspect of human intelligence. Scaling up language models has unlocked a comparable capability in machine learning: performing complex reasoning tasks through in-context learning. However, these models remain sensitive to how prompts are written, requiring extensive prompt engineering and sometimes exhibiting unexpected behaviors. At Google Research, we have developed a simple yet effective technique called symbol tuning that improves in-context learning by emphasizing input-label mappings. In this article, we explore the significance of symbol tuning and its impact on language models’ performance.
Symbol Tuning for Enhanced Learning:
Symbol tuning is a fine-tuning procedure that focuses on tasks where natural language labels are replaced by arbitrary symbols. By removing instructions and replacing labels, the model is forced to reason over the in-context examples, enhancing its ability to perform tasks that rely on reasoning between inputs and labels. Symbol tuning has shown considerable benefits across various tasks, including better performance on unseen in-context learning tasks and increased robustness to underspecified prompts.
Instruction tuning is a commonly used fine-tuning method, but it has limitations. Models trained with instructions and natural language labels tend to ignore the examples provided, as the task is redundantly defined. Symbol tuning addresses this issue by removing instructions and replacing labels with unrelated symbols, making the task unclear without considering the in-context examples. This motivates the model to reason over the examples and improve its performance on tasks that require reasoning between inputs and labels.
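To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical labels and prompt formatting, not the exact templates used in our experiments) of how the same in-context examples look with natural language labels versus arbitrary symbols:

```python
# Hypothetical sketch: formatting in-context examples with and without
# symbol-tuning-style label replacement. The "Input:/Output:" template
# and the symbols "foo"/"bar" are illustrative assumptions.

def build_prompt(examples, query, label_map=None):
    """Format in-context examples followed by an unanswered query.

    If label_map is given, each natural language label is replaced
    with an arbitrary symbol (the symbol-tuning setup); otherwise
    labels are kept as-is.
    """
    lines = []
    for text, label in examples:
        shown = label_map[label] if label_map else label
        lines.append(f"Input: {text}\nOutput: {shown}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

examples = [
    ("The movie was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]

# With natural language labels, the task is guessable from the labels
# alone, so a model can succeed while ignoring the examples.
prompt_plain = build_prompt(examples, "I loved every minute.")

# With arbitrary symbols, the only way to infer the input-label
# mapping is to reason over the in-context examples themselves.
prompt_symbol = build_prompt(
    examples, "I loved every minute.",
    label_map={"positive": "foo", "negative": "bar"},
)
```

In the second prompt, nothing identifies the task as sentiment classification except the example pairs, which is exactly the pressure symbol tuning applies during fine-tuning.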
For our experiments, we selected 22 publicly available natural language processing datasets and remapped their labels to random symbols from three categories: integers, character combinations, and words. We applied the symbol-tuning procedure to Flan-PaLM models of varying sizes. The models were fine-tuned on approximately 30K arbitrary symbols, with the remaining symbols reserved for evaluation.
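The remapping step can be sketched as follows. This is a simplified illustration, assuming small symbol pools of our own invention; the actual experiments drew from roughly 30K symbols across the three categories:

```python
import random
import string

def random_symbol(category, rng):
    """Draw one arbitrary symbol from the named category.

    The pools here are illustrative stand-ins, not the actual
    symbol sets used in the experiments.
    """
    if category == "integer":
        return str(rng.randrange(10_000))
    if category == "chars":  # random character combination
        return "".join(rng.choices(string.ascii_lowercase, k=5))
    if category == "word":   # drawn from an arbitrary word list
        return rng.choice(["apple", "river", "stone", "cloud", "lamp"])
    raise ValueError(f"unknown category: {category}")

def remap_labels(dataset, rng):
    """Replace each distinct label in a dataset with a fresh symbol."""
    category = rng.choice(["integer", "chars", "word"])
    mapping = {}
    for label in sorted({label for _, label in dataset}):
        symbol = random_symbol(category, rng)
        while symbol in mapping.values():  # keep symbols distinct
            symbol = random_symbol(category, rng)
        mapping[label] = symbol
    return [(text, mapping[label]) for text, label in dataset], mapping

rng = random.Random(0)
data = [("great film", "positive"), ("awful film", "negative")]
remapped, mapping = remap_labels(data, rng)
```

Drawing a fresh mapping per task instance ensures the model cannot memorize any symbol's meaning and must re-derive the input-label mapping from each prompt.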
To evaluate the impact of symbol tuning, we chose 11 NLP datasets that were not used during fine-tuning. These held-out datasets let us assess the models’ ability to perform unseen tasks. Symbol-tuned models consistently outperformed baselines, particularly in settings where relevant natural language labels were unavailable, highlighting the effectiveness of symbol tuning in improving in-context learning.
Enhanced Algorithmic Reasoning:
We also tested symbol tuning on algorithmic reasoning tasks from BIG-Bench, which involve list functions and simple Turing concepts. Symbol tuning resulted in significant performance improvements across both task categories, illustrating the model’s enhanced ability to learn in-context. Notably, the symbol-tuned Flan-cont-PaLM-62B model achieved comparable performance to the larger Flan-PaLM-540B model, demonstrating substantial inference compute savings.
Flipped Labels Experiment:
In the flipped-label experiment, we studied whether models could override prior knowledge when presented with flipped labels. Symbol-tuned models consistently outperformed instruction-tuned models, demonstrating their superior ability to follow flipped labels in an in-context setting. The average improvement in performance ranged from 26.5% to 34.0%, depending on the model size.
Symbol tuning offers a valuable approach to improve in-context learning in language models. By focusing on tasks where natural language labels are replaced with arbitrary symbols, symbol tuning enhances a model’s ability to reason between in-context examples and labels. Our experiments have shown significant performance improvements across various tasks, including algorithmic reasoning and handling flipped labels. Symbol tuning holds promise for advancing language models and their applications in machine learning.