Language models have recently made tremendous advances, in part through in-context learning (ICL), in which a model is prompted with a few input-label example pairs before performing a task on a new input. Two factors contribute to a model's success at ICL: the semantic prior knowledge it acquires during pre-training, and its ability to learn input-label mappings from the in-context examples.
In our research, we wanted to understand how these two factors interact, especially as language models scale. We studied two settings: flipped-label ICL and semantically-unrelated label ICL (SUL-ICL). In flipped-label ICL, the labels of the in-context examples are flipped, so the input-label mappings in the prompt contradict the model's semantic priors. In SUL-ICL, labels are replaced with words that are semantically unrelated to the task, removing the priors as a source of signal. We found that larger models can override prior knowledge and learn from semantically-unrelated labels, while smaller models cannot.
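The two settings differ only in how the few-shot prompt is assembled. The sketch below illustrates this for a sentiment task; the `Input:`/`Label:` template, the example sentences, and the placeholder labels `foo`/`bar` are illustrative assumptions, not the exact prompt format used in our experiments.

```python
def build_prompt(examples, query, label_map):
    """Format (text, label) demonstration pairs plus a query into a few-shot prompt."""
    blocks = [f"Input: {text}\nLabel: {label_map[label]}" for text, label in examples]
    blocks.append(f"Input: {query}\nLabel:")  # model is asked to complete this line
    return "\n\n".join(blocks)

demos = [
    ("A joyous, touching film.", "positive"),
    ("Dull and lifeless.", "negative"),
]

regular = {"positive": "positive", "negative": "negative"}
flipped = {"positive": "negative", "negative": "positive"}  # mappings contradict priors
sul = {"positive": "foo", "negative": "bar"}  # labels carry no task semantics

print(build_prompt(demos, "An unforgettable story.", flipped))
```

Under `flipped`, a model that follows its semantic priors answers "positive" for a positive review, while a model that learns the in-context mapping answers "negative"; under `sul`, only the in-context mapping can determine the answer.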
To conduct our experiments, we used seven natural language processing tasks and tested five language model families. In the flipped-label setting, smaller models experience only a slight decrease in performance, while larger models show a significant decrease, indicating that large models override their prior knowledge and follow the contradictory labels presented in the prompt. In the SUL-ICL setting, smaller models rely more heavily on semantic priors and therefore degrade more than larger models when those priors are removed. Additionally, larger models benefit more than smaller models from additional in-context examples in the SUL-ICL setup.
We also examined the impact of instruction tuning, a technique for improving model performance. Comparing standard language models with their instruction-tuned variants, we found that instruction tuning strengthens the ability to learn input-label mappings but also increases reliance on semantic priors: instruction-tuned models perform better with semantically-unrelated labels, yet struggle more to override prior knowledge when presented with flipped labels.
In conclusion, our research highlights how in-context learning behavior changes with scale. Large models can override prior knowledge and learn from semantically-unrelated labels, while instruction tuning improves the capacity to learn input-label mappings but increases reliance on semantic priors. These findings provide insight into both the capabilities and the limitations of language models across different prompting setups.