Understanding the Inner Workings of Large Language Models (LLMs)
Large language models (LLMs) have revolutionized various real-world fields and have shown remarkable skills such as in-context learning and chain-of-thought reasoning. However, their development also carries significant risks, including social biases, data leaks, and disinformation, and more capable AI systems may pose longer-term dangers as well. How an LLM behaves depends on both its scale and how it has been fine-tuned.
To navigate these dangers effectively, it is crucial to gain insight into how LLMs work. One approach is to reverse engineer the model’s circuitry. Mechanistic interpretability has identified induction heads, which let a model continue patterns seen earlier in its context, along with other mechanisms, including ones through which the model learns features that are hard to interpret.
Another method is to focus on the input-output relationships of the model. By studying model samples and probabilities, researchers can explore phenomena directly. However, drawing strong conclusions based on these samples is challenging since different learning processes can lead to similar outcomes.
Influence functions address this challenge. They estimate how the model's parameters, and therefore its outputs, would change if a particular sequence were added to the training set. By identifying the training sequences with the largest influence on a given output, researchers can distinguish between competing explanations for that output and gain insight into how the model generalizes from its training examples.
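To make the idea concrete, here is a minimal sketch of an influence computation on a toy ridge-regression model, where the Hessian is small enough to invert exactly. Everything in it (the function names `fit_ridge` and `influence_on_query`, the synthetic data, the choice of the query loss as the quantity being measured) is an assumption made for illustration and is not taken from the research described here. The score for training example m is -∇L(query)ᵀ H⁻¹ ∇L(zₘ), so a negative score means that upweighting that example would lower the query loss.

```python
import numpy as np

def fit_ridge(X, y, reg):
    """Closed-form minimizer of mean(0.5 * (x @ theta - y)**2) + 0.5 * reg * ||theta||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + reg * np.eye(d), X.T @ y / n)

def influence_on_query(X, y, theta, x_query, y_query, reg):
    """Influence of each training example on the query loss (toy, exact Hessian)."""
    n, d = X.shape
    # Per-example gradients of 0.5 * (x @ theta - y)**2 with respect to theta.
    train_grads = (X @ theta - y)[:, None] * X          # shape (n, d)
    query_grad = (x_query @ theta - y_query) * x_query  # shape (d,)
    # Hessian of the regularized training objective.
    hessian = X.T @ X / n + reg * np.eye(d)
    # Inverse-Hessian-vector product, solved directly because d is tiny.
    ihvp = np.linalg.solve(hessian, query_grad)
    return -train_grads @ ihvp                          # shape (n,)

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
reg = 1e-2
theta = fit_ridge(X, y, reg)

# Use the first training example as the query.
scores = influence_on_query(X, y, theta, X[0], y[0], reg)
top = np.argsort(-np.abs(scores))[:5]
print("most influential training indices:", top)
```

For a model with billions of parameters the Hessian cannot be formed or inverted like this, which is why the approximations discussed in the rest of the article matter.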
While influence functions have provided insights into small-scale neural networks, applying them to large models is difficult. One computational bottleneck is computing an inverse-Hessian-vector product (IHVP), which involves running an iterative linear system solver for thousands of steps. Another challenge is computing the gradients of all training instances independently for each influence query.
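The iterative IHVP bottleneck can be illustrated with a short sketch. The code below uses conjugate gradient, a standard iterative linear solver chosen here as an assumption for illustration (it is not claimed to be the solver used in any particular study). It only ever touches the Hessian through Hessian-vector products, which is what makes this family of methods applicable when the Hessian is too large to store. The function name `conjugate_gradient_ihvp`, the damping value, and the toy matrix are all hypothetical.

```python
import numpy as np

def conjugate_gradient_ihvp(hvp, v, damping=1e-2, max_steps=1000, tol=1e-16):
    """Approximately solve (H + damping * I) x = v using only products hvp(p) = H @ p."""
    x = np.zeros_like(v, dtype=float)
    r = np.array(v, dtype=float)   # residual v - (H + damping * I) @ x, with x = 0
    p = r.copy()                   # current search direction
    rs = r @ r                     # squared residual norm
    for _ in range(max_steps):
        if rs < tol:
            break
        Ap = hvp(p) + damping * p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy positive semi-definite "Hessian"; a real model would supply hvp via autodiff.
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 10))
H = M.T @ M / 50
v = rng.normal(size=10)

x_cg = conjugate_gradient_ihvp(lambda p: H @ p, v)
x_exact = np.linalg.solve(H + 1e-2 * np.eye(10), v)
print("max abs error:", np.max(np.abs(x_cg - x_exact)))
```

On this 10-dimensional toy problem the solver converges in a handful of steps. In a large network each step costs roughly a forward and backward pass over a batch, and many more steps are needed, which is where the expense comes from.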
Researchers have made progress in scaling influence function calculations to large models. Previously, the largest models to which influence functions had been applied were vision transformers with 300 million parameters; this work investigates language models with up to 52 billion parameters. The strategy is to make both the training gradient computation and the IHVP calculation cheaper, addressing the two bottlenecks described above.
Key findings from their research include:
1. The EK-FAC method is considerably faster than the established LiSSA method while remaining competitive with it in influence estimation accuracy.
2. The distribution of influence is heavy-tailed and roughly follows a power law, yet influence is spread across many sequences rather than concentrated in a handful, which suggests that typical model behavior is not simply memorized from a few training examples.
3. Larger models consistently generalize at a higher level of abstraction compared to smaller models, showing skills in tasks like role-playing, programming, mathematical reasoning, and cross-linguistic generalization.
4. On average, influence is spread roughly evenly across the network’s layers, but different layers exhibit distinct generalization patterns. Intermediate layers focus on abstract patterns, while upper and lower layers are more closely tied to individual tokens.
5. Influence functions are sensitive to word order: training sequences have a significant influence only when phrases related to the prompt appear before phrases related to the completion.
6. Role-playing behavior is influenced by examples or descriptions of similar behavior in the training set, indicating that imitation plays a major role.
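The first finding above refers to EK-FAC, which builds on the Kronecker-factored (K-FAC) approximation. As a rough sketch of why this family of methods is fast, the code below shows the plain K-FAC step for a single linear layer, not EK-FAC itself: the curvature matrix for the layer's weights is approximated as a Kronecker product A ⊗ G of the input covariance A and the output-gradient covariance G, so an IHVP reduces to two small linear solves instead of one enormous one. EK-FAC, as generally described, additionally refits the eigenvalues in the Kronecker eigenbasis, which is omitted here. The function name `kfac_ihvp`, the damping scheme, and the random data are assumptions for illustration.

```python
import numpy as np

def kfac_ihvp(A, G, V, damping=1e-3):
    """Apply ((A + dI) kron (G + dI))^-1 to V without forming the Kronecker product.

    A: (n_in, n_in) covariance of layer inputs.
    G: (n_out, n_out) covariance of gradients w.r.t. layer outputs.
    V: (n_out, n_in) gradient-shaped matrix for the layer's weights.
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    X = np.linalg.solve(G_d, V)           # G_d^-1 @ V
    return np.linalg.solve(A_d, X.T).T    # X @ A_d^-1 (A_d is symmetric)

# Random stand-ins for layer inputs and output gradients (illustrative only).
rng = np.random.default_rng(0)
n, n_in, n_out = 500, 6, 4
inputs = rng.normal(size=(n, n_in))
out_grads = rng.normal(size=(n, n_out))
A = inputs.T @ inputs / n
G = out_grads.T @ out_grads / n
V = rng.normal(size=(n_out, n_in))

fast = kfac_ihvp(A, G, V)

# Reference: build the full Kronecker matrix and solve (column-major vec convention).
K = np.kron(A + 1e-3 * np.eye(n_in), G + 1e-3 * np.eye(n_out))
slow = np.linalg.solve(K, V.reshape(-1, order="F")).reshape(V.shape, order="F")
print("max abs difference:", np.max(np.abs(fast - slow)))
```

The factored solve works with an n_in × n_in matrix and an n_out × n_out matrix, while the reference solve needs a matrix of size (n_in · n_out) × (n_in · n_out), which is already impractical for a single realistic layer.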
Understanding the inner workings of LLMs is crucial for aligning them safely and effectively with human preferences. Through reverse engineering and the use of influence functions, researchers are making significant progress in explaining the capabilities and behavior of LLMs.
This research was conducted by a team of researchers from the University of Toronto and the Vector Institute.
Aneesh Tickoo, a consulting intern at MarktechPost and an undergraduate student at the Indian Institute of Technology (IIT), Bhilai, contributed to this article. Aneesh is passionate about machine learning and focuses on projects related to image processing. He enjoys collaborating on interesting projects and connecting with people in the field.