Focused Transformer (FOT): Extending Context Length in Language Models with Memory

Researchers have made significant advances across many fields using language models. However, incorporating new knowledge into these models remains a challenge: the common practice of fine-tuning is resource-intensive, complex to manage, and does not always offer a straightforward way to add new knowledge. To address this, researchers propose a promising alternative called the Focused Transformer (FOT).

The FOT technique targets the limited effective context length of language models. As the number of documents grows, the ratio of relevant to irrelevant tokens shrinks, and keys associated with irrelevant values begin to overlap with keys associated with relevant ones; the authors call this the distraction issue. In FOT, a subset of attention layers can access an external memory of (key, value) pairs via a k-nearest-neighbors (kNN) lookup. This mechanism effectively extends the context length and helps mitigate the distraction issue.
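To make the mechanism concrete, here is a minimal sketch, in NumPy, of how a memory attention layer might combine its local keys and values with the top-k entries retrieved from an external memory. The shapes, the exact retrieval (a plain inner-product kNN rather than the approximate index a real system would use), and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(q, local_k, local_v, mem_k, mem_v, top_k=16):
    """Attend over the local context plus the top-k memory entries
    retrieved for each query by (inner-product) kNN. Illustrative only."""
    outputs = []
    for i in range(q.shape[0]):
        # kNN retrieval: score every memory key against this query
        scores = mem_k @ q[i]                      # (num_mem,)
        idx = np.argsort(-scores)[:top_k]          # indices of the nearest keys
        # extend the local (key, value) set with the retrieved pairs
        k = np.concatenate([local_k, mem_k[idx]], axis=0)
        v = np.concatenate([local_v, mem_v[idx]], axis=0)
        attn = softmax((k @ q[i]) / np.sqrt(q.shape[-1]))
        outputs.append(attn @ v)
    return np.stack(outputs)

# toy usage with made-up sizes
d = 64
q = np.random.randn(8, d)           # 8 queries from the current context
local_k = np.random.randn(8, d)     # local keys/values
local_v = np.random.randn(8, d)
mem_k = np.random.randn(10_000, d)  # external memory of (key, value) pairs
mem_v = np.random.randn(10_000, d)
out = memory_attention(q, local_k, local_v, mem_k, mem_v, top_k=16)
print(out.shape)  # (8, 64)
```

The point of the sketch is that only the retrieval step grows with memory size; the attention itself still operates over a small, fixed number of keys per query.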

During training, the Focused Transformer draws on contrastive learning: the memory attention layers are exposed both to relevant keys from the current document and to irrelevant keys that act as negative samples drawn from unrelated documents. This encourages the model to differentiate between keys connected to semantically diverse values, improving the structure of the key space.
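A rough illustration of this idea, with hypothetical helper names and a toy batch, might look as follows; the actual training pipeline in the paper differs in its details:

```python
import numpy as np

def crossbatch_keys(batch_keys, batch_values, doc_id):
    """Build the (key, value) set seen by a memory attention layer during
    training: the current document's keys (positives) plus keys drawn from
    unrelated documents in the batch (negatives). Illustrative sketch."""
    pos_k, pos_v = batch_keys[doc_id], batch_values[doc_id]
    neg_k = np.concatenate([k for j, k in enumerate(batch_keys) if j != doc_id])
    neg_v = np.concatenate([v for j, v in enumerate(batch_values) if j != doc_id])
    # The layer attends over both sets; the training signal pushes attention
    # mass toward keys whose values are actually relevant to the query.
    keys = np.concatenate([pos_k, neg_k], axis=0)
    values = np.concatenate([pos_v, neg_v], axis=0)
    return keys, values

# toy batch: 4 documents, each contributing 32 keys/values of dimension 64
batch_k = [np.random.randn(32, 64) for _ in range(4)]
batch_v = [np.random.randn(32, 64) for _ in range(4)]
keys, values = crossbatch_keys(batch_k, batch_v, doc_id=0)
print(keys.shape)  # (128, 64): 32 positives + 96 negatives
```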

The researchers introduce LONGLLAMAs, OpenLLaMA models fine-tuned with FOT. This demonstrates that the method does not require long contexts during training and can be applied to existing models. LONGLLAMAs show significant improvements on tasks that require long-context modeling, such as passkey retrieval.
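For context, passkey retrieval asks the model to recall a short code hidden inside a very long stretch of filler text. A sketch of how such a prompt could be constructed is shown below; the exact template used in the evaluations may differ:

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int = 2000) -> str:
    """Construct a passkey-retrieval prompt: the passkey is buried inside a
    long run of filler sentences, and the model must repeat it at the end."""
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    info = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    chunks = [filler] * n_filler
    chunks.insert(random.randint(0, n_filler), info)  # hide it at a random depth
    return "".join(chunks) + "What is the pass key? The pass key is"

prompt = make_passkey_prompt("71432")
print(len(prompt.split()))  # tens of thousands of words, far longer than a typical training context
```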

In summary, the Focused Transformer (FOT) addresses the distraction issue and extends the usable context length of language models. Training the model to differentiate relevant from irrelevant keys improves the structure of the key space and yields significant gains on long-context tasks. Because FOT requires no architectural modifications, it can be applied to existing models, making it a cost-effective way to augment them with memory.

Check out the Paper and GitHub link for more information.