
Enhancing Transformer-based Language Models with External Memory for Universal Computation

Transformers such as GPT-2 and GPT-3 have achieved impressive results, focusing research attention on large language models (LLMs), and the success of ChatGPT has only increased that interest. Techniques such as in-context learning and chain-of-thought prompting have improved model accuracy. However, current transformer-based LLMs have a limitation: they can condition only on an input string of bounded length, which restricts the computations they can express. To overcome this limitation, researchers have explored adding an external feedback loop to LLMs, feeding model outputs back in as subsequent inputs, but whether this meaningfully expands the models' computational power has remained an open question.

To address this question, researchers from Google Brain and the University of Alberta collaborated on a project. They attached an external read-write memory to an LLM to demonstrate that the combined system can emulate any algorithm on any input. Their findings, summarized in the paper “Memory Augmented Large Language Models are Computationally Universal,” establish the computational universality of an LLM augmented with an associative read-write memory.

The researchers chose Flan-U-PaLM 540B as the LLM for the study. They used a stored instruction computer to connect the LLM and the associative memory, allowing outputs and input prompts to interact in a loop. The external associative memory functions as a dictionary of key-value pairs, where keys are variable names or address locations and values are their stored contents. Each parsing step uses regular-expression matching to route values between the language model's output and the memory.
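The loop connecting the model and the memory can be sketched roughly as follows. This is a minimal illustration, not the paper's actual prompt format: the `llm` function is a hypothetical stand-in for a call to the language model, and the `x <- 42` assignment syntax is an assumed convention for what the regular expression parses.

```python
import re

# The associative memory is a plain dict mapping variable names
# (or address locations) to stored values.
memory = {}

def llm(prompt):
    # Hypothetical placeholder for the Flan-U-PaLM call.
    # We hard-code one response to illustrate the parsing step.
    return "x <- 42"

def step(prompt):
    """One loop iteration: query the model, parse its output with a
    regular expression, and update the associative memory."""
    output = llm(prompt)
    # Match assignments of the assumed form "<variable> <- <value>".
    match = re.match(r"\s*(\w+)\s*<-\s*(.+)", output)
    if match:
        key, value = match.groups()
        memory[key] = value.strip()
    return output

step("increment the counter")
print(memory)  # {'x': '42'}
```

In the paper's setup this parse-and-store loop runs repeatedly, with the memory contents used to assemble the next prompt, so no information is ever lost to the model's bounded context window.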

By developing a unique “prompt program,” the researchers directed the system to simulate the execution of a universal Turing machine. The study examined prompt-result patterns and confirmed that the language model generated the correct output for each possible input string. Importantly, this research did not require additional training or modification of the language model’s pre-trained weights. Instead, it focused on constructing a stored instruction computer that could be programmed with specific prompts.
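To make the idea concrete, a Turing machine of the kind being simulated can be written as a table of transition rules held in a dictionary, looked up by the current state and tape symbol. The toy rules below implement a unary incrementer and are purely illustrative; the paper's prompt program encodes the rules of a universal Turing machine as prompts rather than Python.

```python
# Transition rules: (state, symbol) -> (new state, write symbol, head move).
# "_" denotes a blank tape cell.
rules = {
    ("start", "1"): ("start", "1", +1),   # scan right over the 1s
    ("start", "_"): ("halt",  "1",  0),   # append a 1, then halt
}

def run(tape, state="start", head=0):
    """Run the machine on the given tape string until it halts."""
    cells = dict(enumerate(tape))          # sparse tape, blank by default
    while state != "halt":
        symbol = cells.get(head, "_")
        state, write, move = rules[(state, symbol)]
        cells[head] = write
        head += move
    return "".join(cells[i] for i in sorted(cells))

print(run("111"))  # 1111
```

Checking the machine's behavior on every (state, symbol) pair, as done here by the rule table, mirrors how the study verified the prompt-result patterns for each possible input string.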

This study stands out from previous work on computational universality in models. Unlike earlier studies, the researchers demonstrated that external memory augmentation can unlock universal computational behavior from a fixed language model with fixed pre-trained weights. The findings indicate that large language models are already computationally universal, provided they have access to unbounded external memory.

To learn more about this research, you can read the paper. Credit for this research goes to the researchers involved in the project.

– Khushboo Gupta, Consulting Intern at MarktechPost

