New Study: Transformer Models Can Be Conceptualized as Multi-state RNNs
A recent study by researchers from The Hebrew University of Jerusalem and FAIR (AI at Meta) shows that transformer models, such as those used for natural language processing (NLP), can be conceptualized as multi-state recurrent neural networks (MSRNNs). This reframing has significant implications for the development of AI and language models.
What the Study Found
The study compared transformers and RNNs, showing that decoder-only transformers can be conceptualized as infinite multi-state RNNs, i.e., RNNs whose hidden state grows without bound. The researchers further found that pretrained transformers can be converted into finite multi-state RNNs simply by fixing the size of their hidden state (the key-value cache). Their proposed policy, TOVA (Token Omission Via Attention), a simple compression method that selects which tokens to keep based on their attention scores, consistently outperforms existing compression policies on long-range tasks.
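To make the idea concrete, here is a minimal, single-head sketch of a TOVA-style finite multi-state cache. It is an illustrative reconstruction rather than the authors' implementation: the class name, shapes, and per-step eviction details are simplifying assumptions, and a real model would apply this per layer with scores averaged across heads.

```python
# Minimal single-head sketch of a TOVA-style finite multi-state (KV) cache.
# Illustrative only: capacity, shapes, and eviction details are assumptions.
import numpy as np

class TOVACache:
    def __init__(self, capacity: int, d_head: int):
        self.capacity = capacity                  # fixed multi-state size (max stored tokens)
        self.keys = np.empty((0, d_head))
        self.values = np.empty((0, d_head))

    def step(self, q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
        """Append the new token's key/value, attend over the cache,
        and evict the least-attended token once capacity is exceeded."""
        self.keys = np.vstack([self.keys, k[None]])
        self.values = np.vstack([self.values, v[None]])

        # Standard scaled dot-product attention of the current query over the cache.
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        output = weights @ self.values

        # TOVA eviction: when the multi-state is full, drop the token the
        # current query attends to least (attention scores act as importance).
        if len(self.keys) > self.capacity:
            drop = int(np.argmin(weights))
            self.keys = np.delete(self.keys, drop, axis=0)
            self.values = np.delete(self.values, drop, axis=0)
        return output
```

Because eviction only ever removes one token per decoding step, the cache never holds more than a fixed number of states, which is exactly what turns the "infinite" MSRNN view of a transformer into a finite one.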
Implications for AI and Language Models
These findings shed light on the inner workings of transformers and their connection to RNNs. They also have practical value: the compression reduces the LLM's key-value (KV) cache size by up to 88%, cutting memory consumption during inference.
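As a rough illustration of where a figure like 88% comes from, capping the multi-state at one eighth of the full context keeps only 1/8 of the key-value entries. The specific token counts below are assumed example values, not exact settings from the paper.

```python
# Back-of-the-envelope KV-cache saving; 4096-token context and 512-token
# multi-state are assumed example values for illustration.
context_len = 4096      # tokens the full (infinite-MSRNN) cache would hold
multi_state = 512       # fixed cache size after conversion to a finite MSRNN
reduction = 1 - multi_state / context_len
print(f"cache reduction: {reduction:.0%}")   # -> 88%
```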
To read the full paper and learn more about this research, visit the provided link.