Large language models (LLMs) with billions of parameters have revolutionized AI. However, their high computational requirements have made deployment on low-resource devices a challenge.
One study argues for bringing back the ReLU activation in LLMs despite recent trends favoring alternative activation functions like GELU or SiLU. The researchers show that using ReLU has little impact on performance and convergence, while significantly cutting computation and weight transfer.
This reduction is especially useful during the memory-bound inference step. The study also explores sparsity patterns in ReLU-based LLMs, uncovering ways to reuse activated neurons for generating new tokens. They propose practical strategies to cut LLM inference computation by up to three times using ReLU activations, with minimal performance trade-offs.
Benefits of ReLU Activation in LLMs
The study demonstrates that using the ReLU activation function in LLMs has minimal impact on performance and convergence while significantly reducing computation and weight transfer, especially during the memory-bound inference step.
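The computational saving comes from ReLU's exact zeros: any neuron whose pre-activation is negative contributes nothing downstream, so its weights need not be loaded or multiplied. A minimal sketch of this effect in a feed-forward block, using NumPy with illustrative dimensions and weight names (`W1`, `W2` and the sizes are assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical FFN dimensions for illustration (not from the paper).
d_model, d_ffn = 64, 256
x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_ffn, d_model)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_model, d_ffn)) / np.sqrt(d_ffn)

# ReLU zeroes out a large fraction of the hidden units exactly.
h = np.maximum(W1 @ x, 0.0)
sparsity = float(np.mean(h == 0.0))

# Columns of W2 for inactive neurons never need to be loaded or
# multiplied: the sparse product matches the dense one exactly.
active = h > 0
y_sparse = W2[:, active] @ h[active]
y_dense = W2 @ h
```

Because inference on a single token is memory-bound, skipping the weight rows and columns of inactive neurons saves weight transfer, not just FLOPs.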
Practical Strategies for Reducing LLM Inference Computation
The researchers propose practical strategies that reduce LLM inference computation by up to three times using ReLU activations, with minimal performance trade-offs, by reusing activated neurons when generating new tokens.
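The reuse idea rests on the observation that, in trained ReLU LLMs, consecutive tokens tend to activate heavily overlapping sets of neurons, so the active set recorded on recent tokens can predict which weights to load for the next token. A hedged sketch of the mechanism only (random weights and inputs here, so the prediction quality seen in real models does not appear; `ffn_full`, `ffn_reuse`, and all dimensions are hypothetical names for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffn = 64, 256
W1 = rng.standard_normal((d_ffn, d_model)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_model, d_ffn)) / np.sqrt(d_ffn)

def ffn_full(x):
    """Dense FFN pass; also returns which neurons fired."""
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h, h > 0

def ffn_reuse(x, predicted_active):
    """Compute only the neurons predicted to be active (e.g. those that
    fired on recent tokens). Weights of the other neurons are never
    loaded, which is where the inference savings come from."""
    idx = np.flatnonzero(predicted_active)
    h_sub = np.maximum(W1[idx] @ x, 0.0)
    return W2[:, idx] @ h_sub

# Token t: a full pass records the activated neuron set.
x_t = rng.standard_normal(d_model)
y_t, active_t = ffn_full(x_t)

# Token t+1: reuse token t's active set as the prediction.
x_next = rng.standard_normal(d_model)
y_pred = ffn_reuse(x_next, active_t)
y_true, _ = ffn_full(x_next)
```

When the predicted set covers the truly active neurons, `ffn_reuse` reproduces the dense output exactly; the trade-off the study quantifies is how small that predicted set can be kept while preserving model quality.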