Neural Networks: Optimizing Performance with Unconventional Building Blocks
Neural networks, a type of machine-learning model, are being used for a growing range of tasks, from predicting credit scores to diagnosing disease. However, their inner workings remain poorly understood, which raises a basic question: are these models truly optimal for a given task?
Researchers at MIT have conducted a groundbreaking analysis of neural networks and have proven that they can be designed to be “optimal” with the right architecture. While developers typically use conventional building blocks, this study reveals that unconventional building blocks can lead to even better performance.
In their paper, published in the Proceedings of the National Academy of Sciences, the researchers derive new activation functions, a core building block of neural networks, that are provably optimal. These functions can improve a network's accuracy on any dataset, even as the network grows larger. Senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science at MIT, notes that the new activation functions are simple yet effective, and emphasizes the importance of theoretical proofs in pushing the boundaries of neural network design.
Contributing to this study are lead author Adityanarayanan Radhakrishnan, an EECS graduate student and Eric and Wendy Schmidt Center Fellow, and Mikhail Belkin, a professor at the Halicioğlu Data Science Institute at the University of California, San Diego.
Understanding Activation Functions
Neural networks are loosely modeled on the human brain and process data through interconnected nodes called neurons. To train a network for a specific task, such as image classification, it is shown millions of examples from a dataset. The network performs increasingly complex calculations, layer by layer, until it produces a single number that determines the classification.
Activation functions play a crucial role in helping neural networks learn complex patterns in the input data. These functions transform the output of one layer before it reaches the next layer. When developers build a neural network, they must choose an activation function, determine the width of the network (number of neurons in each layer), and decide its depth (number of layers).
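To make the role of an activation function concrete, here is a minimal sketch of one fully connected layer in plain Python, using the standard ReLU activation. The toy weights and network shape are invented for illustration; the paper's new optimal activation functions are not shown here.

```python
def relu(x):
    # ReLU, a common standard activation function: passes positive
    # values through unchanged and maps negative values to zero.
    return max(0.0, x)

def dense_layer(inputs, weights, biases, activation):
    # One fully connected layer: each neuron computes a weighted sum
    # of the inputs plus a bias, then the activation function
    # transforms that result before it reaches the next layer.
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical toy network: 2 inputs -> 3 hidden neurons -> 1 output.
inputs = [0.5, -1.0]
hidden = dense_layer(inputs,
                     weights=[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]],
                     biases=[0.0, 0.1, -0.2],
                     activation=relu)
output = dense_layer(hidden,
                     weights=[[1.0, -0.5, 0.3]],
                     biases=[0.0],
                     activation=lambda x: x)  # identity on the output layer
```

Swapping `relu` for a different activation function changes how each layer transforms its output, which in turn changes what patterns the network can learn; the choice the study examines is exactly this one.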
The researchers found that as a network built with standard activation functions grows deeper, its performance significantly deteriorates. With different activation functions, however, they discovered that performance instead improves as the network is given more data.
Optimal Classification Methods
The study focuses on neural networks that are infinitely deep and wide and are trained for classification tasks. Through a detailed analysis, the researchers identified only three ways such networks can learn to classify inputs. The first assigns every input the label that occurs most often in the training data, regardless of the input itself. The second labels a new input according to its single closest training data point. The third, and optimal, method labels a new input using a weighted average of all the training data points that are similar to it.
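The three decision rules can be illustrated with a toy one-dimensional sketch. The function names, the exponential distance weighting, and the data below are illustrative assumptions, not the formulas from the paper.

```python
import math
from collections import Counter

def majority_vote(train, _x):
    # Method 1: ignore the input and return the label that occurs
    # most often in the training data.
    labels = [y for _, y in train]
    return Counter(labels).most_common(1)[0][0]

def nearest_neighbor(train, x):
    # Method 2: return the label of the single closest training point.
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def weighted_average(train, x, bandwidth=1.0):
    # Method 3: average the labels of all training points, weighting
    # nearby points more heavily, then threshold the result.
    # (The exponential weight is a stand-in for illustration only.)
    weights = [math.exp(-abs(xi - x) / bandwidth) for xi, _ in train]
    score = sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights)
    return 1 if score >= 0.5 else 0

# Hypothetical 1-D training set of (feature, label) pairs.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
```

For an input near the labeled-1 points, such as `x = 3.5`, the nearest-neighbor and weighted-average rules both predict 1, while majority vote always predicts 0 here because that label dominates the training set.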
Surprisingly, the researchers found that no matter which activation function a network uses, it will end up employing one of these three classification methods. They also developed formulas that determine exactly which method a given activation function leads to, yielding a clear and complete picture.
Improved Performance and Future Prospects
Testing their theory on various classification benchmarks, the researchers observed improved performance in many instances. Developers can now use the formulas provided to select activation functions that enhance the classification performance of neural networks.
Moving forward, the researchers aim to apply their findings to settings with limited data and to networks that are not infinitely wide or deep. They also plan to analyze situations where the data are unlabeled. By building theoretically grounded models, the team hopes to make neural networks reliable enough to deploy in mission-critical environments.
This revolutionary work was supported by the National Science Foundation, Office of Naval Research, MIT-IBM Watson AI Lab, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.