Fast and Scalable Neural Network Pruning with the CHITA Algorithm
Neural networks have made significant progress in various applications such as language processing, mathematical reasoning, and visual recognition. However, these networks are often large and require a lot of computational resources, making it challenging to use them on devices with limited resources like wearables and smartphones.
To address this issue, researchers have developed pruning methods that remove unnecessary weights from pre-trained networks while keeping the loss in accuracy small. The main approaches are magnitude pruning and optimization-based pruning. Magnitude pruning removes the weights with the smallest absolute values, whereas optimization-based pruning chooses which weights to remove according to how much their removal would increase the loss function.
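As a point of reference, magnitude pruning can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the study; the function name `magnitude_prune` and the example weights are assumptions made for this sketch.

```python
import numpy as np

def magnitude_prune(weights, k):
    """Keep the k largest-magnitude entries of `weights` and zero the rest."""
    flat = weights.ravel()
    if k <= 0:
        return np.zeros_like(weights)
    if k >= flat.size:
        return weights.copy()
    # argpartition places the k largest |w| values in the last k positions.
    keep = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    pruned = np.zeros_like(flat)
    pruned[keep] = flat[keep]
    return pruned.reshape(weights.shape)

# Hypothetical 2x3 weight matrix, pruned down to 3 nonzero weights.
w = np.array([[0.9, -0.05, 0.3],
              [-1.2, 0.01, 0.4]])
pruned = magnitude_prune(w, 3)
```

The rule looks only at each weight's own size. It never consults the loss function, which is the gap that optimization-based methods aim to close.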
In a recent study titled “Fast as CHITA: Neural Network Pruning with Combinatorial Optimization,” researchers from MIT and Google introduced CHITA, an optimization-based approach that the authors report outperforms existing pruning methods in both scalability and accuracy. CHITA builds on advances in high-dimensional statistics, combinatorial optimization, and neural network pruning to prune faster and more accurately.
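One of CHITA’s key technical improvements is its efficient use of second-order information without explicitly computing or storing the Hessian matrix. The Hessian has one entry for every pair of weights, so forming it is impractical for networks with millions of parameters. CHITA instead exploits the low-rank structure of the empirical Fisher information matrix, which is built from per-sample gradients, and can therefore work with a much smaller matrix of gradients. This avoids the high computational cost of traditional second-order methods and makes pruning faster and more scalable.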
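The low-rank idea can be illustrated with a short NumPy sketch. It assumes the empirical Fisher matrix has the form (1/n) AᵀA, where each of the n rows of a hypothetical matrix `A` is one per-sample gradient over p weights. The name `fisher_matvec` and the random stand-in data are assumptions for illustration, not the authors' code.

```python
import numpy as np

def fisher_matvec(A, v):
    """Compute ((1/n) A^T A) @ v without forming the p x p matrix.

    A: (n, p) array whose rows stand in for per-sample gradients.
    v: (p,) vector.
    Cost is O(n * p) instead of the O(p^2) needed for the dense matrix.
    """
    n = A.shape[0]
    return A.T @ (A @ v) / n

rng = np.random.default_rng(0)
n, p = 8, 50                      # few samples, many weights
A = rng.standard_normal((n, p))   # random stand-in for real gradients
v = rng.standard_normal(p)
hv = fisher_matvec(A, v)
```

With n much smaller than p, storing `A` takes n·p numbers while the dense matrix would take p², which is where the memory saving comes from.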
CHITA also uses a combinatorial optimization algorithm that accounts for how pruning one weight affects the others. This helps avoid mistakenly removing important weights, a risk for methods that evaluate each weight in isolation, and leads to better overall performance.
The pruning process is formulated as a best-subset selection problem: among all candidate sets of weights that satisfy a given sparsity budget, the goal is to find the one that yields the smallest loss. CHITA approximates the loss with a quadratic function in which the Hessian is replaced by the empirical Fisher information matrix, which lets it solve the problem efficiently without ever forming the Hessian.
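Written out, the formulation described above takes roughly the following form. The notation is an assumption chosen for this sketch and may differ from the paper's: w̄ denotes the pre-trained weights, k the number of weights to keep, ℓ_i the loss on the i-th sample, and A the n × p matrix whose rows are the per-sample gradients.

```latex
% Pruning as best-subset selection with a sparsity budget k
\min_{w \in \mathbb{R}^p} \; Q(w) \quad \text{subject to} \quad \|w\|_0 \le k,

% Local quadratic model of the loss around the pre-trained weights \bar{w}
Q(w) = L(\bar{w}) + \nabla L(\bar{w})^\top (w - \bar{w})
     + \tfrac{1}{2} (w - \bar{w})^\top H \, (w - \bar{w}),

% Empirical Fisher approximation of the Hessian, built from per-sample gradients
H \approx \frac{1}{n} \sum_{i=1}^{n} \nabla \ell_i(\bar{w}) \, \nabla \ell_i(\bar{w})^\top
  = \frac{1}{n} A^\top A, \qquad A \in \mathbb{R}^{n \times p}.
```

Because H enters only through A, every quantity needed to evaluate Q(w) or its gradient can be computed from products with A and its transpose.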
To solve this problem, CHITA uses an iterative hard thresholding algorithm: each iteration takes a gradient step and then keeps only the largest-magnitude weights, setting the rest to zero. A line-search method chooses a suitable learning rate at each step. Together, these techniques improve CHITA’s convergence speed and overall efficiency.
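The following NumPy sketch shows iterative hard thresholding with a generic backtracking line search on a quadratic loss model. It is a simplified illustration under stated assumptions, not CHITA's implementation: the names `hard_threshold`, `quad_loss`, `quad_grad`, and `iht_prune`, the sufficient-decrease rule, and the random test data are all hypothetical, and the paper's own line-search scheme may differ.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w; zero the rest."""
    out = np.zeros_like(w)
    if k > 0:
        keep = np.argsort(np.abs(w))[-k:]
        out[keep] = w[keep]
    return out

def quad_loss(A, g, w0, w):
    """Quadratic model g^T d + (1/2n) ||A d||^2 with d = w - w0."""
    d = w - w0
    Ad = A @ d
    return g @ d + 0.5 * (Ad @ Ad) / A.shape[0]

def quad_grad(A, g, w0, w):
    """Gradient of the model, using only products with A and A^T."""
    return g + A.T @ (A @ (w - w0)) / A.shape[0]

def iht_prune(A, g, w0, k, iters=100, t0=1.0, shrink=0.5, c=1e-4, tol=1e-12):
    w = hard_threshold(w0, k)          # start from magnitude pruning
    f = quad_loss(A, g, w0, w)
    for _ in range(iters):
        grad = quad_grad(A, g, w0, w)
        t = t0
        while True:
            # Gradient step followed by projection onto k-sparse vectors.
            cand = hard_threshold(w - t * grad, k)
            f_cand = quad_loss(A, g, w0, cand)
            step = cand - w
            # Accept on sufficient decrease; give up if the step is tiny.
            if f_cand <= f - (c / (2 * t)) * (step @ step) or t < 1e-12:
                break
            t *= shrink
        if f - f_cand <= tol:          # no meaningful decrease: stop
            break
        w, f = cand, f_cand
    return w

rng = np.random.default_rng(0)
n, p, k = 20, 30, 10
A = rng.standard_normal((n, p))        # random stand-in for per-sample gradients
w0 = rng.standard_normal(p)            # stand-in for pre-trained weights
g = A.mean(axis=0)                     # average gradient, (1/n) A^T 1
w_sparse = iht_prune(A, g, w0, k)
```

The loop updates the weights only when the model loss strictly decreases, so the result is never worse, under this quadratic model, than the magnitude-pruned starting point.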
In experiments on popular architectures such as ResNet and MobileNet, CHITA demonstrated significant improvements in speed and accuracy over other pruning methods. According to the reported results, it was up to 1000 times faster than state-of-the-art methods and improved accuracy by over 10% in many cases.
Overall, CHITA offers a fast and scalable solution for pruning pre-trained neural networks. Its optimization-based approach, efficient use of second-order information, and combinatorial optimization algorithm make it a valuable tool for resource-constrained environments. By reducing the computational requirements of neural networks, CHITA opens up new possibilities for using AI in wearable devices and smartphones.