DNN Pruning: An Efficient and Effective Approach to Model Size Reduction
DNN pruning is a widely used way to shrink a model, speed up inference, and cut power consumption on DNN accelerators. However, existing approaches can be complex, expensive to train, or ineffective across different vision and language tasks and DNN architectures, and some cannot satisfy structured pruning constraints. This article introduces a train-time pruning scheme called Parameter-free Differentiable Pruning (PDP), which reports state-of-the-art results in model size, accuracy, and training cost.
What is PDP?
PDP is a pruning technique that generates soft pruning masks for the model weights during training. The masks are parameter-free: they are computed from a dynamic function of the weights themselves rather than from additional learned parameters. Because this function is differentiable, the masks can be used directly in ordinary gradient-based training, which keeps the method simple and efficient across vision and natural language tasks. PDP achieves state-of-the-art results for random, structured, and channel pruning on different DNN architectures.
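To make the idea of a soft, parameter-free mask concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the article does not spell out the mask function, so the threshold rule (midway between the largest magnitude to prune and the smallest to keep), the sigmoid over squared weights, and the temperature `tau` are all choices made for this example. The names `soft_mask` and `_sigmoid` are hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_mask(weights, sparsity, tau=1e-2):
    """Return one soft keep-score in [0, 1] per weight.

    Assumptions of this sketch (not taken from the article):
      * the threshold t sits halfway between the largest magnitude that
        should be pruned and the smallest magnitude that should be kept;
      * the mask is sigmoid((w^2 - t^2) / tau), which is the same as a
        two-way softmax over w^2 / tau and t^2 / tau.
    No trainable parameter is introduced: the mask depends only on the
    current weights, the target sparsity, and the fixed temperature.
    """
    n = len(weights)
    k = int(round(sparsity * n))  # number of weights to prune
    if k <= 0:
        return [1.0] * n
    if k >= n:
        return [0.0] * n
    mags = sorted(abs(w) for w in weights)
    t = 0.5 * (mags[k - 1] + mags[k])
    return [_sigmoid((w * w - t * t) / tau) for w in weights]


# Toy layer with six weights and a 50% sparsity target.
layer = [0.9, -0.05, 0.4, 0.02, -0.6, 0.1]
mask = soft_mask(layer, sparsity=0.5)
masked_layer = [w * m for w, m in zip(layer, mask)]
print([round(m, 3) for m in mask])
```

With a small `tau` the scores sit close to 0 or 1, while a larger `tau` gives weights near the threshold intermediate values. In an autograd framework the same expression would be differentiable with respect to the weights, typically with the threshold treated as a constant within each step.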
Outstanding Results Achieved
PDP performs well on both vision and language tasks. For instance, when applied to MobileNet-v1, PDP achieves 68.2% top-1 ImageNet1k accuracy at 86.6% sparsity, which is 1.7 percentage points higher than state-of-the-art algorithms. For BERT at 90% sparsity, PDP achieves over 83.1% accuracy on Multi-Genre Natural Language Inference (MNLI), whereas existing techniques reach only 81.5%.
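As a quick aid for reading these sparsity figures, the short sketch below computes the sparsity of a weight list and the corresponding reduction in nonzero weights. The function names `sparsity` and `nonzero_reduction` are hypothetical, and the reduction factor counts nonzero weights only; the actual on-disk or in-memory saving depends on the sparse storage format, which the article does not discuss.

```python
def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    if not weights:
        raise ValueError("weights must not be empty")
    return sum(1 for w in weights if w == 0) / len(weights)


def nonzero_reduction(sparsity_level):
    """How many times fewer nonzero weights remain at a given sparsity."""
    if not 0.0 <= sparsity_level < 1.0:
        raise ValueError("sparsity_level must be in [0, 1)")
    return 1.0 / (1.0 - sparsity_level)


print(sparsity([0.0, 0.0, 0.0, 0.5]))        # 0.75
print(round(nonzero_reduction(0.866), 2))    # 7.46
print(round(nonzero_reduction(0.90), 2))     # 10.0
```

In other words, 86.6% sparsity leaves 13.4% of the weights nonzero (roughly 7.5 times fewer), and 90% sparsity leaves 10% (10 times fewer).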
PDP’s effectiveness also extends to structured pruning. For 1:4 structured pruning of ResNet18, PDP improves top-1 ImageNet1k accuracy by more than 3.6 percentage points over the current best approach. For channel pruning of ResNet50, PDP reduces top-1 ImageNet1k accuracy by only 0.6 percentage points, slightly better than the state of the art.
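The structured constraint itself can be illustrated with a small Python sketch. It assumes the common N:M convention, in which N of every M consecutive weights are kept, so 1:4 keeps one weight in each group of four; the article does not define the notation. The code builds a hard 0/1 mask by magnitude, which shows the constraint only and is not PDP's soft-mask procedure. The name `nm_mask` is hypothetical.

```python
def nm_mask(weights, n=1, m=4):
    """Hard N:M mask: keep the n largest-magnitude weights in each group of m.

    Assumes len(weights) is a multiple of m. Ties are broken in favor of
    the earlier position within the group.
    """
    if m <= 0 or not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m and m > 0")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions in this group, ordered from largest to smallest magnitude.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


row = [0.1, -0.7, 0.3, 0.2, 0.5, 0.0, -0.4, 0.6]
print(nm_mask(row, n=1, m=4))  # [0, 1, 0, 0, 0, 0, 0, 1]
```

Under this convention a 1:4 pattern fixes the sparsity at 75%, and every group of four contains exactly one nonzero weight.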
Given these results and its applicability across tasks, architectures, and pruning patterns, PDP is a promising approach to DNN model pruning. It reduces model size while largely preserving accuracy and keeping training cost low, which makes it a practical option for AI practitioners and researchers.