Adaptive Weight Decay for Enhanced AI Performance
Adaptive weight decay is a training technique that automatically adjusts the weight decay hyper-parameter at every training iteration. It can significantly improve adversarial robustness without requiring extra data, which makes it an attractive option for AI development.
The weight decay hyper-parameter is changed on the fly, based on the strength of the updates coming from the classification loss and from the regularization loss. This simple modification can lead to a 20% relative robustness improvement on CIFAR-100 and a 10% relative robustness improvement on CIFAR-10, compared with the best-tuned hyper-parameters of traditional weight decay.
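As a rough illustration of the idea, the sketch below recomputes the decay coefficient at every step so that the size of the decay update stays at a fixed fraction of the size of the update from the classification loss. The rule coeff = ratio * ||grad|| / ||w||, the names `adaptive_weight_decay_coeff`, `sgd_step`, `ratio`, and `eps`, and the use of plain SGD are all assumptions made for illustration; the text above does not state the exact formula.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def adaptive_weight_decay_coeff(weights, loss_grad, ratio, eps=1e-12):
    """Choose the decay coefficient for the current step.

    Assumed rule (not taken from the text): pick coeff so that
    ||coeff * weights|| is approximately ratio * ||loss_grad||.
    `ratio` is a hypothetical hyper-parameter; `eps` avoids division by zero.
    """
    return ratio * l2_norm(loss_grad) / (l2_norm(weights) + eps)


def sgd_step(weights, loss_grad, lr, ratio):
    """One plain SGD step with the decay coefficient recomputed for this step."""
    coeff = adaptive_weight_decay_coeff(weights, loss_grad, ratio)
    # Total update = classification-loss gradient + weight decay term.
    new_weights = [w - lr * (g + coeff * w) for w, g in zip(weights, loss_grad)]
    return new_weights, coeff
```

For example, with weights [3.0, 4.0] (norm 5), a loss gradient [0.6, 0.8] (norm 1), and ratio 0.5, the coefficient comes out at roughly 0.1. If the loss gradient grows tenfold, the coefficient grows tenfold as well, whereas a fixed weight decay coefficient would stay unchanged.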
The method has further benefits: it is less sensitive to the learning rate and results in smaller weight norms. These properties contribute to robustness against overfitting to label noise and robustness to pruning, making adaptive weight decay a promising development in the field of AI.