Recent DeepMind Research on the Risks of Language Models
A recently released DeepMind paper identified a major risk of large language models: they can leak sensitive information about their training data. This poses ethical and social risks that organizations working on these models need to address. Another recent paper shows similar privacy risks for image classification models: a fingerprint of each individual training image can be found embedded in the model parameters, a threat if exploited by malicious parties.
To mitigate these risks, privacy-enhancing technologies like differential privacy (DP) can be deployed at training time. However, they often result in a significant reduction in model performance. This work aims to make substantial progress towards unlocking high-accuracy training of image classification models under differential privacy.
Understanding Differential Privacy and Its Role in Protecting Individual Privacy
Differential privacy is a mathematical framework that aims to protect individual records in the course of statistical data analysis, including the training of machine learning models. It injects carefully calibrated noise during the computation of the desired statistic or model to protect individuals from inferences about the features that make them unique. DP algorithms give robust and rigorous privacy guarantees, and have become the gold standard adopted by both public and private organizations.
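To make the idea of "carefully calibrated noise" concrete, here is a minimal sketch (not from the paper; the function name and parameters are illustrative) of the classic Laplace mechanism, which releases a bounded-mean statistic with noise scaled to how much any single record can change the result:

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng):
    """Release a differentially private mean of bounded values.

    Each value is clipped to [lower, upper], so one record can change
    the sum by at most (upper - lower); the sensitivity of the mean
    over n records is therefore (upper - lower) / n.
    """
    values = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

rng = np.random.default_rng(0)
ages = np.array([23, 35, 41, 29, 52], dtype=float)
private_mean = dp_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
```

Smaller values of epsilon mean stronger privacy but more noise; the same trade-off, applied to gradients rather than a single statistic, is what governs DP model training.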
Challenges in Differential Privacy for Deep Learning and Proposed Solutions
However, prior work has found that the privacy protection provided by DP-SGD (differentially private stochastic gradient descent, the standard DP training algorithm, which clips each example's gradient and adds Gaussian noise to the aggregate) often comes at the cost of significantly less accurate models. This utility degradation becomes more severe on larger neural network models, hindering the widespread adoption of differential privacy in the machine learning community. This research investigates the phenomenon and proposes modifications to the training procedure and model architecture that improve the accuracy of DP training on standard image classification benchmarks.
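A minimal NumPy sketch of one DP-SGD update may help illustrate the mechanism, here for a linear model with squared loss; all names are illustrative and this is not the paper's code:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update for linear regression with squared loss.

    1. Compute each example's gradient separately.
    2. Clip every per-example gradient to L2 norm <= clip_norm,
       bounding any single record's influence on the update.
    3. Add Gaussian noise with std clip_norm * noise_multiplier
       to the summed gradients before averaging.
    """
    per_example_grads = []
    for xi, yi in zip(X, y):
        residual = xi @ w - yi
        g = 2 * residual * xi                 # gradient of (x.w - y)^2
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)    # clip to L2 norm clip_norm
        per_example_grads.append(g)
    grad_sum = np.sum(per_example_grads, axis=0)
    noise = rng.normal(0.0, clip_norm * noise_multiplier, size=w.shape)
    grad = (grad_sum + noise) / len(X)
    return w - lr * grad

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
w = dp_sgd_step(np.zeros(2), X, y, lr=0.5,
                clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

The clipping bounds the sensitivity of the update to any one training example, and the noise obscures the remaining influence; both steps bias and perturb the gradient, which is exactly the source of the utility degradation described above.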
Promising Results and Open-Sourcing of Implementation
The research demonstrates a significant improvement in the accuracy of DP-SGD: a ~10% improvement on CIFAR-10 and a top-1 accuracy of 86.7% on ImageNet. These results could unlock practical applications of image classification models trained with formal privacy guarantees. The implementation is open-sourced so that other researchers can verify the findings, build on them, and help make practical DP training a reality.
Download the JAX implementation on GitHub.