Machine Unlearning Challenge: Protecting Privacy and Improving Models
Deep learning has made significant progress in applications ranging from image generation to language models. However, deploying deep neural networks requires caution to mitigate risks such as bias and privacy leakage. In particular, deleting data from a database is not enough to erase its influence on a trained model: researchers have shown that membership inference attacks can reveal whether an example was used during training, even after the example itself has been deleted from the database. Machine unlearning is a subfield of machine learning that aims to remove the influence of specific training examples from a trained model.
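To make the privacy risk concrete, here is a minimal sketch of one simple membership inference attack, the loss-threshold attack: because models tend to fit their training examples better than unseen ones, an attacker can guess membership by checking whether an example's loss falls below a threshold. The function name and the loss values below are hypothetical, chosen only for illustration.

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Loss-threshold membership inference: predict that examples with
    loss below the threshold were part of the training set (members
    tend to be fit better, hence have lower loss)."""
    return losses < threshold

# Hypothetical per-example losses from some trained model:
member_losses = np.array([0.05, 0.10, 0.20, 0.08])     # seen in training
nonmember_losses = np.array([0.90, 1.20, 0.70, 1.50])  # never seen

threshold = 0.5
member_guesses = loss_threshold_mia(member_losses, threshold)
nonmember_guesses = loss_threshold_mia(nonmember_losses, threshold)
```

If the attack's accuracy on a forgotten example is well above chance, the model still "remembers" it; successful unlearning should drive such attacks back toward random guessing.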
A straightforward solution is to retrain the model from scratch without the examples to be forgotten (the "forget set"), but this is computationally expensive. To spur progress on more efficient methods, we are excited to announce the Machine Unlearning Challenge, organized in collaboration with academic and industrial researchers. The competition addresses a realistic scenario in which a subset of the training images must be forgotten after training in order to protect privacy. Participants will develop unlearning algorithms, and submissions will be scored on both forgetting quality and model utility.
Applications of machine unlearning go beyond privacy protection. It can also be used to remove inaccurate or outdated information from a model, or to eliminate the influence of harmful or outlier data. Unlearning is related to other areas of machine learning such as differential privacy, lifelong learning, and fairness. Differential privacy guarantees that no single training example has too large an influence on the trained model, whereas unlearning targets the removal of specific examples after training. Lifelong learning aims to design models that learn continuously while retaining previously acquired knowledge. Unlearning can also help address unfair biases, for example by removing data that causes a model to treat different groups inequitably.
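For readers unfamiliar with differential privacy, the guarantee mentioned above can be stated precisely. A randomized training procedure $M$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in a single example and every set of possible output models $S$:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

Intuitively, adding or removing any one example changes the distribution over trained models by at most a factor of $e^{\varepsilon}$, which bounds how much the model can reveal about that example.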
An unlearning algorithm takes as input a pre-trained model, a forget set (the examples to be removed), and a retain set (the remaining training data), and produces an updated model from which the influence of the forget set has been removed. Unlearning is a difficult problem with competing objectives: existing algorithms make different trade-offs between how thoroughly they forget, how much model utility they preserve, and how efficient they are. Moreover, evaluation of unlearning algorithms has so far been inconsistent across the literature, hindering progress in the field. The Machine Unlearning Challenge is intended to standardize evaluation and advance the state of the art.
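The interface described above can be sketched in a few lines. For concreteness, the "model" here is an ordinary least-squares fit, and the unlearning algorithm is the exact-but-expensive baseline mentioned earlier: ignore the pre-trained weights and retrain from scratch on the retain set. All function and variable names are hypothetical; this is an illustration of the interface, not the competition's actual setup.

```python
import numpy as np

def train(X, y):
    """Least-squares fit: a stand-in for model training."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def unlearn(pretrained_w, X_retain, y_retain, X_forget, y_forget):
    """Baseline 'exact' unlearning: discard the pre-trained weights and
    the forget set, and retrain from scratch on the retain set only.
    Perfectly removes the forget set's influence, but costs as much as
    full retraining -- the expense the challenge aims to avoid."""
    return train(X_retain, y_retain)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

w_full = train(X, y)                      # pre-trained on all data
forget_idx = np.arange(10)                # first 10 examples to forget
retain_idx = np.setdiff1d(np.arange(100), forget_idx)
w_unlearned = unlearn(w_full, X[retain_idx], y[retain_idx],
                      X[forget_idx], y[forget_idx])
# By construction, w_unlearned matches a model that never saw the forget set.
```

Practical unlearning algorithms would instead start from `pretrained_w` and update it cheaply, trading some forgetting quality or utility for efficiency.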
The competition aims to identify the strengths and weaknesses of different unlearning algorithms through standardized evaluation. Participants will have access to a starting kit with a toy dataset for building and testing their unlearning algorithms. The scenario involves forgetting a subset of face images in an age-predictor model in order to protect the privacy of the individuals pictured; a dataset of real face images will be used for the final evaluation. Submissions will be evaluated on both forgetting quality and model utility.
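One simple way to combine the two evaluation criteria, sketched below purely as a hypothetical illustration (this is not the competition's official scoring function): forgetting quality rewards a membership inference attack whose accuracy on the forget set is near chance (0.5), and utility compares the unlearned model's test accuracy to that of a reference model retrained without the forget set.

```python
def evaluate(forget_mia_accuracy, test_accuracy, retrained_test_accuracy):
    """Hypothetical combined unlearning score in [0, 1].

    forget_mia_accuracy:     membership-inference accuracy on the forget
                             set (0.5 = chance, i.e. perfect forgetting).
    test_accuracy:           unlearned model's accuracy on held-out data.
    retrained_test_accuracy: accuracy of a model retrained from scratch
                             without the forget set (the utility target).
    """
    # 1.0 when the attack is at chance, 0.0 when it is perfectly accurate.
    forgetting = 1.0 - 2.0 * abs(forget_mia_accuracy - 0.5)
    # 1.0 when the unlearned model matches the retrained reference.
    utility = min(test_accuracy / retrained_test_accuracy, 1.0)
    return forgetting * utility
```

A multiplicative combination ensures that neither perfect forgetting with a useless model nor an accurate model that forgot nothing scores well.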
The Machine Unlearning Challenge will run on Kaggle from mid-July 2023 to mid-September 2023. We encourage researchers and developers to participate and contribute novel unlearning algorithms. Let’s work together to protect privacy, improve models, and advance the field of machine unlearning.