Addressing Anonymization in IIoT Data: A Novel Approach Using GAN and DP
Anonymization is a significant concern when handling Industrial Internet of Things (IIoT) data. Machine Learning (ML) applications need access to decrypted data to perform their tasks efficiently. However, this exposes sensitive information to the third parties that process the data, creating privacy and information leakage risks. As a result, companies hesitate to share their IIoT data with third parties.
Challenges in Privacy Preservation:
Current methods for addressing anonymization, such as encryption and other cryptographic techniques, are limited by their computational cost, their effect on the explainability of ML models, and their vulnerability to cyber-attacks. These methods also often trade accuracy for privacy, which hinders effective and efficient privacy preservation in IIoT.
The Proposed Method:
A research team from Kadir Has University in Turkey proposed a novel method that combines Generative Adversarial Networks (GAN) and Differential Privacy (DP) to protect sensitive data in IIoT operations. This hybrid approach aims to preserve privacy with minimal accuracy loss and low additional computational cost. The GAN generates synthetic copies of sensitive data, while DP adds random noise to sensitive features to maintain privacy. The method was tested on publicly available datasets and on a realistic IIoT dataset from a confectionery production process.
Features of the Proposed Approach:
The proposed approach involves two main components: GAN and DP.
1. GAN: A Conditional Tabular GAN (CTGAN) is used to create a synthetic copy (XG) of the original dataset (XO). The GAN learns the data distribution and generates synthetic data whose statistics are similar to those of the original.
2. DP: Random noise from a Laplace distribution is added to sensitive features in the data to enhance privacy. This technique preserves privacy while maintaining the overall probability distribution of the data.
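The noise-addition step in item 2 can be illustrated with a minimal sketch. It assumes the standard Laplace mechanism, in which the noise scale equals sensitivity divided by a privacy budget epsilon. The source does not state these values or how the scale is chosen, so the function name `add_laplace_noise` and the parameters `sensitivity`, `epsilon`, and `seed` are illustrative assumptions, not the authors' settings.

```python
import random


def add_laplace_noise(values, sensitivity, epsilon, seed=None):
    """Return a copy of `values` with Laplace(0, sensitivity / epsilon) noise added.

    `sensitivity` and `epsilon` are assumed parameters of the standard
    Laplace mechanism; the source does not specify them.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # The difference of two independent exponential draws with the same
    # rate follows a Laplace distribution with scale 1 / rate.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


if __name__ == "__main__":
    # Hypothetical sensor readings, used only for illustration.
    readings = [71.2, 70.8, 72.5, 69.9]
    print(add_laplace_noise(readings, sensitivity=1.0, epsilon=0.5, seed=42))
```

A smaller epsilon gives a larger noise scale, so privacy improves while the individual values become less accurate.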
Implementation of the Method:
The method has three steps: create a synthetic dataset with the GAN, replace the sensitive features, and apply differential privacy by adding random noise. The resulting dataset protects the privacy of the IIoT data and can be used for machine learning analysis without exposing sensitive information. The algorithm's complexity depends on the number of sensitive features and the size of the dataset.
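The flow described above can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes that "replacing sensitive features" means swapping the sensitive columns of XO for their counterparts in XG before adding noise. The function `stand_in_generator` is a hypothetical placeholder that only resamples column values and is not a GAN; in practice a trained CTGAN would produce XG. The names `privatize`, `sensitive`, and `scale` are also illustrative.

```python
import random


def stand_in_generator(rows, rng):
    """Hypothetical placeholder for a trained CTGAN.

    It resamples each column independently from the original rows. This is
    NOT a GAN; it only keeps the sketch free of external dependencies.
    """
    columns = list(rows[0].keys())
    return [{c: rng.choice(rows)[c] for c in columns} for _ in rows]


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(original, sensitive, scale, seed=None):
    """Build a protected copy of `original` (a list of dict rows)."""
    rng = random.Random(seed)
    synthetic = stand_in_generator(original, rng)  # step 1: synthetic copy X_G
    protected = []
    for orig_row, syn_row in zip(original, synthetic):
        row = dict(orig_row)  # copy, so the input rows are not mutated
        for col in sensitive:
            # step 2: replace the sensitive value with its synthetic counterpart
            # step 3: perturb it with Laplace noise
            row[col] = syn_row[col] + laplace(scale, rng)
        protected.append(row)
    return protected


if __name__ == "__main__":
    # Hypothetical process data, used only for illustration.
    data = [{"temperature": 70.0 + i, "line_speed": 1.5} for i in range(5)]
    for row in privatize(data, sensitive=["temperature"], scale=0.5, seed=1):
        print(row)
```

In this sketch the replacement and noise loop touches every sensitive feature of every row once, which is consistent with the statement that the cost grows with the number of sensitive features and the dataset size.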
Evaluation and Results:
Experiments were conducted on four SCADA datasets covering wind turbines, steam production, energy efficiency, and synchronous motors. The proposed hybrid approach using CTGAN and DP outperformed other methods in terms of accuracy and privacy preservation. Accuracy was measured with the R-squared metric, and privacy preservation was measured with six privacy metrics. The method also demonstrated its ability to protect hidden sensitive features in the data.
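For reference, the R-squared metric mentioned above is the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows the standard definition only; the function name `r_squared` is illustrative, and the source does not describe the authors' evaluation code or the six privacy metrics.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

Comparing the score of a model trained on the original data with the score of the same model trained on the privacy-preserving data gives a direct view of the accuracy loss.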
The proposed hybrid approach combining GAN and DP offers a promising solution to the anonymization problem in IIoT data. It enables the creation of a privacy-preserving synthetic dataset for IIoT environments while keeping accuracy loss and computational cost low.