Applying Differential Privacy to Learn about Iconic Scenes in Photos
This article discusses how differential privacy is used in Photos to learn about the types of photos people take at popular locations without compromising their personal data. This approach underpins several features in Photos, such as selecting key photos for Memories and ranking locations in Places.
Significance of Iconic Scenes in Learning about User Photos
The Photos app uses machine learning to identify significant people, places, and events in the user's photo library. It then creates Memories, which are curated collections of photos and videos set to music. The key photo chosen for a Memory is influenced by the popularity of iconic scenes, learned with differential privacy from the devices of participating iOS users.
Three Key Aspects of Machine Learning Research
The research behind this feature focuses on three important aspects:
1. Accommodating data changes: Iconic scenes can change over time or with the seasons, so the machine learning model must adapt and incorporate new categories of photos. This keeps Memories engaging for users regardless of their location.
2. Balancing local and central differential privacy: Initially, local differential privacy alone was used to protect user data. However, this approach required adding substantial noise on each device, which made weaker signals hard to detect. Combining local noise addition with secure aggregation achieved better transparency, verifiability, and utility in the learned histograms.
3. Accounting for non-uniform data density: Data gathered worldwide is not uniformly distributed, making privacy challenges different for each region. In high-density areas, precise statistics with privacy assurances are easier to obtain. However, in low-density areas, obtaining meaningful signals without compromising user privacy is more challenging.
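The local-DP building block behind point 2 can be illustrated with a classic randomized-response sketch. This is a minimal, textbook illustration, not the deployed mechanism; the function name and interface are hypothetical:

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps),
    otherwise report its flip. Because either answer could have come
    from either true value, the report is epsilon-locally private."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit
```

With a small epsilon the report is close to a coin flip; with a large epsilon it is almost always the true bit, which is the utility/privacy trade-off the research balances.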
Balancing Privacy with Utility
To address these concerns, local noise addition and secure aggregation were combined; the Photos Memories use case illustrates how. Each location-category pair is encoded as a one-hot vector, and each bit is randomly flipped to provide a local differential privacy assurance. The noisy vector is then split into shares, which are encrypted and uploaded to the server. The server aggregates the shares without seeing any individual vector, preserving privacy.
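The encode-flip-share-aggregate pipeline described above can be sketched as follows. All constants (bucket count, flip probability, modulus, number of shares) are illustrative assumptions, and real deployments encrypt each share for a different non-colluding aggregator:

```python
import random

NUM_BUCKETS = 8   # hypothetical number of location-category buckets
FLIP_PROB = 0.1   # hypothetical per-bit flip probability (local DP noise)
MODULUS = 2**32   # arithmetic secret shares are taken mod 2^32

def encode(bucket: int) -> list[int]:
    """One-hot encode a location-category pair, then flip each bit
    with probability FLIP_PROB for local differential privacy."""
    vec = [1 if i == bucket else 0 for i in range(NUM_BUCKETS)]
    return [b ^ 1 if random.random() < FLIP_PROB else b for b in vec]

def split_shares(vec: list[int], n_shares: int = 2) -> list[list[int]]:
    """Additively secret-share the noisy vector: each share alone looks
    uniformly random; only the sum of all shares (mod MODULUS) reveals it."""
    shares = [[random.randrange(MODULUS) for _ in vec]
              for _ in range(n_shares - 1)]
    last = [(v - sum(col)) % MODULUS for v, col in zip(vec, zip(*shares))]
    return shares + [last]

def aggregate(share_batches: list[list[list[int]]]) -> list[int]:
    """Server side: sum every share from every user. The result is the
    sum of the noisy vectors; no individual vector is ever reconstructed."""
    total = [0] * len(share_batches[0][0])
    for user_shares in share_batches:
        for share in user_shares:
            total = [(t + s) % MODULUS for t, s in zip(total, share)]
    return total
```

Summing shares across many users yields per-bucket counts of the noisy contributions, which is exactly the histogram the server needs.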
The aggregates are then decoded, and the location and photo categories are visualized on a map. This approach has enabled the collection of frequencies for millions of location-category pairs, which power the ML-based selection of key photos for Memories and the ranking of photos and locations on the Places map.
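Decoding the aggregates involves undoing the expected bias introduced by the random bit flips. The standard debiasing step for this kind of randomized response is sketched below, under an assumed per-bit flip probability; the production decoder may differ:

```python
def debias(noisy_counts: list[int], n_users: int,
           flip_prob: float) -> list[float]:
    """Invert the expected effect of random bit flips on a bucket count:
    E[noisy] = true*(1-p) + (n - true)*p, so
    true ~= (noisy - n*p) / (1 - 2p)."""
    p = flip_prob
    return [(c - n_users * p) / (1.0 - 2.0 * p) for c in noisy_counts]
```

For example, with 1,000 users and a 10% flip probability, a bucket that nobody selected still shows roughly 100 noisy counts, which the estimator correctly maps back toward zero.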
Privacy Assurance and Protection against Malicious Updates
To protect against malicious users submitting false data, Prio-style validation verifies that each contribution is well-formed before it is aggregated. The learned histograms satisfy a strong privacy assurance (epsilon = 1, delta = 1.5e-7), and additional noise is added within the secure aggregation protocol so that the assurance holds even for smaller populations.
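For context on the epsilon value above: in the textbook binary randomized-response mechanism, the privacy parameter directly determines the per-bit flip probability. The sketch below shows that standard relationship only; the deployed system's accounting also involves the delta parameter and central noise, so this is not its exact calibration:

```python
import math

def flip_prob_for_epsilon(epsilon: float) -> float:
    """For binary randomized response, flipping each bit with
    probability 1 / (1 + e^eps) achieves eps-local differential
    privacy: the odds of any report given either true value differ
    by at most a factor of e^eps."""
    return 1.0 / (1.0 + math.exp(epsilon))
```

At epsilon = 1 this gives a flip probability of about 0.27, and at epsilon = 0 it degenerates to a fair coin flip, i.e. the report carries no information.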
Privacy-Preserving ML Research for User Experiences
Privacy-preserving machine learning research has greatly enhanced user experiences in Photos. Differential privacy ensures that users’ personal data remains secure while still allowing for useful and personalized features in the app. This approach continues to evolve and improve to meet the needs of users worldwide.