The Importance of Differential Privacy in AI
Differential privacy (DP) is a mathematical definition of privacy that protects user data by limiting how much any single record can influence the output of a computation. It guarantees that the probability of any specific output stays almost unchanged when one data point is added or removed, so the output reveals little about any individual. DP has seen significant advances in both foundational research and adoption, including the Privacy Sandbox and Google Open Source Library.
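The guarantee described above is usually written as the standard (ε, δ) definition. The symbols are not in the original text and are introduced here for illustration: M is a randomized algorithm, D and D' are any two datasets that differ in one data point, S is any set of possible outputs, and ε and δ are the privacy parameters (smaller means more private).

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

With small ε and δ, the two probabilities are nearly equal, which is the "almost unchanged" condition in formula form.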
The Challenge of Composition in Differential Privacy
In differential privacy, privacy loss accumulates when multiple computations are performed on the same dataset. This is known as the cost of composition. In general, privacy loss grows with the square root of the number of computations, so each step must satisfy a stricter privacy guarantee for the overall privacy goal to be met. Stricter per-step guarantees typically mean more noise per step, which often costs utility.
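To make the cost of composition concrete, here is a minimal sketch that compares two standard accounting rules: basic composition, where per-step budgets add up, and the advanced composition theorem, which gives the square-root growth mentioned above. The function and parameter names (`eps_step`, `delta_slack`) are illustrative choices, not taken from the paper.

```python
import math


def basic_composition(eps_step: float, k: int) -> float:
    """Total epsilon under basic composition: per-step budgets add up."""
    return k * eps_step


def advanced_composition(eps_step: float, k: int, delta_slack: float) -> float:
    """Total epsilon under the advanced composition theorem.

    delta_slack is the extra failure probability the theorem adds on top
    of the k per-step deltas. The first term grows like sqrt(k).
    """
    sqrt_term = math.sqrt(2 * k * math.log(1 / delta_slack)) * eps_step
    linear_term = k * eps_step * (math.exp(eps_step) - 1)
    return sqrt_term + linear_term


k = 100          # number of computations on the same dataset
eps_step = 0.1   # privacy budget spent by each computation
print("basic:   ", basic_composition(eps_step, k))
print("advanced:", advanced_composition(eps_step, k, delta_slack=1e-6))
```

For these illustrative numbers the advanced bound comes out at roughly 6.3 against 10 for the basic bound. Either way, the total grows with the number of steps, which is the cost RSC aims to avoid.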
The Reorder-Slice-Compute (RSC) Paradigm
To address the trade-off between privacy and utility, the Reorder-Slice-Compute (RSC) paradigm is introduced. It is well known that when each step operates on a disjoint part (a slice) of the dataset, the privacy guarantee does not deteriorate with the number of computations. When slices are selected adaptively, however, a standard analysis falls back on composition costs, because a single data point can affect multiple slices. RSC allows slices to be selected adaptively without compromising the privacy guarantee.
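The disjoint-slices case can be illustrated with a noisy histogram over fixed buckets. This is a minimal sketch of the classical non-adaptive setting, not of RSC itself; the helper names and the choice of the Laplace mechanism are assumptions made for the example. Each data point falls in exactly one bucket, so adding or removing a point changes one count by one, and the whole release costs a single budget of `eps` rather than one budget per bucket.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(values, bucket_edges, eps, rng):
    """Release one noisy count per disjoint bucket [edges[i], edges[i+1])."""
    counts = [0] * (len(bucket_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if bucket_edges[i] <= v < bucket_edges[i + 1]:
                counts[i] += 1
                break
    # Sensitivity of each count is 1, so the noise scale is 1 / eps.
    return [c + laplace_noise(1 / eps, rng) for c in counts]


rng = random.Random(0)
ages = [23, 37, 41, 29, 65, 52, 33]
print(noisy_histogram(ages, [0, 30, 50, 120], eps=1.0, rng=rng))
```

The buckets here are fixed before looking at the data. The difficulty discussed above arises once later slices are chosen based on earlier outputs.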
Improving Privacy and Utility with RSC
The research paper “Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization” presents the RSC paradigm as a solution to the composition problem. It shows that DP algorithms for various aggregation and learning tasks can be expressed in the RSC paradigm, resulting in improved utility. An RSC algorithm proceeds in steps. Each step selects an ordering over the data points, a slice size, and a DP algorithm; slices out the lowest-ranked points under that ordering; applies the DP algorithm to the slice; and outputs the result.
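The step structure can be sketched as a short loop. This is an illustrative reading of the description above, not the paper's algorithm: it takes slices of exactly the requested size and removes them from the remaining data, and the privacy guarantee depends on details of the paper's procedure that are not reproduced here. The names `reorder_slice_compute`, `choose_step`, `noisy_count`, and `alternate_ends` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One step's choices: (ordering key, slice size, mechanism applied to the slice).
Step = Tuple[Callable[[float], float], int, Callable[[Sequence[float]], float]]


def reorder_slice_compute(
    data: Sequence[float],
    choose_step: Callable[[List[float]], Step],
    num_steps: int,
) -> List[float]:
    remaining = list(data)
    outputs: List[float] = []
    for _ in range(num_steps):
        if not remaining:
            break
        # The choices may depend on everything released so far (adaptivity).
        key, m, mechanism = choose_step(outputs)
        remaining.sort(key=key)                            # reorder
        current, remaining = remaining[:m], remaining[m:]  # slice
        outputs.append(mechanism(current))                 # compute
    return outputs


def noisy_count(points: Sequence[float], eps: float = 1.0) -> float:
    """Hypothetical per-slice mechanism: count plus Laplace(1/eps) noise."""
    noise = random.expovariate(eps) - random.expovariate(eps)
    return len(points) + noise


def alternate_ends(outputs: List[float]) -> Step:
    # Toy selection rule: alternate between the smallest and largest points.
    ascending = len(outputs) % 2 == 0
    key = (lambda x: x) if ascending else (lambda x: -x)
    return key, 2, noisy_count


print(reorder_slice_compute([5, 1, 4, 2, 3, 6], alternate_ends, num_steps=3))
```

Because `choose_step` receives the outputs released so far, a caller can make each ordering, slice size, and mechanism depend on earlier results. The toy rule above only uses the number of earlier outputs.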
Tighter Privacy Analysis for Better Utility
Under standard composition theorems, the privacy loss of an RSC algorithm deteriorates with the number of steps. A novel analysis eliminates this dependence on the number of steps, giving an overall privacy guarantee close to that of a single step. Because individual steps no longer need stricter guarantees to compensate, this tighter analysis improves the utility of DP algorithms.
Applications in Aggregation Tasks
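The RSC paradigm is applied to common aggregation tasks to demonstrate its effectiveness. The private interval point task is to find a point that lies within the interval spanned by a dataset, that is, between its smallest and largest points. The task is trivial without a privacy requirement, but under DP the required input size grows with the size of the domain, which highlights the difficulty of private solutions even for simple aggregation tasks. With the RSC algorithm, the required input size is significantly reduced, to the order of the iterated logarithm (log*) of the domain size.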
The private approximate median task seeks a point that falls between the ⅓ and ⅔ quantiles of a dataset. It reduces to the interval point task: setting aside the smallest third and the largest third of the points and computing an interval point of the remaining points yields an approximate median, since any such point lies between the two quantiles. This method requires a dataset about three times as large as the interval point computation needs.
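One way to read the reduction from approximate median to interval point is to trim the smallest third and the largest third of the data and ask for an interval point of what remains. The sketch below shows only this reduction. `interval_point_stub` is a NON-private stand-in, introduced here for illustration, for whatever private interval point routine would be used in practice.

```python
def interval_point_stub(points):
    """NON-private stand-in: returns some point in [min(points), max(points)]."""
    return (min(points) + max(points)) / 2


def approximate_median(data, interval_point=interval_point_stub):
    ordered = sorted(data)
    third = len(ordered) // 3
    # Drop the `third` smallest and the `third` largest points.
    middle = ordered[third: len(ordered) - third]
    # Any point inside the span of the middle part has at least `third`
    # data points on each side of it.
    return interval_point(middle)


data = [10, 1, 7, 3, 9, 4, 8, 2, 6]
print(approximate_median(data))
```

Since only the middle part is passed to the interval point routine, the full dataset has to be about three times the size that routine needs.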
Private learning of axis-aligned rectangles is another task to which the RSC paradigm can be applied. The input is a set of labeled points, and the goal is to learn, for each axis, the lower and upper limits of a rectangle that separates positively labeled points (inside) from negatively labeled points (outside). With the RSC algorithm, such rectangles can be learned privately with efficient input sizes.
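To show what is being learned, here is a NON-private baseline that fits the tightest rectangle around the positive examples. It illustrates the hypothesis class only (two limits per axis) and is not the private learner from the paper; the function names are hypothetical.

```python
def fit_rectangle(points, labels):
    """NON-private baseline: tightest axis-aligned box around positive points."""
    positives = [p for p, y in zip(points, labels) if y == 1]
    dims = len(points[0])
    lows = [min(p[j] for p in positives) for j in range(dims)]
    highs = [max(p[j] for p in positives) for j in range(dims)]
    return lows, highs


def predict(rect, point):
    """Label a point 1 if it lies inside the rectangle on every axis, else 0."""
    lows, highs = rect
    return int(all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs)))


points = [(1, 1), (2, 3), (3, 2), (5, 5), (0, 4)]
labels = [1, 1, 1, 0, 0]
rect = fit_rectangle(points, labels)
print(rect, predict(rect, (2, 2)), predict(rect, (5, 5)))
```

The limits returned by this baseline are exact extreme data points, so it offers no privacy protection. A private learner has to choose each limit without exposing individual points.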
In conclusion, the Reorder-Slice-Compute (RSC) paradigm offers a promising solution to the composition problem in differential privacy. It allows for adaptive slice selection without compromising privacy, resulting in improved utility for AI algorithms. By applying RSC to various aggregation and learning tasks, significant advancements in privacy-preserving techniques can be made.