Private Query Release Assisted by Public Data

Jul 12, 2020



We study the problem of differentially private query release assisted by public data. Given a class H of queries h:X →{-1, 1}, and data set of i.i.d. samples from an unknown distribution D, a query-release algorithm is required to output a data structure G: H → [-1, 1] such that for any query h∈ H, G(h) approximates E_x∼ D[h(x)] up to some error α. In this problem, the input data set consists of two types of samples: private and public. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of private and public sample complexities. First, we show that this task is achievable for any query class of finite VC-dimension using only (roughly) d/α public samples and √(p)d^3/2/α^2 private samples, where d is the VC-dimension of the class, and p is the dual VC-dimension. When there are no public samples, there are known examples of classes of VC-dimension one for which this task is impossible under differential privacy (e.g., the class of threshold functions over R). Moreover, our upper bound on the public sample complexity is non-trivial since, without private samples, it is known that this task is equivalent to uniform convergence over H which requires at least d/α^2 public samples. Next, we give lower bounds on private and public sample complexities with tight dependence on p and α. In particular, for the class of decision stumps, we give a lower bound of √(p)/α on the private sample complexity whenever the number of public samples is <1/α^2. Given our upper bound, this shows that the dependence on √(p) in the private sample complexity is necessary (in the non-trivial regime where the public samples are insufficient to solve the problem on its own). We also give a tight lower bound of 1/α on the public sample complexity for a broad family of query classes.



About ICML 2020

The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%


Recommended Videos

Presentations on similar topic, category or speaker