Formal Privacy for Functional Data with Gaussian Perturbations Motivated by the rapid rise in statistical tools in Functional Data Analysis, we consider the Gaussian mechanism for achieving differential privacy with parameter estimates taking values in a, potentially infinite-dimensional, separable Banach space. Using classic results from probability theory, we show how densities over function spaces can be utilized to achieve the desired differential privacy bounds. This extends prior results of Hall et al. (2013) to a much broader class of statistical estimates and summaries, including “path level” summaries, nonlinear functionals, and full function releases. By focusing on Banach spaces, we provide a deeper picture of the challenges for privacy with complex data, especially the role regularization plays in balancing utility and privacy. Using an application to penalized smoothing, we explicitly highlight this balance in the context of mean function estimation. Simulations and an application to diffusion tensor imaging are briefly presented, with extensive additions included in a supplement. Graphical-model based estimation and inference for differential privacy Many privacy mechanisms reveal high-level information about a data distribution through noisy measurements. It is common to use this information to estimate the answers to new queries. In this work, we provide an approach to solve this estimation problem efficiently using graphical models, which is particularly effective when the distribution is high-dimensional but the measurements are over low-dimensional marginals. We show that our approach is far more efficient than existing estimation techniques from the privacy literature and that it can improve the accuracy and scalability of many state-of-the-art mechanisms. An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule We present a provably optimal differentially private algorithm for the stochastic multi-arm bandit problem, as opposed to the private analogue of the UCB-algorithm (Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016) which doesn't meet the recently discovered lower-bound of Ω(Klog(n)ϵ)(Shariff and Sheffet, 2018). Our construction is based on a different algorithm, Successive Elimination (Even-Dar et al., 2002), that repeatedly pulls all remaining arms until an arm is found to be suboptimal and is then eliminated. In order to devise a private analogue of Successive Elimination we visit the problem of private \emph{stopping rule}, that takes as input a stream of i.i.d samples from an unknown distribution and returns a \emph{multiplicative} (1±α)-approximation of the distribution's mean, and prove the optimality of our private stopping rule. We then leverage on the private stopping rule algorithm and present the private Successive Elimination algorithm which meets both the non-private lower bound (Lai and Robbins, 1985) and the above-mentioned private lower bound. We also compare empirically the performance of our algorithm with the private UCB algorithm. Sublinear Space Private Algorithms Under the Sliding Window Model he Differential privacy overview of Apple states, Apple retains the collected data for a maximum of three months." Analysis of recent data is formalized by the {\em sliding window model}. This begs the question: what is the price of privacy in the {sliding window model}. In this paper, we study heavy hitters in the sliding window model with window size w. Previous works of~\citet{chan2012differentially} estimates heavy hitters using O(w)space and incur an error of order θw for a constant θ>0. In this paper, we give an efficient differentially private algorithm to estimate heavy hitters in the sliding window model with ˜O(w3/4)additive error and using ˜O(√w) space. Locally Private Bayesian Inference for Count Models We present a general method for privacy-preserving Bayesian inference in Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences. Our method satisfies limited precision local privacy, a generalization of local differential privacy, which we introduce to formulate privacy guarantees appropriate for sparse count data. We develop an MCMC algorithm that approximates the locally private posterior over model parameters given data that has been locally privatized by the geometric mechanism (Ghosh et al., 2012). Our solution is based on two insights: 1) a novel reinterpretation of the geometric mechanism in terms of the Skellam distribution (Skellam, 1946) and 2) a general theorem that relates the Skellam to the Bessel distribution (Yuan & Kalbfleisch, 2000). We demonstrate our method in two case studies on real-world email data in which we show that our method consistently outperforms the commonly-used \naive approach, obtaining higher quality topics in text and more accurate link prediction in networks. On some tasks, our privacy-preserving method even outperforms non-private inference which conditions on the true data. Low Latency Privacy Preserving Inference When applying machine learning to sensitive data, one has to balance between accuracy, information leakage, and computational-complexity. Recent studies combined Homomorphic Encryption with neural networks to make inferences while protecting against information leakage. However, these methods are limited by the width and depth of neural networks that can be used (and hence the accuracy) and exhibit high latency even for relatively simple networks. In this study we provide two solutions that address these limitations. In the first solution, we present more than 10x improvement in latency and enable inference on wider networks compared to prior attempts with the same level of security. The improved performance is achieved by novel methods to represent the data during the computation. In the second solution, we apply the method of transfer learning to provide private inference services using deep networks with latency lower than 0.2 seconds. We demonstrate the efficacy of our methods on several computer vision tasks. Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters We consider the problems of distribution estimation and frequency/heavy hitter estimation under local differential privacy (LDP), and communication constraints. While each constraint has been studied separately, optimal schemes for one are sub-optimal for the other. We provide a one-bit \eps-LDP scheme that requires no shared randomness and has the optimal performance. We also show that a recently proposed scheme (Acharya et al., 2018b) for \eps-LDP distribution estimation is also optimal for frequency estimation. Finally, we show that if we consider LDP schemes for heavy hitter estimation that do not use shared randomness then their communication budget must be w(1) bits. Poission Subsampled R\'enyi Differential Privacy We consider the problem of privacy-amplification by under the Renyi Differential Privacy framework. This is the main technique underlying the moments accountants (Abadi et al., 2016) for differentially private deep learning. Unlike previous attempts on this problem which deals with Sampling with Replacement, we consider the Poisson subsampling scheme which selects each data point independently with a coin toss. This allows us to significantly simplify and tighten the bounds for the RDP of subsampled mechanisms and derive numerically stable approximation schemes. In particular, for subsampled Gaussian mechanism and subsampled Laplace mechanism, we prove an analytical formula of their RDP that exactly matches the lower bound. The result is the first of its kind and we numerically demonstrate an order of magnitude improvement in the privacy-utility tradeoff. Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA The exponential mechanism is a fundamental tool of Differential Privacy (DP) due to its strong privacy guarantees and flexibility. We study its extension to settings with summaries based on infinite dimensional outputs such as with functional data analysis, shape analysis, and nonparametric statistics. We show that one can design the mechanism with respect to a specific base measure over the output space, such as a Guassian process. We provide a positive result that establishes a Central Limit Theorem for the exponential mechanism quite broadly. We also provide an apparent negative result, showing that the magnitude of the noise introduced for privacy is asymptotically non-negligible relative to the statistical estimation error. We develop an \ep-DP mechanism for functional principal component analysis, applicable in separable Hilbert spaces. We demonstrate its performance via simulations and applications to two datasets.

Watch SlidesLive on mobile devices

© SlidesLive Inc.