Paper Title
Differentially Private Sampling from Distributions
Paper Authors
Paper Abstract
We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to $P$ while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on $\{1,\ldots ,k\}$, arbitrary product distributions on $\{0,1\}^d$, and product distributions on $\{0,1\}^d$ with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of $P$ nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
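To make the task concrete, here is a minimal sketch (not the paper's exact mechanism) of a pure $\varepsilon$-DP sampler for arbitrary distributions on $\{1,\ldots,k\}$: sample from the additively smoothed empirical distribution, with the smoothing constant chosen so that changing any single observation shifts each output probability by a multiplicative factor of at most $e^{\varepsilon}$. The function name and interface below are illustrative assumptions.

```python
import math
import random
from collections import Counter

def dp_sample(data, k, eps):
    """Output one draw from a smoothed empirical distribution over {1,...,k}.

    With alpha = 1/(e^eps - 1), neighboring datasets (differing in one
    observation) change each count by at most 1, so each output probability
    (c_j + alpha)/(n + k*alpha) changes by a factor of at most
    (alpha + 1)/alpha = e^eps, giving eps-differential privacy for the
    single released sample.
    """
    alpha = 1.0 / math.expm1(eps)  # expm1(eps) = e^eps - 1
    counts = Counter(data)
    weights = [counts.get(j, 0) + alpha for j in range(1, k + 1)]
    return random.choices(range(1, k + 1), weights=weights, k=1)[0]
```

Note the accuracy/privacy trade-off: smaller $\varepsilon$ forces larger smoothing $\alpha$, pulling the output distribution toward uniform and away from $P$ in total variation distance, which is why larger datasets are needed at stricter privacy levels.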