Paper Title
Differentially Private Weighted Sampling
Paper Authors
Abstract
Common datasets have the form of elements with keys (e.g., transactions and products), and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluation of query statistics. We propose private weighted sampling (PWS): a method that ensures element-level differential privacy while retaining, to the extent possible, the utility of the respective non-private weighted sample. PWS maximizes the reporting probabilities of keys and the estimation quality of a broad family of statistics. PWS also improves over the state of the art for the well-studied special case of private histograms, when no sampling is performed. We empirically demonstrate significant performance gains compared with prior baselines: a 20%-300% increase in key reporting for common Zipfian frequency distributions, and accuracy for $2\times$-$8\times$ lower frequencies in estimation tasks. Moreover, PWS is applied as a simple post-processing of a non-private sample, without requiring the original data. This allows for seamless integration with existing implementations of non-private schemes and retains the efficiency of schemes designed for resource-constrained settings such as massive distributed or streamed data. We believe that, due to its practicality and performance, PWS may become a method of choice in applications where privacy is desired.
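To make the setting concrete, the following is a minimal sketch of the kind of non-private weighted sample the abstract refers to, not of PWS itself. It assumes Poisson probability-proportional-to-size sampling with a threshold `tau` and inverse-probability (Horvitz-Thompson) estimates; the abstract does not say which sampling scheme is used, and the function names `aggregate`, `pps_sample`, and `estimate_sum` are hypothetical. The privacy-preserving post-processing step is deliberately omitted, because the abstract does not specify it.

```python
import random
from collections import Counter


def aggregate(elements):
    """Aggregate a sequence of element keys into key -> frequency pairs."""
    return Counter(elements)


def pps_sample(freqs, tau, rng):
    """Non-private weighted sample (assumed scheme: Poisson PPS).

    Each key is included independently with probability min(1, f / tau),
    so frequent keys are more likely to be reported.
    Returns key -> (frequency, inclusion probability).
    """
    sample = {}
    for key, f in freqs.items():
        p = min(1.0, f / tau)
        if rng.random() < p:
            sample[key] = (f, p)
    return sample


def estimate_sum(sample, predicate=lambda key: True):
    """Inverse-probability estimate of the total frequency of matching keys."""
    return sum(f / p for key, (f, p) in sample.items() if predicate(key))


if __name__ == "__main__":
    # Toy data: each element carries only its key.
    elements = ["a"] * 5 + ["b"] * 3 + ["c"]
    freqs = aggregate(elements)
    sample = pps_sample(freqs, tau=4.0, rng=random.Random(0))
    print(sample)
    print(estimate_sum(sample, lambda key: key in {"a", "b"}))
```

A scheme of this shape yields a sparse set of representative keys and supports approximate query statistics. According to the abstract, PWS is applied as post-processing to such a sample, without access to the original elements.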