Paper Title
Oblivious Sampling Algorithms for Private Data Analysis
Paper Authors
Paper Abstract
We study secure and privacy-preserving data analysis based on queries executed on samples from a dataset. Trusted execution environments (TEEs) can be used to protect the content of the data during query computation, while supporting differentially private (DP) queries in TEEs provides record privacy when the query output is revealed. Support for sample-based queries is attractive due to privacy amplification: a query is answered using only a small subset of the dataset rather than all of it. However, extracting data samples with TEEs while proving strong DP guarantees is not trivial, as the secrecy of the sample indices has to be preserved. To this end, we design efficient secure variants of common sampling algorithms. Experimentally, we show that the accuracy of differentially private models trained with shuffling and with sampling is the same for MNIST and CIFAR-10, while sampling provides stronger privacy guarantees than shuffling.
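As a rough illustration of the privacy amplification mentioned in the abstract, the sketch below evaluates the standard amplification-by-subsampling bound: if a mechanism is (ε, δ)-DP and is run on a Poisson subsample in which each record is included independently with probability q, the overall guarantee is (ln(1 + q(e^ε − 1)), qδ)-DP. This is a textbook bound, not necessarily the exact analysis used in the paper, and the function names `amplified_epsilon` and `amplified_delta` are hypothetical. The bound only holds if the sample indices stay hidden from the adversary, which is the motivation for oblivious sampling inside a TEE; the code below computes the bound only and does not perform any oblivious sampling.

```python
import math


def amplified_epsilon(eps: float, q: float) -> float:
    """Epsilon after Poisson subsampling with rate q (standard bound).

    Assumes the base mechanism is (eps, delta)-DP and that the sample
    indices are kept secret from the adversary.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    # ln(1 + q * (e^eps - 1)), written with log1p/expm1 for numerical stability.
    return math.log1p(q * math.expm1(eps))


def amplified_delta(delta: float, q: float) -> float:
    """Delta after Poisson subsampling with rate q (standard bound)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be in [0, 1]")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    return q * delta


if __name__ == "__main__":
    # Illustrative values only: a 1% sampling rate and a base epsilon of 1.
    print(amplified_epsilon(1.0, 0.01))
    print(amplified_delta(1e-5, 0.01))
```

With q = 1 (no subsampling) the function returns the base ε unchanged, and for small q the amplified ε is roughly q(e^ε − 1), which is why answering queries on small secret samples is attractive.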