私人合成数据发布的新的Oracle效率算法

论文标题

私人合成数据发布的新的Oracle效率算法

New Oracle-Efficient Algorithms for Private Synthetic Data Release

论文作者

Vietri, Giuseppe, Tian, Grace, Bun, Mark, Steinke, Thomas, Wu, Zhiwei Steven

论文摘要

我们提出了三种用于构建差异私有合成数据的新算法 - 敏感数据集的消毒版本，该版本大致保留了大量统计查询的答案。所有三种算法都是\ emph {oracle-效率}的意义，即当它们可以访问优化甲骨文时，它们在计算上是有效的。可以使用许多现有的（非私人）优化工具（例如复杂的整数程序求解器）实现此类甲骨文。尽管合成数据的准确性取决于Oracle的优化性能，但即使在最坏的情况下，该算法也满足差异隐私。对于所有三种算法，我们为准确性和隐私提供理论保证。通过经验评估，我们证明了我们的方法随数据的维度和查询数量而良好。与最先进的方法相比，高维矩阵机制\ cite {McKennamhm18}，我们的算法在较大的工作负载和高隐私权方面提供了更好的准确性（对应于低隐私损失$ \ varepsilon $）。

We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism \cite{McKennaMHM18}, our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$).

下载PDF全文

下载文献需遵守相关版权规定

论文标题