Paper Title

Identification and Formal Privacy Guarantees

Authors

Tatiana Komarova, Denis Nekipelov

Abstract

Empirical economic research crucially relies on highly sensitive individual datasets. At the same time, the increasing availability of public individual-level data makes it possible for adversaries to re-identify anonymized records in sensitive research datasets. The most commonly accepted formal definition of an individual non-disclosure guarantee is differential privacy. It restricts researchers' interaction with the data to issuing queries against it. The differential privacy mechanism then replaces the actual outcome of each query with a randomized outcome. The impact of differential privacy on the identification of empirical economic models and on the performance of estimators in nonlinear empirical econometric models has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define identifiability of the parameter of interest under differential privacy as a property of the limit of experiments. It is naturally characterized by concepts from random set theory. We show that particular instances of regression discontinuity design may be problematic for inference with differential privacy, as the parameters turn out to be neither point identified nor partially identified. The set of differentially private estimators converges weakly to a random set. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties under the requirement of differential privacy. We show that identification becomes possible if the target parameter can be deterministically located within the random set. In that case, a full exploration of the random set of weak limits of differentially private estimators can allow the data curator to select a sequence of instances of differentially private estimators that converges to the target parameter in probability.
