Paper Title
PAPRIKA: Private Online False Discovery Rate Control
Authors
Abstract
In hypothesis testing, a false discovery occurs when a hypothesis is incorrectly rejected due to noise in the sample. When adaptively testing multiple hypotheses, the probability of a false discovery increases as more tests are performed. Thus the problem of False Discovery Rate (FDR) control is to find a procedure for testing multiple hypotheses that accounts for this effect in determining the set of hypotheses to reject. The goal is to minimize the number (or fraction) of false discoveries, while maintaining a high true positive rate (i.e., correct discoveries). In this work, we study False Discovery Rate (FDR) control in multiple hypothesis testing under the constraint of differential privacy for the sample. Unlike previous work in this direction, we focus on the online setting, meaning that a decision about each hypothesis must be made immediately after the test is performed, rather than waiting for the output of all tests as in the offline setting. We provide new private algorithms based on state-of-the-art results in non-private online FDR control. Our algorithms have strong provable guarantees for privacy and statistical performance as measured by FDR and power. We also provide experimental results to demonstrate the efficacy of our algorithms in a variety of data environments.
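To make the quantities in the abstract concrete, the following minimal Python sketch illustrates three ideas: how the chance of at least one false discovery grows with the number of tests, how the false discovery proportion of a set of decisions is computed, and what an "online" decision rule looks like. It is not the paper's algorithm and adds no privacy protection. The rule shown is plain alpha-spending, a much simpler and more conservative baseline, and every function name and parameter here is a hypothetical chosen for illustration.

```python
import math


def prob_any_false_discovery(alpha, m):
    """Chance of at least one false rejection among m independent
    true null hypotheses, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def false_discovery_proportion(rejected, is_null):
    """Fraction of rejections that are false (0 if nothing is rejected).

    rejected[i] is True if hypothesis i was rejected;
    is_null[i] is True if hypothesis i is actually a true null.
    """
    num_rejections = sum(rejected)
    num_false = sum(1 for r, n in zip(rejected, is_null) if r and n)
    return num_false / max(num_rejections, 1)


def online_alpha_spending(p_values, alpha=0.05):
    """Illustrative online rule (assumption: NOT the paper's method).

    Each p-value is judged as soon as it arrives. Test t is run at level
    alpha * 6 / (pi^2 * t^2); these levels sum to alpha over all t, so a
    union bound keeps the chance of any false rejection at most alpha.
    """
    decisions = []
    for t, p in enumerate(p_values, start=1):
        level_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        decisions.append(p <= level_t)  # decided immediately, never revised
    return decisions


if __name__ == "__main__":
    # Uncorrected testing: the error probability grows with the number of tests.
    for m in (1, 5, 20, 100):
        print(m, round(prob_any_false_discovery(0.05, m), 4))

    # Decisions are made one p-value at a time, in arrival order.
    print(online_alpha_spending([0.001, 0.5, 0.0001]))
```

Alpha-spending gives up a great deal of power because its per-test levels shrink quickly. Online FDR procedures, such as those the abstract builds on, aim to reject more hypotheses while still bounding the expected false discovery proportion.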