Paper title
Hide-and-Seek Privacy Challenge
Paper authors
Paper abstract
The clinical time-series setting poses a unique combination of challenges to data modeling and sharing. Due to the high dimensionality of clinical time series, adequate de-identification to preserve privacy while retaining data utility is difficult to achieve using common de-identification techniques. An innovative approach to this problem is synthetic data generation. From a technical perspective, a good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between high-dimensional variables across time. From the privacy perspective, the model should prevent patient re-identification by limiting vulnerability to membership inference attacks. The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance generative techniques for dense and high-dimensional temporal data streams that are (1) clinically meaningful in terms of fidelity and predictivity, as well as (2) capable of minimizing membership privacy risks in terms of the concrete notion of patient re-identification.