CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records

Paper Title

CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records

Authors

Torfi, Amirsina; Fox, Edward A.

Abstract

Deep learning models have demonstrated high-quality performance in areas such as image classification and speech processing. However, creating a deep learning model from electronic health record (EHR) data requires addressing privacy challenges that are particular to researchers in this domain. This makes it important to generate realistic synthetic data while ensuring privacy. In this paper, we propose a novel framework called the Correlation-Capturing Generative Adversarial Network (CorGAN) to generate synthetic healthcare records. In CorGAN, we use Convolutional Neural Networks to capture the correlations between adjacent medical features in the data representation space by combining Convolutional Generative Adversarial Networks and Convolutional Autoencoders. To demonstrate the model's fidelity, we show that synthetic data generated by CorGAN yields performance similar to that of real data in various machine learning settings, such as classification and prediction. We also provide a privacy assessment and report a statistical analysis of how realistic the synthetic data is. The software for this work is open source and is available at: https://github.com/astorfi/cor-gan.
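CorGAN's premise is that convolutions can exploit correlations between neighbouring features in a record's representation. The following is a minimal pure-Python sketch of that building block, not the authors' implementation. It assumes each patient record is a fixed-length binary vector of medical codes. The record, kernel, and bias are hypothetical toy values, not parameters from the paper or its repository.

```python
from typing import List


def conv1d(record: List[int], kernel: List[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D cross-correlation, the operation a deep-learning conv layer computes.

    Each output value is a weighted sum over a window of adjacent features,
    so it can only react to patterns among neighbouring positions.
    """
    k = len(kernel)
    if k == 0 or k > len(record):
        raise ValueError("kernel must be non-empty and no longer than the record")
    return [
        sum(w * x for w, x in zip(kernel, record[i:i + k])) + bias
        for i in range(len(record) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


# Hypothetical patient record: 1 = medical code present, 0 = absent.
record = [0, 1, 1, 0, 0, 1, 0, 1]
# Hypothetical hand-set kernel: with bias -1.0 it fires only when two
# adjacent codes are both present (a local co-occurrence detector).
kernel = [1.0, 1.0]
features = relu(conv1d(record, kernel, bias=-1.0))
print(features)  # -> [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In a trained model these kernels would be learned inside the convolutional autoencoder and GAN rather than set by hand. The sketch also shows the design constraint behind the approach: a convolution only sees codes that sit next to each other in the vector, so the correlations it can capture depend on how features are ordered in the representation.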
