Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network

Paper Title

Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network

Authors

Magda Friedjungová, Daniel Vašata, Maksym Balatsko, Marcel Jiřina

Abstract

Missing data is one of the most common preprocessing problems. In this paper, we experimentally investigate the use of generative and non-generative models for feature reconstruction. Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were studied as representatives of generative models, while the denoising autoencoder (DAE) represented non-generative models. The performance of the models is compared to the traditional methods k-nearest neighbors (k-NN) and Multiple Imputation by Chained Equations (MICE). Moreover, we introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to 30%. Experiments were performed on real-world and artificial datasets with continuous features where different percentages of features, varying from 10% to 50%, were missing. Evaluation of the algorithms was done by measuring the accuracy of a classification model previously trained on the uncorrupted dataset. The results show that GAIN and especially WGAIN are the best imputers regardless of the conditions. In general, they outperform or are comparable to MICE, k-NN, DAE, and VAEAC.
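
The evaluation protocol described in the abstract (train a classifier on uncorrupted data, remove 10% to 50% of feature values, impute, then measure classification accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic two-class data, the nearest-centroid classifier, the missing-completely-at-random mask, and plain mean imputation standing in for the imputers the paper compares. All function names are hypothetical.

```python
import numpy as np

def make_data(n=400, d=8, seed=0):
    # Hypothetical synthetic dataset: two Gaussian classes with continuous features.
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + 2.0 * y[:, None]
    return X, y

def fit_centroids(X, y):
    # Stand-in classifier (assumption): one mean vector per class.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    # Assign each row to the class with the nearest centroid.
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def corrupt(X, missing_rate, rng):
    # Assumption: values are missing completely at random, marked as NaN.
    mask = rng.random(X.shape) < missing_rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing

def mean_impute(X_missing, feature_means):
    # Placeholder imputer: fill each gap with the training mean of that feature.
    return np.where(np.isnan(X_missing), feature_means, X_missing)

def evaluate(missing_rates=(0.1, 0.2, 0.3, 0.4, 0.5), seed=0):
    rng = np.random.default_rng(seed)
    X, y = make_data(seed=seed)
    X_train, y_train = X[:300], y[:300]
    X_test, y_test = X[300:], y[300:]
    # The classifier is trained once, on uncorrupted data only.
    model = fit_centroids(X_train, y_train)
    means = X_train.mean(axis=0)
    # Rate 0.0 is the uncorrupted baseline accuracy.
    results = {0.0: float((predict(model, X_test) == y_test).mean())}
    for rate in missing_rates:
        X_imputed = mean_impute(corrupt(X_test, rate, rng), means)
        results[rate] = float((predict(model, X_imputed) == y_test).mean())
    return results

if __name__ == "__main__":
    for rate, acc in evaluate().items():
        print(f"missing rate {rate:.0%}: accuracy {acc:.3f}")
```

Swapping `mean_impute` for another imputer with the same input and output shapes would let different methods be compared under the same corruption levels, which is the kind of comparison the abstract reports.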
