Paper Title
Robust Variational Autoencoder for Tabular Data with Beta Divergence
Paper Authors
Paper Abstract
We propose a robust variational autoencoder with $\beta$-divergence for tabular data (RTVAE) with mixed categorical and continuous features. Variational autoencoders (VAEs) and their variants are popular frameworks for anomaly detection. The primary assumption is that a VAE can learn representations of normal patterns, so that any deviation from those patterns may indicate an anomaly. However, the training data itself can contain outliers. Sources of outliers in training data include the data collection process itself (random noise) and malicious attackers (data poisoning) who aim to degrade the performance of the machine learning model. In either case, these outliers can disproportionately affect the training of VAEs and may lead to wrong conclusions about what normal behavior is. In this work, we derive a novel form of a variational autoencoder for tabular datasets with categorical and continuous features that is robust to outliers in the training data. Our results on anomaly detection for network traffic datasets demonstrate the effectiveness of our approach.
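To give a rough intuition for why a $\beta$-divergence objective can be less sensitive to outliers than the usual log-likelihood, the sketch below compares the two for a single continuous feature. It is not the RTVAE objective from the paper. It assumes a one-dimensional Gaussian with fixed variance and uses only the data-dependent part of the standard $\beta$-cross-entropy, $-\frac{\beta+1}{\beta}\left(p(x)^{\beta} - 1\right)$; the remaining term, $\int p(x)^{\beta+1}\,dx$, does not depend on the mean when the variance is fixed and is dropped here. The function names and the example values are hypothetical.

```python
import math


def gaussian_density(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    var = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gaussian_nll(x, mu, sigma=1.0):
    """Standard negative log-likelihood; grows quadratically in |x - mu|."""
    var = sigma ** 2
    return 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)


def gaussian_beta_loss(x, mu, beta, sigma=1.0):
    """Data-dependent part of the beta-cross-entropy (assumed simplification).

    Bounded above by (beta + 1) / beta, so one extreme point cannot
    dominate the total loss. Recovers the NLL in the limit beta -> 0.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    p = gaussian_density(x, mu, sigma)
    return -(beta + 1) / beta * (p ** beta - 1)


if __name__ == "__main__":
    mu, beta = 0.0, 0.5  # hypothetical model mean and robustness parameter
    for x in (0.5, 3.0, 10.0):  # an inlier, a borderline point, an outlier
        print(x, gaussian_nll(x, mu), gaussian_beta_loss(x, mu, beta))
```

Under these assumptions, the log-likelihood penalty for a point far from the mean keeps growing, while the $\beta$-based penalty saturates, which is one way to read the robustness claim in the abstract.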