Paper Title

Outlier Detection Ensemble with Embedded Feature Selection

Authors

Cheng, Li; Wang, Yijie; Liu, Xinwang; Li, Bin

Abstract

Feature selection plays an important role in improving the performance of outlier detection, especially for noisy data. Existing methods usually perform feature selection and outlier scoring separately, so the selected feature subsets may not optimally serve outlier detection, leading to unsatisfactory performance. In this paper, we propose an outlier detection ensemble framework with embedded feature selection (ODEFS) to address this issue. Specifically, for each learning component based on random sub-sampling, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation to learn feature subsets that are tailored to the outlier detection method. Moreover, we adopt thresholded self-paced learning to simultaneously optimize feature selection and example selection, which helps improve the reliability of the training set. We then design an alternating algorithm with proven convergence to solve the resulting optimization problem. In addition, we analyze the generalization error bound of the proposed framework, which provides a theoretical guarantee for the method and insightful practical guidance. Comprehensive experimental results on 12 real-world datasets from diverse domains validate the superiority of the proposed ODEFS.
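
The abstract names the ingredients of the framework (random sub-sampling components, a pairwise ranking objective with embedded feature selection, and thresholded self-paced example selection) but does not state the formulation. The sketch below is only one possible illustration of how such ingredients could fit together, not the ODEFS algorithm itself. Everything specific in it is an assumption: the hinge ranking loss, the L1 penalty as the feature selector, the hard loss threshold `age`, the crude initial scoring, and the function names `fit_ranker` and `ensemble_scores`.

```python
import numpy as np

def fit_ranker(Z, out_idx, in_idx, l1=0.05, age=1.5, lr=0.1, epochs=100):
    """Hypothetical component: sparse linear scorer trained by pairwise ranking.

    Assumed objective: hinge loss on (candidate outlier, candidate inlier)
    pairs plus an L1 penalty, solved by proximal gradient steps.
    """
    # Difference vector for every (outlier, inlier) pair: shape (n_out * n_in, d).
    diffs = (Z[out_idx][:, None, :] - Z[in_idx][None, :, :]).reshape(-1, Z.shape[1])
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        loss = np.maximum(0.0, 1.0 - diffs @ w)   # pairwise hinge loss
        # Hard self-paced selection (assumed form): skip pairs whose loss
        # exceeds the threshold, since they may come from unreliable labels.
        active = (loss > 0) & (loss < age)
        if not active.any():
            break
        w += lr * diffs[active].mean(axis=0)      # gradient step on the hinge loss
        # Soft-thresholding: the proximal step for the L1 penalty, which
        # drives the weights of uninformative features to exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

def ensemble_scores(X, n_components=10, subsample=64, k=8, seed=0):
    """Average the scores of several components fitted on random sub-samples.

    Assumes subsample >= 2 * k so the candidate sets do not overlap.
    """
    rng = np.random.default_rng(seed)
    # Absolute z-scores, so larger values mean "more deviant" per feature.
    Z = np.abs((X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12))
    total = np.zeros(len(X))
    weights = []
    for _ in range(n_components):
        idx = rng.choice(len(X), size=min(subsample, len(X)), replace=False)
        init = Z[idx].mean(axis=1)                # crude initial outlierness (assumed)
        order = idx[np.argsort(init)]
        w = fit_ranker(Z, order[-k:], order[:k])  # top-k vs. bottom-k candidates
        weights.append(w)
        total += Z @ w
    return total / n_components, np.array(weights)
```

Calling `ensemble_scores(X)` on an `(n, d)` array returns one averaged score per row and an `(n_components, d)` matrix of per-component feature weights; columns that are zero in every row correspond to features that no component selected.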
