Paper Title


Differentiable Unsupervised Feature Selection based on a Gated Laplacian

Authors

Ofir Lindenbaum, Uri Shaham, Jonathan Svirsky, Erez Peterfreund, Yuval Kluger

Abstract


Scientific observations may consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, and we demonstrate its use for the task of clustering. We propose a differentiable loss function that combines the Laplacian score, which favors low-frequency features, with a gating mechanism for feature selection. We improve the Laplacian score by replacing it with a gated variant computed on a subset of features. This subset is obtained using a continuous approximation of Bernoulli variables whose parameters are trained to gate the full feature space. We mathematically motivate the proposed approach and demonstrate that in the high noise regime, it is crucial to compute the Laplacian on the gated inputs, rather than on the full feature set. Experimental demonstration of the efficacy of the proposed approach and its advantage over current baselines is provided using several real-world examples.
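To make the abstract's core idea concrete, the following is a minimal NumPy sketch (not the authors' implementation) of a Laplacian score and its gated variant. The key point the abstract makes is that the graph Laplacian should be built on the gate-weighted inputs rather than the full feature set; here the gates are fixed weights in [0, 1] standing in for the trained continuous Bernoulli relaxations, and the function names, the RBF bandwidth `sigma`, and the small epsilon are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian affinity matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def laplacian_score(X, W):
    # Per-feature Laplacian score (He et al.-style): lower means the
    # feature varies smoothly over the similarity graph W.
    D = np.diag(W.sum(axis=1))
    L = D - W
    ones = np.ones(X.shape[0])
    scores = []
    for f in X.T:
        # Center each feature with respect to the degree weighting.
        f_t = f - (f @ D @ ones) / (ones @ D @ ones) * ones
        scores.append((f_t @ L @ f_t) / (f_t @ D @ f_t + 1e-12))
    return np.array(scores)

def gated_laplacian_score(X, gates, sigma=1.0):
    # Gated variant: the graph itself is built on gate-weighted inputs,
    # so noisy features down-weighted by the gates cannot corrupt the
    # neighborhood structure used for scoring.
    W = rbf_kernel(X * gates, sigma)
    return laplacian_score(X, W)

# Toy example: feature 0 carries two clusters, feature 1 is pure noise.
rng = np.random.default_rng(0)
clusters = np.vstack([rng.normal(0, 0.1, (50, 1)),
                      rng.normal(5, 0.1, (50, 1))])
noise = rng.normal(0, 1.0, (100, 1))
data = np.hstack([clusters, noise])
scores = gated_laplacian_score(data, gates=np.array([1.0, 0.1]))
```

With the noise feature suppressed by its gate, the cluster-carrying feature receives the lower (better) score. In the full method, the gates are continuous relaxations of Bernoulli variables trained by gradient descent on a differentiable loss built from this score, rather than fixed by hand.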
