Paper Title

FsNet: Feature Selection Network on High-dimensional Biological Data

Authors

Dinesh Singh, Héctor Climente-González, Mathis Petrovich, Eiryo Kawakami, Makoto Yamada

Abstract

Biological data, including gene expression data, are generally high-dimensional and require efficient, generalizable, and scalable machine-learning methods to uncover their complex nonlinear patterns. Recent advances in machine learning can be attributed to deep neural networks (DNNs), which excel at a wide range of computer vision and natural language processing tasks. However, standard DNNs are ill-suited to the high-dimensional datasets generated in biology because they have many parameters, which in turn require many samples to train. In this paper, we propose a DNN-based nonlinear feature selection method, called the feature selection network (FsNet), for high-dimensional data with a small number of samples. Specifically, FsNet comprises a selection layer that selects features and a reconstruction layer that stabilizes training. Because the large number of parameters in the selection and reconstruction layers can easily lead to overfitting when samples are limited, we use two tiny networks to predict the large, virtual weight matrices of these layers. Experimental results on several real-world, high-dimensional biological datasets demonstrate the efficacy of the proposed method.
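The central idea of the abstract, letting small networks predict the large weight matrices of the selection and reconstruction layers, can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the feature embedding (each feature's column of sample values), the single linear map used as each "tiny network" (V_sel, V_rec), the softmax-with-temperature soft selection, and the hypothetical helper names predict_weights, select, reconstruct and hard_selection. Training is omitted.

```python
import math
import random


def predict_weights(embeddings, V):
    # Hypothetical "tiny network": a single linear map. Row j of the virtual
    # d x k matrix is embeddings[j] @ V, so only V (m x k) is trainable.
    k = len(V[0])
    return [[sum(e[i] * V[i][c] for i in range(len(e))) for c in range(k)]
            for e in embeddings]


def softmax(values, temperature):
    # Numerically stable softmax; a lower temperature pushes it toward one-hot.
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


def select(x, W_sel, temperature=0.1):
    # Soft selection layer (assumed softmax relaxation): each of the k slots
    # is a softmax-weighted mixture of the d input features of one sample x.
    d, k = len(W_sel), len(W_sel[0])
    z = []
    for c in range(k):
        probs = softmax([W_sel[j][c] for j in range(d)], temperature)
        z.append(sum(p * xj for p, xj in zip(probs, x)))
    return z


def reconstruct(z, W_rec):
    # Reconstruction layer: map the k selected values back to all d features
    # through the second predicted d x k matrix.
    return [sum(row[c] * z[c] for c in range(len(z))) for row in W_rec]


def hard_selection(W_sel):
    # After training, read off one feature index per slot (column-wise argmax).
    d, k = len(W_sel), len(W_sel[0])
    return [max(range(d), key=lambda j: W_sel[j][c]) for c in range(k)]


# Toy forward pass: few samples (n), many features (d), k selected features.
random.seed(0)
n, d, k = 8, 1000, 5
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Assumed feature embedding: each feature's n sample values (columns of X), so m = n.
embeddings = [[X[s][j] for s in range(n)] for j in range(d)]
V_sel = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
V_rec = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
W_sel = predict_weights(embeddings, V_sel)  # virtual d x k selection weights
W_rec = predict_weights(embeddings, V_rec)  # virtual d x k reconstruction weights
z = select(X[0], W_sel)
x_hat = reconstruct(z, W_rec)
print(len(W_sel), len(W_sel[0]), n * k)  # prints: 1000 5 40
print(hard_selection(W_sel))
```

The point of the design is the parameter count: each layer behaves as if it had a d × k weight matrix, but only the m × k matrix feeding it is learned, so in this toy setting (d = 1000, k = 5, m = n = 8) the 5,000 virtual weights of each layer come from 40 parameters.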
