Paper Title

Unsupervised Feature Selection for Tumor Profiles using Autoencoders and Kernel Methods

Authors

Martin Palazzo, Pierre Beauseroy, Patricio Yankilevich

Abstract

Molecular data from tumor profiles is high dimensional: a tumor profile can be characterized by tens of thousands of gene expression features. Because of the size of the gene expression feature set, machine learning methods are exposed to noisy variables and complexity. Tumor types are heterogeneous and can be subdivided into tumor subtypes. In many cases tumor data does not include subtype labels, so unsupervised learning methods are necessary for tumor subtype discovery. This work aims to learn meaningful, low-dimensional representations of tumor samples and to find tumor subtype clusters while preserving biological signatures, without using tumor labels. The proposed method, named Latent Kernel Feature Selection (LKFS), is an unsupervised approach for gene selection in tumor gene expression profiles. Using Autoencoders, a low-dimensional and denoised latent space is learned as a target representation to guide a Multiple Kernel Learning model that selects a subset of genes. A clustering method then groups the samples using the selected genes. To evaluate the performance of the proposed unsupervised feature selection method, the obtained features and clusters are analyzed for clinical significance. The proposed method has been applied to three tumor datasets, Brain, Renal and Lung, each composed of two tumor subtypes. Compared with benchmark unsupervised feature selection methods, the results obtained by the proposed method reveal lower redundancy in the selected features and better clustering performance.
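
The pipeline described in the abstract (latent target representation, kernel-guided gene selection, clustering on the selected genes) can be illustrated with a small sketch. This is not the authors' LKFS implementation, and every name in it is hypothetical. It makes three simplifying assumptions: a PCA projection stands in for the autoencoder's latent space, a per-gene ranking by centered kernel alignment against the latent kernel stands in for the Multiple Kernel Learning model, and a basic two-cluster k-means stands in for the clustering method, which the abstract does not name. The data is synthetic.

```python
import numpy as np


def rbf_kernel(X, gamma=None):
    """RBF kernel matrix over the rows of X (a 1-D input is one feature)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if gamma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch).
        positive = d2[d2 > 0]
        gamma = 1.0 / np.median(positive) if positive.size else 1.0
    return np.exp(-gamma * d2)


def centered_alignment(K1, K2):
    """Centered kernel alignment between two kernel matrices."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    denom = np.linalg.norm(K1c) * np.linalg.norm(K2c)
    return float(np.sum(K1c * K2c) / denom) if denom > 0 else 0.0


def latent_stand_in(X, dim=2):
    """Stand-in for the autoencoder: project onto the top principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :dim] * S[:dim]


def select_features(X, n_select=5, latent_dim=2):
    """Rank features by how well their own kernel aligns with the latent kernel."""
    K_target = rbf_kernel(latent_stand_in(X, latent_dim))
    scores = np.array([
        centered_alignment(rbf_kernel(X[:, j]), K_target)
        for j in range(X.shape[1])
    ])
    order = np.argsort(scores)[::-1]  # highest alignment first
    return order[:n_select], scores


def two_means(X, n_iter=20):
    """Plain k-means with k=2 and a deterministic farthest-point start."""
    X = np.asarray(X, dtype=float)
    c0 = X[0]
    c1 = X[np.argmax(np.sum((X - c0) ** 2, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


# Synthetic example: 40 samples, 30 "genes", two subtypes.
# Only the first three genes carry the subtype signal; the rest are noise.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
X_demo = rng.normal(size=(40, 30))
X_demo[:, :3] += 3.0 * (2 * truth[:, None] - 1)

selected, scores = select_features(X_demo, n_select=3)
labels = two_means(X_demo[:, selected])
```

Ranking genes one at a time keeps the sketch short, but it ignores redundancy between genes, which the abstract reports as one of the points where the proposed method improves on benchmark methods. A closer approximation would learn the kernel weights jointly rather than score each gene in isolation.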
