Paper Title

Robust nonparametric integrative analysis to decipher heterogeneity and commonality across subgroups using sparse boosting

Authors

Li, Zihan; Luo, Ziye; Sun, Yifan

Abstract

In many biomedical problems, data are often heterogeneous, with samples spanning multiple patient subgroups, where different subgroups may have different disease subtypes, stages, or other medical contexts. These subgroups may be related, but they are also expected to differ in their underlying biology. Such heterogeneous data present a valuable opportunity to explore the heterogeneities and commonalities between related subgroups. Unfortunately, effective statistical analysis methods are still lacking. Recently, several novel methods based on integrative analysis have been proposed to tackle this challenging problem. Despite promising results, existing studies remain limited because they ignore data contamination and strictly assume linear effects of covariates on the response. As such, we develop a robust nonparametric integrative analysis approach to identify heterogeneity and commonality, as well as to select important covariates and estimate covariate effects. Possible data contamination is accommodated by adopting the Cauchy loss function, and a nonparametric model is built to accommodate nonlinear effects. The proposed approach is based on a sparse boosting technique. The advantages of the proposed approach are demonstrated in extensive simulations. The analysis of The Cancer Genome Atlas data on glioblastoma multiforme and lung adenocarcinoma shows that the proposed approach makes biologically meaningful findings with satisfactory prediction.
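
As an illustration of why a Cauchy loss can tolerate contaminated observations, the sketch below compares it with the squared loss. It uses one common textbook form of the Cauchy loss, (c^2 / 2) * log(1 + (r / c)^2). The abstract does not state the exact form or tuning constant used in the paper, so the scale parameter `c` and the function names here are assumptions made only for this example.

```python
import math


def cauchy_loss(residual, c=1.0):
    """Cauchy loss in a common textbook form (assumed, not taken from the paper).

    Grows logarithmically in |residual|, so large outliers contribute
    far less than they would under the squared loss.
    """
    return 0.5 * c**2 * math.log1p((residual / c) ** 2)


def cauchy_influence(residual, c=1.0):
    """Derivative of cauchy_loss with respect to the residual.

    It shrinks toward zero as |residual| grows, which is what limits
    the pull of contaminated observations on the fit.
    """
    return residual / (1.0 + (residual / c) ** 2)


def squared_loss(residual):
    """Ordinary least-squares loss, shown for comparison."""
    return 0.5 * residual**2


if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0, 100.0):
        print(f"r={r:>6}: squared={squared_loss(r):10.3f}  "
              f"cauchy={cauchy_loss(r):8.3f}  "
              f"influence={cauchy_influence(r):6.3f}")
```

For small residuals the two losses nearly coincide, while for large residuals the Cauchy loss grows only logarithmically and its derivative decays, so a single contaminated sample cannot dominate the objective.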
