Title
Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data
Authors
Abstract
Understanding of the pathophysiology of obstructive lung disease (OLD) is limited by the methods available to examine the relationship between multi-omic molecular phenomena and clinical outcomes. Integrative factorization methods for multi-omic data can reveal latent patterns of variation that describe important biological signal. However, most such methods do not provide a framework for inference on the estimated factorization, do not simultaneously predict important disease phenotypes or clinical outcomes, and do not accommodate multiple imputation. To address these gaps, we propose Bayesian Simultaneous Factorization (BSF). We use conjugate normal priors and show that the posterior mode of this model can be estimated by solving a structured nuclear-norm-penalized objective, which also achieves rank selection and motivates the choice of hyperparameters. We then extend BSF to simultaneously predict a continuous or binary response, an approach we term Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation and full posterior inference for missing data, including "blockwise" missingness, and BSFP offers prediction of unobserved outcomes. We show via simulation that BSFP is competitive in recovering latent variation structure, and we demonstrate the importance of propagating uncertainty from the estimated factorization to prediction. We also study the imputation performance of BSF via simulation under missing-at-random and missing-not-at-random assumptions. Lastly, we use BSFP to predict lung function from the bronchoalveolar lavage metabolome and proteome in a study of HIV-associated OLD. Our analysis reveals a distinct cluster of patients with OLD driven by shared metabolomic and proteomic expression patterns, as well as multi-omic patterns related to lung function decline. Software is freely available at https://github.com/sarahsamorodnitsky/BSFP .
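The link between normal priors and a nuclear-norm penalty can be illustrated in the simplest, unstructured case. This is a minimal sketch under stated assumptions, not the BSF objective itself. Assume a single data matrix X = U V^T + E with unit-variance Gaussian noise and independent N(0, 1/lam) priors on the entries of U and V. The product U V^T at the posterior mode then solves min_S 0.5 ||X - S||_F^2 + lam ||S||_*. That solution soft-thresholds the singular values of X at lam, and singular values at or below lam drop out, which is how the penalty selects the rank. The function names (`svt`, `nuclear_objective`, `map_objective`) and the choice lam = sqrt(n) + sqrt(p) are hypothetical illustrations, not taken from the BSFP software. The sketch also ignores the multi-omic structure that makes the BSF objective "structured".

```python
import numpy as np


def svt(X, lam):
    """Singular value soft-thresholding.

    Returns the minimizer S of 0.5 * ||X - S||_F^2 + lam * ||S||_*, plus the
    retained singular vectors/values. Singular values <= lam are set to zero,
    so the rank of S is selected by the penalty itself.
    """
    W, d, Zt = np.linalg.svd(X, full_matrices=False)
    d_shrunk = np.maximum(d - lam, 0.0)
    r = int(np.count_nonzero(d_shrunk))
    W, d_shrunk, Z = W[:, :r], d_shrunk[:r], Zt[:r, :].T
    S = (W * d_shrunk) @ Z.T
    return S, r, W, d_shrunk, Z


def nuclear_objective(X, S, lam):
    """Half the squared Frobenius loss plus lam times the nuclear norm of S."""
    return 0.5 * np.sum((X - S) ** 2) + lam * np.linalg.svd(S, compute_uv=False).sum()


def map_objective(X, U, V, lam):
    """Negative log posterior (up to constants) of X = U V^T + E, assuming
    N(0, 1) noise entries and independent N(0, 1/lam) priors on U and V."""
    return 0.5 * np.sum((X - U @ V.T) ** 2) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))


rng = np.random.default_rng(0)
n, p, true_rank = 100, 80, 3
signal = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, p))
X = signal + rng.normal(size=(n, p))          # unit-variance noise

lam = np.sqrt(n) + np.sqrt(p)                 # assumed penalty (noise-scale heuristic)
S, r, W, d, Z = svt(X, lam)

# Split each shrunken singular value evenly between the two factors.
U, V = W * np.sqrt(d), Z * np.sqrt(d)
print("selected rank:", r)
print("nuclear-norm objective:", nuclear_objective(X, S, lam))
print("MAP objective at U, V: ", map_objective(X, U, V, lam))
```

The last two printed values agree up to floating-point error. Giving each factor sqrt(d_k) of each shrunken singular value d_k makes (lam/2)(||U||_F^2 + ||V||_F^2) equal lam times the sum of the d_k, which is lam ||S||_*.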