Paper Title

A General Framework for Powerful Confounder Adjustment in Omics Association Studies

Authors

Roy, Asmita; Chen, Jun; Zhang, Xianyang

Abstract

Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature, followed by multiplicity correction. This study shows that the traditional approach is sub-optimal and proposes a new two-dimensional false discovery rate control framework (2dFDR+) that provides significant power improvement over the conventional method and applies to a wide range of settings. 2dFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is then performed based on conditional independence test statistics in the remaining features. 2dFDR+ provides (asymptotically) valid inference in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. To achieve this, the method requires the conditional distribution of the covariate given the confounders to be known or estimable from the data. We develop a new procedure to simultaneously select the two cutoff values for the marginal and conditional independence test statistics. 2dFDR+ is proved to offer asymptotic FDR control and to dominate the traditional procedure in power. Promising finite-sample performance is demonstrated via extensive simulations and real data applications.
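To make the filter-then-test idea concrete, here is a minimal Python sketch of jointly searching a marginal cutoff t1 and a conditional p-value cutoff t2. It is not the 2dFDR+ procedure: the function name `two_dim_filter_fdr`, the grid search, and the FDP estimate m1(t1) * t2 / R(t1, t2) are illustrative assumptions. That estimate is only reasonable if null conditional p-values are Uniform(0, 1) and independent of the marginal statistics, an assumption the abstract's method does not rely on.

```python
from typing import List, Optional, Tuple


def two_dim_filter_fdr(
    marginal_stats: List[float],
    conditional_pvals: List[float],
    alpha: float = 0.1,
) -> Tuple[Optional[Tuple[float, float]], List[int]]:
    """Hypothetical helper: grid-search a marginal cutoff t1 and a
    conditional p-value cutoff t2 that maximize rejections subject to
    an estimated FDP <= alpha.

    Assumes null conditional p-values are Uniform(0, 1) and independent
    of the marginal statistics, so at most m1(t1) * t2 false discoveries
    are expected among the m1(t1) features that pass the filter.
    """
    if len(marginal_stats) != len(conditional_pvals):
        raise ValueError("inputs must have the same length")
    m = len(marginal_stats)
    best_cutoffs: Optional[Tuple[float, float]] = None
    best_rejections: List[int] = []
    # Candidate cutoffs are the observed values; t1 = 0 means "no filtering".
    t1_grid = sorted({0.0, *(abs(s) for s in marginal_stats)})
    t2_grid = sorted(set(conditional_pvals))
    for t1 in t1_grid:
        # Step 1: keep features whose marginal statistic clears t1.
        kept = [j for j in range(m) if abs(marginal_stats[j]) >= t1]
        m1 = len(kept)
        for t2 in t2_grid:
            # Step 2: among kept features, reject those with conditional p <= t2.
            rejected = [j for j in kept if conditional_pvals[j] <= t2]
            if not rejected:
                continue
            fdp_hat = m1 * t2 / len(rejected)
            # Keep the first pair that strictly increases the rejection count.
            if fdp_hat <= alpha and len(rejected) > len(best_rejections):
                best_cutoffs = (t1, t2)
                best_rejections = rejected
    return best_cutoffs, best_rejections


# Toy input: features 0-2 are signals (large marginal statistic, small p-value).
marginal = [5.0, 4.5, 4.0, 0.5, 0.3, 1.0, 0.2, 0.8, 0.1, 0.4]
pvals = [0.001, 0.002, 0.04, 0.2, 0.5, 0.45, 0.7, 0.3, 0.9, 0.6]
cutoffs, rejections = two_dim_filter_fdr(marginal, pvals, alpha=0.1)
print(cutoffs, rejections)  # → (0.4, 0.04) [0, 1, 2]
```

In this toy input, testing all ten features (t1 = 0) admits only two rejections at alpha = 0.1, because 10 × 0.04 / 3 ≈ 0.133 exceeds the target. Filtering on the marginal statistic shrinks the pool of candidate nulls (m1 = 7 gives 7 × 0.04 / 3 ≈ 0.093), which lets the third signal through. This is the intuition behind using marginal statistics as auxiliary information.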