Paper title
Improving genetic risk prediction across diverse population by disentangling ancestry representations
Authors
Abstract
Risk prediction models that use genetic data have gained increasing traction in genomics. However, most polygenic risk models have been developed using data from participants of similar, mostly European, ancestry. This can bias the resulting risk predictors and lead to poor generalization when they are applied to minority populations and to admixed individuals such as African Americans. This bias arises largely because the prediction models are confounded by the underlying population structure. To address it, we propose a novel deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry-disentangled representation can then be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, particularly for admixed individuals.
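The abstract does not say how ancestry is disentangled from the representation. To make the idea of "removing ancestry information from a representation" concrete, the sketch below uses a much simpler, linear stand-in that is an assumption on our part and not the authors' deep-learning method: it regresses each feature on one-hot ancestry labels and keeps the least-squares residual, so that no feature differs in mean between populations. The helper name `remove_ancestry_component` and the simulated genotypes are hypothetical and for illustration only.

```python
import numpy as np


def remove_ancestry_component(Z: np.ndarray, ancestry: np.ndarray) -> np.ndarray:
    """Residualize a representation on one-hot ancestry labels.

    Hypothetical helper (linear stand-in, not the paper's method).
    Z: (n_samples, n_features) representation, e.g. encoded genotypes.
    ancestry: (n_samples,) array of population labels.
    Returns Z minus its least-squares projection onto the ancestry
    indicators, so every feature has zero mean within each population.
    """
    labels = np.unique(ancestry)
    # One-hot design matrix with one column per population.
    A = (ancestry[:, None] == labels[None, :]).astype(float)
    # B minimizes ||Z - A @ B||; each row of B holds one population's feature means.
    B, *_ = np.linalg.lstsq(A, Z, rcond=None)
    return Z - A @ B


# Toy data (simulated, not from the paper): two populations whose allele
# frequencies differ, so the raw genotypes encode ancestry.
rng = np.random.default_rng(0)
n_samples, n_snps = 200, 5
ancestry = rng.integers(0, 2, size=n_samples)
allele_freq = np.where(ancestry[:, None] == 0, 0.2, 0.6)  # shape (n_samples, 1)
X = rng.binomial(2, allele_freq, size=(n_samples, n_snps)).astype(float)

Z_clean = remove_ancestry_component(X, ancestry)
for k in np.unique(ancestry):
    raw_mean = X[ancestry == k].mean(axis=0)
    clean_mean = Z_clean[ancestry == k].mean(axis=0)
    print(f"population {k}: raw means {np.round(raw_mean, 2)}, "
          f"residual means {np.round(clean_mean, 6)}")
```

Residualization only removes population-specific means. It leaves nonlinear and higher-order ancestry signal in place, and it also discards any genuine between-population difference in phenotype-relevant features. These limitations are one reason to learn the disentangled representation with a nonlinear model, as the framework above does.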