SANA: Cross-Species Prediction of Gene Ontology GO Annotations via Topological Network Alignment

Paper title

SANA: Cross-Species Prediction of Gene Ontology GO Annotations via Topological Network Alignment

Authors

Siyue Wang, Giles R. S. Atkinson, Wayne B. Hayes

Abstract

Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two Protein-Protein Interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of "good" topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting Network Alignment Frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a "proof of concept" on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
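The idea of detecting a weak signal from many stochastic alignments can be illustrated with a minimal sketch. The abstract does not give a formula for NAF, so this sketch assumes it is the fraction of sampled alignments in which a given node pair is aligned. The function name `alignment_frequency`, the toy node labels, and the 0.75 cutoff are all hypothetical and not taken from the paper or from the SANA software.

```python
from collections import Counter


def alignment_frequency(alignments):
    """Return, for each (node1, node2) pair, the fraction of sampled
    alignments in which node1 is aligned to node2.

    ASSUMPTION: this fraction stands in for the Network Alignment
    Frequency (NAF); the abstract does not state the exact definition.
    Each alignment is a dict mapping a node of network 1 to a node of
    network 2.
    """
    if not alignments:
        raise ValueError("need at least one sampled alignment")
    counts = Counter()
    for alignment in alignments:
        # Each alignment contributes one count per aligned pair.
        counts.update(alignment.items())
    total = len(alignments)
    return {pair: count / total for pair, count in counts.items()}


# Toy data: four hypothetical stochastic alignments of three proteins.
samples = [
    {"a1": "b1", "a2": "b2", "a3": "b3"},
    {"a1": "b1", "a2": "b3", "a3": "b2"},
    {"a1": "b1", "a2": "b2", "a3": "b4"},
    {"a1": "b1", "a2": "b2", "a3": "b3"},
]

naf = alignment_frequency(samples)

# Keep only pairs aligned in at least 75% of samples (arbitrary cutoff).
robust_pairs = {pair for pair, freq in naf.items() if freq >= 0.75}
```

In this toy input, the pair `("a1", "b1")` appears in every sample and `("a2", "b2")` in three of four, so both pass the cutoff, while `("a3", "b3")` appears in only half and is dropped. Pairs that survive such a cutoff correspond to the "robustly aligned" regions the abstract refers to.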
