Data-driven biological network alignment that uses topological, sequence, and functional information

Paper title

Data-driven biological network alignment that uses topological, sequence, and functional information

Authors

Shawn Gu, Tijana Milenkovic

Abstract

Many proteins remain functionally unannotated. Sequence alignment (SA) uncovers missing annotations by transferring functional knowledge between sequence-conserved regions of different species. Because SA is imperfect, network alignment (NA) complements SA by transferring functional knowledge between regions of different species that are conserved in their biological networks, rather than just in their sequences. Existing NA methods assume that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are almost as topologically similar as functionally related proteins. So, we redefined NA as a data-driven framework, TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to the proteins' functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, its alignments yielded higher protein functional prediction accuracy than alignments of existing NA methods, even those that used both topological and sequence information. Here, we propose TARA++. Like TARA and unlike other existing methods, TARA++ is data-driven. Unlike TARA, it uses across-network sequence information on top of within-network topological information. To handle this within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms existing methods in protein functional prediction accuracy.
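Both SA and NA rest on the same premise: functional knowledge is transferred from one protein to the protein it is aligned with. The sketch below illustrates that premise under stated assumptions. It is not the evaluation pipeline used for TARA or TARA++.

- An alignment is a list of (source protein, target protein) pairs.
- Annotations are sets of term IDs.
- Prediction accuracy is measured as simple precision.

The functions `transfer_annotations` and `precision`, the protein names, and the term IDs are all hypothetical toy values.

```python
from collections import defaultdict


def transfer_annotations(alignment, source_annotations):
    """Hypothetical helper: predict each target protein's functions by
    copying the annotations of the source protein(s) it is aligned to."""
    predictions = defaultdict(set)
    for src, tgt in alignment:
        # Union in the source protein's terms (empty set if unannotated).
        predictions[tgt] |= source_annotations.get(src, set())
    return dict(predictions)


def precision(predictions, true_annotations):
    """Hypothetical metric: fraction of predicted (protein, term) pairs
    that are correct, counting only target proteins with known annotations."""
    predicted = correct = 0
    for protein, terms in predictions.items():
        if protein not in true_annotations:
            continue  # no ground truth available for this protein
        predicted += len(terms)
        correct += len(terms & true_annotations[protein])
    return correct / predicted if predicted else 0.0


# Toy example (made-up proteins and terms, for illustration only).
alignment = [("yA", "hA"), ("yB", "hB"), ("yC", "hC")]
source = {"yA": {"t1"}, "yB": {"t2", "t3"}, "yC": set()}
truth = {"hA": {"t1"}, "hB": {"t2"}}

preds = transfer_annotations(alignment, source)
print(preds)                     # hA gets {t1}, hB gets {t2, t3}, hC gets nothing
print(precision(preds, truth))   # 2 correct out of 3 predicted pairs
```

In this framing, NA methods differ only in how they produce `alignment`. A better alignment pairs functionally related proteins, which raises the precision of the transferred annotations.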
