Paper title
Testing biological network motif significance with exponential random graph models
Paper authors
Abstract
Analysis of the structure of biological networks often uses statistical tests to establish the over-representation of motifs, which are thought to be important building blocks of such networks, related to their biological functions. However, there is disagreement as to the statistical significance of these motifs, and there are potential problems with standard methods for estimating this significance. Exponential random graph models (ERGMs) are a class of statistical models that can overcome some of the shortcomings of commonly used methods for testing the statistical significance of motifs. ERGMs were first introduced into the bioinformatics literature over ten years ago but have had limited application to biological networks, possibly due to the practical difficulty of estimating model parameters. Advances in estimation algorithms now allow analysis of much larger networks in practical time. We illustrate the application of ERGMs to both an undirected protein-protein interaction (PPI) network and directed gene regulatory networks. The ERGMs indicate over-representation of triangles in the PPI network, and confirm results from previous research as to over-representation of transitive triangles (feed-forward loops) in an E. coli and a yeast regulatory network. We also confirm, using ERGMs, previous research showing that under-representation of the cyclic triangle (feedback loop) can be explained as a consequence of other topological features.
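To make the three motifs named in the abstract concrete, the following is a minimal sketch that counts undirected triangles, transitive triangles (feed-forward loops), and cyclic triangles (feedback loops) in a small edge list. The function names `count_triangles` and `count_directed_triads` and the toy networks are illustrative assumptions, not taken from the paper. The sketch only produces raw motif counts; it does not fit an ERGM or test significance, which requires estimating model parameters conditional on the other effects in the model.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration; each triangle is counted once.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            # common neighbours of u and v close a triangle
            total += len(adj[u] & adj[v])
    # every triangle is seen once per ordered pair of its nodes (6 times)
    return total // 6


def count_directed_triads(arcs):
    """Return (transitive, cyclic) triangle counts for a directed graph.

    transitive: configurations a->b, b->c, a->c (feed-forward loop)
    cyclic:     configurations a->b, b->c, c->a (feedback loop)
    These are subgraph counts, so a node triple with reciprocated arcs
    can contribute more than one configuration.
    """
    arc_set = {(u, v) for u, v in arcs if u != v}
    succ = {}
    for u, v in arc_set:
        succ.setdefault(u, set()).add(v)
    transitive = 0
    cyclic = 0
    for a, b in arc_set:
        for c in succ.get(b, ()):
            if c == a:
                continue  # a->b->a is a mutual arc, not a triangle
            if (a, c) in arc_set:
                transitive += 1
            if (c, a) in arc_set:
                cyclic += 1
    # each 3-cycle is found once from each of its three arcs
    return transitive, cyclic // 3


if __name__ == "__main__":
    # Toy undirected network (assumed data): one triangle plus a pendant edge.
    print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
    # Toy regulatory network (assumed data): a single feed-forward loop.
    print(count_directed_triads([("x", "y"), ("y", "z"), ("x", "z")]))
```

Raw counts like these are the kind of network statistics that an ERGM includes as model terms; whether a motif is over- or under-represented is then judged from the sign and standard error of the corresponding estimated parameter rather than from the count alone.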