Paper title
StemP: A fast and deterministic Stem-graph approach for RNA and protein folding prediction
Paper authors
Abstract
We propose a new deterministic methodology to predict the folding of RNA sequences and proteins, built around one question: are stems enough for structure prediction? The main idea is to consider all possible stem formations in the given sequence. Using the stem-loop energy and the strength of each stem, we explore how to deterministically utilize stem information for RNA and protein folding structure prediction. We use graph notation, in which all possible stems are represented as vertices and the co-existence of two stems as an edge. This full Stem-graph represents all possible folding structures, and we pick the sub-graph(s) that give the best matching energy for folding structure prediction. We introduce a Stem-Loop score to add structural information and to speed up the computation. The proposed method can handle secondary structure prediction as well as protein folding with pseudoknots. Numerical experiments are run on a laptop, and results take only seconds to a few minutes. One of the strengths of this approach is the simplicity and flexibility of the algorithm, and it gives a deterministic answer. We explore protein sequences from the Protein Data Bank, 5S rRNA sequences, and tRNA sequences from the Gutell Lab. Various experiments and comparisons are included to validate the proposed method.
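The stem-graph idea in the abstract can be illustrated with a minimal Python sketch for the RNA case only. This is not the authors' implementation. It rests on several assumptions that the abstract does not specify: stems are maximal runs of Watson-Crick or G-U wobble pairs, two stems can co-exist when they share no base position (so crossing stems, i.e. pseudoknots, are allowed), and the number of base pairs stands in for the matching energy and Stem-Loop score. The names `find_stems`, `compatible`, and `best_structure` are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=3, min_loop=3):
    """Enumerate maximal stems as (i, j, length).

    A stem (i, j, length) pairs base i+k with base j-k for k < length.
    Sub-stems of a maximal stem are not enumerated (a simplification).
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip pairs that merely continue a stem starting further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while at least min_loop bases stay unpaired.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def positions(stem):
    """Return the set of base positions used by a stem."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def compatible(a, b):
    """Two stems can co-exist (share an edge) if they use disjoint positions."""
    return positions(a).isdisjoint(positions(b))


def best_structure(stems):
    """Search all cliques of the co-existence graph for the highest score.

    The score is the total number of base pairs, a stand-in for the energy.
    The search order is fixed, so the result is deterministic; ties keep
    the first clique found. Exhaustive search is only viable for small inputs.
    """
    best = (0, [])

    def extend(chosen, candidates, score):
        nonlocal best
        if score > best[0]:
            best = (score, list(chosen))
        for idx, stem in enumerate(candidates):
            # Keep only later candidates that can co-exist with this stem.
            rest = [t for t in candidates[idx + 1:] if compatible(stem, t)]
            extend(chosen + [stem], rest, score + stem[2])

    extend([], stems, 0)
    return best
```

For the toy sequence `GGGAAAUCCC`, `find_stems` yields two candidate stems that both use base 0, so they are not joined by an edge and `best_structure` returns a single stem with three base pairs. A realistic scoring function would break such ties with actual energies rather than pair counts.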