Paper Title

Multiple Genome Analytics Framework: The Case of All SARS-CoV-2 Complete Variants

Author

Xylogiannopoulos, Konstantinos

Abstract

Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The SARS-CoV-2 pandemic has made such problems more demanding: constant mutations lead to hundreds or thousands of new genome variants being discovered every week, and there is an urgent need for fast and accurate analyses. Computational tools for genomic analyses, such as sequence alignment, are therefore very important, although in most cases the resources and computational power they require are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms built specifically for text mining and pattern detection, which can help to address several computational biology and bioinformatics problems efficiently and concurrently with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge of all repeated patterns that exist in multiple genome sequences, and this information can then be used by other meta-algorithms for further meta-analyses. The potential of the proposed framework is demonstrated by analyzing more than 300,000 SARS-CoV-2 genome sequences and detecting all repeated patterns of length up to 60 nucleotides in these sequences. The results have been used to answer questions about common patterns among all variants, sequence alignment, palindrome and tandem repeat detection, genome comparisons between different organisms, polymerase chain reaction primer detection, and more.
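To make the idea of "all repeated patterns up to a fixed length across multiple sequences" concrete, the following is a minimal Python sketch. It is not the framework's data structure or algorithm, which the abstract does not describe; it is a naive dictionary-based enumeration that would not scale to 300,000 genomes. The function names `repeated_patterns` and `common_to_all`, the `min_len` parameter, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def repeated_patterns(sequences, max_len=60, min_len=2):
    """Map each pattern that occurs at least twice to its occurrences.

    Each occurrence is a (sequence index, start position) pair.
    Naive illustration only: it enumerates every substring of length
    min_len..max_len, so it is not the O(n log n) approach of the paper.
    """
    occurrences = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            # Longest substring that still fits at this start position
            longest = min(max_len, len(seq) - start)
            for length in range(min_len, longest + 1):
                occurrences[seq[start:start + length]].append((seq_idx, start))
    return {p: occ for p, occ in occurrences.items() if len(occ) >= 2}


def common_to_all(patterns, n_sequences):
    """Keep only the patterns that appear in every sequence at least once."""
    return {
        p: occ
        for p, occ in patterns.items()
        if len({seq_idx for seq_idx, _ in occ}) == n_sequences
    }


# Hypothetical toy sequences, not real SARS-CoV-2 data
toy = ["ACGTACGA", "TTACGTAC", "GACGTTAC"]
patterns = repeated_patterns(toy, max_len=4)
shared = common_to_all(patterns, len(toy))
```

Once such an occurrence table exists, follow-up questions like "which patterns are common to all variants" become simple filters over it, which mirrors the abstract's point that one detection pass can feed several later meta-analyses.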
