CoViT: Real-time phylogenetics for the SARS-CoV-2 pandemic using Vision Transformers

Paper Title

CoViT: Real-time phylogenetics for the SARS-CoV-2 pandemic using Vision Transformers

Authors

Zuher Jahshan, Can Alkan, Leonid Yavits

Abstract

Real-time viral genome detection, taxonomic classification, and phylogenetic analysis are critical for efficient tracking and control of viral pandemics such as COVID-19. However, the unprecedented and still growing volume of viral genome data creates a computational bottleneck that effectively prevents real-time pandemic tracking. For genomic tracing to work effectively, each new viral genome sequence must be placed in its pangenomic context. Re-inferring the full phylogeny of SARS-CoV-2, with datasets containing millions of samples, is prohibitively slow even with powerful computational resources. We attempt to alleviate this bottleneck by modifying and applying the Vision Transformer, a recently developed neural network model for image recognition, to the taxonomic classification and placement of viral genomes such as SARS-CoV-2. Our solution, CoViT, places SARS-CoV-2 genome accessions onto the SARS-CoV-2 phylogenetic tree with an accuracy of 94.2%. Since CoViT is a classification neural network, it provides more than one likely placement. Specifically, one of the two most likely placements suggested by CoViT is correct with a probability of 97.9%, and the correct placement is among the five most likely placements with a probability of 99.8%. The placement time is 0.055 s per genome on an NVIDIA GeForce RTX 2080 Ti GPU. We make CoViT available to the research community through GitHub: https://github.com/zuherJahshan/covit.
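The 94.2%, 97.9%, and 99.8% figures in the abstract correspond to what is commonly called top-1, top-2, and top-5 accuracy for a classifier. The sketch below illustrates how such a metric is typically computed from per-class scores. It is not taken from the CoViT repository: the function names, the toy scores, and the five-lineage setup are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_hit(scores: Sequence[float], true_label: int, k: int) -> bool:
    """Return True if true_label is among the k highest-scoring classes."""
    # Rank class indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_label in ranked[:k]


def top_k_accuracy(
    batch_scores: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is within the top-k predictions."""
    hits = sum(top_k_hit(s, y, k) for s, y in zip(batch_scores, true_labels))
    return hits / len(true_labels)


# Hypothetical toy data: 4 genomes scored against 5 candidate placements.
toy_scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true placement 0 is ranked first
    [0.30, 0.40, 0.20, 0.05, 0.05],  # true placement 0 is ranked second
    [0.10, 0.20, 0.30, 0.25, 0.15],  # true placement 4 is ranked fourth
    [0.05, 0.05, 0.10, 0.20, 0.60],  # true placement 4 is ranked first
]
toy_labels = [0, 0, 4, 4]

for k in (1, 2, 5):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2f}")
```

Top-k accuracy can only stay the same or increase as k grows, which matches the progression of the three figures reported in the abstract.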
