Distance-based phylogenetic inference from typing data: a unifying view

Title

Distance-based phylogenetic inference from typing data: a unifying view

Authors

Vaz, Cátia; Nascimento, Marta; Carriço, João A.; Rocher, Tatiana; Francisco, Alexandre P.

Abstract

Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation, and studies of the natural history of an infection. Their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the volume of data being generated is massive, and many algorithms have been proposed for the phylogenetic analysis of typing data, addressing both correctness and scalability issues. Most distance-based algorithms for inferring phylogenetic trees follow the closest-pair joining scheme, one of the approaches used in hierarchical clustering. Although phylogenetic inference algorithms may seem rather different, the main difference among them lies in how cluster proximity is defined and which optimization criterion is used. Both cluster proximity and optimization criteria often rely on a model of evolution. In this work, we review these algorithms and provide a unified view of them. This is an important step not only to better understand such algorithms, but also to identify possible computational bottlenecks and improvements, which are important for dealing with large data sets.
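To make the closest-pair joining view concrete, here is a minimal Python sketch. It assumes allelic profiles stored as equal-length tuples of allele numbers and uses the number of differing loci (Hamming distance) as the distance. All helper names (`hamming`, `distance_matrix`, `closest_pair_joining`, `min_distance`, `nj_criterion`, `single_linkage`, `upgma`, `nj_update`) and the example profiles are hypothetical illustrations, not the authors' formulation. A single joining loop takes two pluggable rules. The first selects the pair to join: either the plain minimum distance or the neighbor-joining Q criterion. The second defines the proximity of the merged cluster to the remaining clusters: single linkage, UPGMA averaging, or the neighbor-joining reduction. The sketch records only the join order and the distance between the joined clusters. It does not estimate branch lengths, and for neighbor joining it keeps joining down to one cluster instead of stopping at the last three nodes of an unrooted tree.

```python
from itertools import combinations

def hamming(p, q):
    """Hypothetical distance: number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def distance_matrix(profiles):
    """Return profile names and a dict mapping frozenset({a, b}) -> distance."""
    names = list(profiles)
    dist = {frozenset((a, b)): hamming(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    return names, dist

def closest_pair_joining(names, dist, select, update):
    """Generic agglomerative loop: pick a pair with `select`, merge it, and set the
    merged cluster's proximity to every other cluster with `update`.
    Returns the joins as (cluster_i, cluster_j, distance_between_them)."""
    size = {n: 1 for n in names}  # active clusters -> number of leaves
    D = dict(dist)                 # distance table, extended as clusters are created

    def d(a, b):
        return D[frozenset((a, b))]

    joins = []
    while len(size) > 1:
        i, j = select(size, d)
        new = f"({i},{j})"         # Newick-like label for the merged cluster
        for k in size:
            if k not in (i, j):
                D[frozenset((new, k))] = update(i, j, k, size, d)
        joins.append((i, j, d(i, j)))
        size[new] = size.pop(i) + size.pop(j)  # right side is evaluated first
    return joins

# Pair-selection rules: which pair counts as "closest".
def min_distance(size, d):
    return min(combinations(size, 2), key=lambda p: d(*p))

def nj_criterion(size, d):
    # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    n = len(size)
    r = {a: sum(d(a, k) for k in size if k != a) for a in size}
    return min(combinations(size, 2),
               key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])

# Proximity-update rules: distance from the merged cluster (i, j) to cluster k.
def single_linkage(i, j, k, size, d):
    return min(d(i, k), d(j, k))

def upgma(i, j, k, size, d):
    return (size[i] * d(i, k) + size[j] * d(j, k)) / (size[i] + size[j])

def nj_update(i, j, k, size, d):
    return (d(i, k) + d(j, k) - d(i, j)) / 2

profiles = {  # hypothetical 7-locus allelic profiles
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 2, 2, 1, 1, 1),
    "ST4": (3, 3, 3, 3, 3, 3, 3),
}
names, dist = distance_matrix(profiles)
for label, sel, upd in [("single linkage", min_distance, single_linkage),
                        ("UPGMA", min_distance, upgma),
                        ("NJ (join order only)", nj_criterion, nj_update)]:
    print(label, closest_pair_joining(names, dist, sel, upd))
```

Only `select` and `update` change between algorithms while the loop stays the same. This naive loop rescans every pair at each step, which costs roughly cubic time in the number of profiles. That is the kind of computational bottleneck that matters for large typing data sets.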
