Title
Typesafe Coordinate Systems in High-Throughput Sequencing Applications
Authors
Abstract
High-throughput sequencing file formats and tools encode coordinate intervals with respect to a reference sequence in at least four distinct, incompatible ways. Integrating data from, and moving data between, different formats has the potential to introduce subtle off-by-one errors. Here, we introduce the notion of typesafe coordinates: coordinate intervals are not merely an integer pair, but members of a type class comprising four types, the Cartesian product of a zero- or one-based origin and an open or closed interval end. By leveraging the type system of statically and strongly typed, compiled languages, we can provide static guarantees that an entire class of errors is eliminated. We provide a reference implementation in D as part of a larger work (dhtslib), and proofs of concept in Rust, OCaml, and Python. Exploratory implementations are available at https://github.com/blachlylab/typesafe-coordinates.
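The four-type idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the API of dhtslib or of the typesafe-coordinates repository. The class names `ZBHO`, `ZBC`, `OBHO` and `OBC` (zero/one-based, half-open/closed) and the helper methods `to`, `length` and `overlaps` are hypothetical, and conversion is assumed to route through zero-based, half-open coordinates. For orientation, BED intervals are zero-based and half-open, while GFF intervals are one-based and closed. Because Python is dynamically typed, mixing systems is caught here only at run time; the compile-time guarantee described in the abstract requires a statically typed language.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple, Type, TypeVar

# Hypothetical class names for illustration: ZBHO, ZBC, OBHO, OBC.
T = TypeVar("T", bound="Interval")


@dataclass(frozen=True)
class Interval:
    """An integer pair whose meaning is fixed by its concrete subclass."""

    start: int
    end: int

    # Tags set by each subclass; they are class attributes, not constructor args.
    basis: ClassVar[int]        # 0 for zero-based, 1 for one-based
    closed_end: ClassVar[bool]  # True if `end` itself lies inside the interval

    def _to_zbho(self) -> Tuple[int, int]:
        """Normalize to zero-based, half-open (start, end)."""
        start = self.start - self.basis
        end = self.end - self.basis + (1 if self.closed_end else 0)
        return start, end

    def to(self, target: Type[T]) -> T:
        """Explicitly convert into the coordinate system `target`."""
        start, end = self._to_zbho()
        return target(start + target.basis,
                      end + target.basis - (1 if target.closed_end else 0))

    def length(self) -> int:
        """Number of bases covered, independent of coordinate system."""
        start, end = self._to_zbho()
        return end - start

    def overlaps(self, other: "Interval") -> bool:
        """Refuse to compare intervals from different coordinate systems."""
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} with "
                f"{type(other).__name__}; convert explicitly with .to()"
            )
        a_start, a_end = self._to_zbho()
        b_start, b_end = other._to_zbho()
        return a_start < b_end and b_start < a_end


@dataclass(frozen=True)
class ZBHO(Interval):  # zero-based, half-open (BED-style)
    basis = 0
    closed_end = False


@dataclass(frozen=True)
class ZBC(Interval):  # zero-based, closed
    basis = 0
    closed_end = True


@dataclass(frozen=True)
class OBHO(Interval):  # one-based, half-open
    basis = 1
    closed_end = False


@dataclass(frozen=True)
class OBC(Interval):  # one-based, closed (GFF-style)
    basis = 1
    closed_end = True


if __name__ == "__main__":
    bed = ZBHO(0, 10)           # the first ten bases of a reference sequence
    print(bed.to(OBC))          # the same ten bases in one-based, closed form
    print(bed == OBC(0, 10))    # different systems never compare equal
```

Normalizing through one canonical form keeps the arithmetic in a single place: each class only declares its own origin and end convention, instead of hand-writing all twelve ordered pairwise conversions between the four systems. Dataclass equality also requires identical classes, so `ZBHO(0, 10)` and `OBC(0, 10)` are never treated as the same interval.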