论文标题

BWT运行压缩索引上的最佳时间查询

Optimal-Time Queries on BWT-runs Compressed Indexes

论文作者

Nishimoto, Takaaki, Tabei, Yasuo

论文摘要

为快速查询索引高度重复的字符串(即具有许多重复的字符串)已成为弦乐处理中的一个中心研究主题,因为它在生物信息学和自然语言处理中具有多种应用。尽管到目前为止已经提出了大量高度重复字符串的索引,但开发支持各种查询的压缩索引仍然是一个挑战。运行长度的Burrows-wheeler变换(RLBWT)是通过输入字符串和运行长度编码的可逆置换的无损数据压缩的,并且它引起了索引高度重复的字符串的兴趣。 lf和$ ϕ^{ - 1} $是在RLBWT上构建索引的两个关键功能,而最佳先前结果计算LF和$ o(\ log \ log \ log \ log \ log n)$(\ log \ log \ log n)$($ o(r)$ o(r)字符串长度$ n $的空间单词$ r $ r $ r $ r $ r $ r $ r $ r $ r $ r llbbwt in rlbwt in rlbwt in rlbwt。在本文中,我们改进了lf和$ ϕ^{ - 1} $,以便可以在$ o(r)$空间单词的恒定时间内计算它们。随后,我们提出OptBWtr(BWT-RUNS压缩索引上的最佳时间查询),这是支持各种查询的第一个字符串索引,其中包括定位,计数,最佳时间中提取查询以及$ O(r)$ o(r)$ of Space of Space。

Indexing highly repetitive strings (i.e., strings with many repetitions) for fast queries has become a central research topic in string processing, because it has a wide variety of applications in bioinformatics and natural language processing. Although a substantial number of indexes for highly repetitive strings have been proposed thus far, developing compressed indexes that support various queries remains a challenge. The run-length Burrows-Wheeler transform (RLBWT) is a lossless data compression by a reversible permutation of an input string and run-length encoding, and it has received interest for indexing highly repetitive strings. LF and $ϕ^{-1}$ are two key functions for building indexes on RLBWT, and the best previous result computes LF and $ϕ^{-1}$ in $O(\log \log n)$ time with $O(r)$ words of space for the string length $n$ and the number $r$ of runs in RLBWT. In this paper, we improve LF and $ϕ^{-1}$ so that they can be computed in a constant time with $O(r)$ words of space. Subsequently, we present OptBWTR (optimal-time queries on BWT-runs compressed indexes), the first string index that supports various queries including locate, count, extract queries in optimal time and $O(r)$ words of space.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源