论文标题

关于索引和压缩有限自动机

On Indexing and Compressing Finite Automata

论文作者

Cotumaccio, Nicola, Prezza, Nicola

论文摘要

有限自动机的索引是一种强大的数据结构,它支持定位标记有查询模式的路径,从而在基础常规语言上求解模式匹配。在本文中,我们解决了索引任意有限自动机的长期问题。我们的解决方案包括找到状态的部分共同摄影顺序,并证明在总顺序的情况下,通过给定的字符串达到的状态在部分顺序上形成一个间隔,从而实现了索引。我们提供了一个下限,说明这样的间隔需要$ o(p)$单词要代表,$ p $是该订单的宽度(即最大的抗抗小节的大小)。确实,我们表明$ p $确定有限自动机上几个基本问​​题的复杂性:(i)让$σ$是字母大小,我们使用$ \ lceil \ log c \ nogσ\ rceil + 2 \ lceil + 2 \ lceil \ log p \ rceil + p \ rceil + 2 $ log 2 $ l log perition和a log log perition和a log log log perition和a log log log log log log log log log log log log log log log log log log log log log log log l log loga σ\ rceil + \ lceil \ log p \ rceil + 2 $每个过渡。这是通过将洞穴转速转换为任意自动机的概括来实现的。 (ii)我们表明,可以在$ \ tilde o(m \ cdot p^2)$上的查询时间以$ \ tilde o(m \ cdot p^2)求解。 (iii)我们为索引DFA提供了多项式时间算法,同时匹配$ p $的最佳值。另一方面,我们证明了问题是NFAS上的NP-HARD。 (iv)我们表明,在最坏的情况下,NFA确定化的经典PowerSet构造算法产生了同等的DFA $ 2^p(n-p+1)-1 $,其中$ n $是NFA州的数量。

An index for a finite automaton is a powerful data structure that supports locating paths labeled with a query pattern, thus solving pattern matching on the underlying regular language. In this paper, we solve the long-standing problem of indexing arbitrary finite automata. Our solution consists in finding a partial co-lexicographic order of the states and proving, as in the total order case, that states reached by a given string form one interval on the partial order, thus enabling indexing. We provide a lower bound stating that such an interval requires $O(p)$ words to be represented, $p$ being the order's width (i.e. the size of its largest antichain). Indeed, we show that $p$ determines the complexity of several fundamental problems on finite automata: (i) Letting $σ$ be the alphabet size, we provide an encoding for NFAs using $\lceil\log σ\rceil + 2\lceil\log p\rceil + 2$ bits per transition and a smaller encoding for DFAs using $\lceil\log σ\rceil + \lceil\log p\rceil + 2$ bits per transition. This is achieved by generalizing the Burrows-Wheeler transform to arbitrary automata. (ii) We show that indexed pattern matching can be solved in $\tilde O(m\cdot p^2)$ query time on NFAs. (iii) We provide a polynomial-time algorithm to index DFAs, while matching the optimal value for $ p $. On the other hand, we prove that the problem is NP-hard on NFAs. (iv) We show that, in the worst case, the classic powerset construction algorithm for NFA determinization generates an equivalent DFA of size $2^p(n-p+1)-1$, where $n$ is the number of NFA's states.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源