Title
LCP-Aware Parallel String Sorting
Authors
Abstract
When lexicographically sorting strings, it is not always necessary to inspect all symbols. For example, the lexicographical rank of "europar" amongst the strings "eureka", "eurasia", and "excells" only depends on its so-called relevant prefix "euro". The distinguishing prefix size $D$ of a set of strings is the number of symbols that actually need to be inspected to establish the lexicographical ordering of all strings. Efficient string sorters should be $D$-aware, i.e., their complexity should depend on $D$ rather than on the total number $N$ of all symbols in all strings. While there are many $D$-aware sorters in the sequential setting, there appear to be no such results in the PRAM model. We propose a framework yielding a $D$-aware modification of any existing PRAM string sorter. The derived algorithms are work-optimal with respect to their original counterparts: if the original algorithm requires $O(w(N))$ work, the derived one requires $O(w(D))$ work. The execution time increases only by a small factor that is logarithmic in the length of the longest relevant prefix. Our framework universally works for deterministic and randomized algorithms in all variations of the PRAM model, such that future improvements in ($D$-unaware) parallel string sorting will directly result in improvements in $D$-aware parallel string sorting.
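To illustrate the definition (this sketch is not part of the paper), the distinguishing prefix size $D$ can be computed naively in quadratic time: the relevant prefix of a string $s$ has length $\min(|s|, 1 + \max_{t \neq s} \mathrm{lcp}(s, t))$, and $D$ is the sum of these lengths. The function names below are illustrative, not from the paper.

```python
def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    i, n = 0, min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    return i

def distinguishing_prefix_size(strings: list[str]) -> int:
    """Sum of relevant prefix lengths: for each string, one symbol past
    its longest common prefix with any other string (capped at its length)."""
    total = 0
    for i, s in enumerate(strings):
        m = max((lcp(s, t) for j, t in enumerate(strings) if j != i),
                default=0)
        total += min(len(s), m + 1)
    return total

# The abstract's example: the relevant prefix of "europar" is "euro" (length 4).
strings = ["europar", "eureka", "eurasia", "excells"]
print(distinguishing_prefix_size(strings))  # 4 + 4 + 4 + 2 = 14
```

A $D$-aware sorter, by definition, inspects only on the order of these 14 symbols rather than all $N = 28$ symbols in the four strings.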