A bit-parallel approach to suffix automata: Fast extended string matching

Authors
Navarro, G
Raffinot, M
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
[2] Inst Gaspard Monge, F-77454 Marne La Vallee 2, France
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a "suffix automaton" which is made deterministic in the preprocessing. BNDM instead simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than all the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g., on English text it is the fastest for patterns between 5 and 110 characters long). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, to search for multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
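The core idea summarized in the abstract (scanning each text window backwards while a bit mask tracks which pattern factors are still alive) can be illustrated with a minimal sketch. It follows the commonly described form of BNDM for a single exact pattern and is not the authors' code. The function name `bndm_search` and all variable names are assumptions made for illustration. The original algorithm assumes the pattern fits in one machine word; Python integers are unbounded, so the sketch does not enforce that limit.

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return the 0-based start positions of pattern in text (BNDM-style sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bit (m - 1 - i) of mask[c] is set when pattern[i] == c.
    mask: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        mask[ch] = mask.get(ch, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # all m automaton states active
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix

    hits = []
    pos = 0
    while pos <= n - m:
        j = m        # number of window characters not yet read
        shift = m    # safe shift if no pattern prefix is seen
        state = full
        while state:
            # Read the window right to left, keeping only states that
            # still correspond to a factor of the pattern.
            state &= mask.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    shift = j         # a prefix starts at window offset j
                else:
                    hits.append(pos)  # the whole window matched
            state = (state << 1) & full
        pos += shift
    return hits


print(bndm_search("abra", "abracadabra"))
```

The window is abandoned as soon as `state` becomes zero, which is where the character skipping comes from: the text read so far is not a factor of the pattern, so no occurrence can overlap it beyond the last recorded prefix position.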
Pages: 14-33
Number of pages: 20