A bit-parallel approach to suffix automata: Fast extended string matching

Authors
Navarro, G
Raffinot, M
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
[2] Inst Gaspard Monge, F-77454 Marne La Vallee 2, France
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a "suffix automaton" which is made deterministic in the preprocessing. BNDM instead simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than all the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g., on English text it is the fastest for patterns between 5 and 110 characters). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, to search for multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency, and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
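The core idea in the abstract, scanning each text window backwards while a bit vector tracks which pattern factors are still alive, can be illustrated with a short sketch. This is not the authors' code: the function name `bndm_search` and the dictionary-based mask table are assumptions made for illustration, and the sketch relies on Python's unbounded integers, whereas a bit-parallel implementation would normally assume the pattern fits in one machine word.

```python
def bndm_search(pattern, text):
    """Return the start positions of all occurrences of pattern in text.

    Illustrative sketch of a BNDM-style search (hypothetical helper,
    not taken from the paper's implementation).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # masks[c] has bit (m - 1 - i) set when pattern[i] == c,
    # so pattern[0] corresponds to the highest bit.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # all m state bits set
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix

    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m             # number of window characters not yet read
        last = m          # shift to apply once this window is done
        d = full          # every pattern factor is initially possible
        while d != 0:
            # Read the window right to left; unknown characters kill all states.
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    # A pattern prefix starts at window offset j:
                    # remember it as a safe alignment for the next window.
                    last = j
                else:
                    # The whole window was read and matches the pattern.
                    occurrences.append(pos)
                    break
            d = (d << 1) & full
        pos += last
    return occurrences
```

For example, `bndm_search("abra", "abracadabra")` returns `[0, 7]`. The window at position 3 is abandoned after a single character (`d`, which does not occur in the pattern), which is the character-skipping behavior the abstract attributes to BDM and BNDM.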
Pages: 14-33
Page count: 20