Efficient Regular Expression Matching on Compressed Strings

Authors
Han, Yutong [1]
Wang, Bin [1]
Yang, Xiaochun [1]
Zhu, Huaijie [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT II | 2017 / Vol. 10178
Keywords
Regular expression; LZ77; String matching; Self-index; SEARCH
DOI
10.1007/978-3-319-55699-4_14
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing methods for regular expression matching on LZ78 compressed strings do not perform efficiently. Moreover, LZ78 compression has some shortcomings, such as a high compression ratio and slower decompression than LZ77, a related Lempel-Ziv scheme. In this paper, we study regular expression matching on LZ77 compressed strings. To address this problem, we propose an efficient algorithm, RELZ, which uses the positive factors (a prefix and a suffix) and the negative factors (substrings that cannot appear in an answer) of the regular expression to prune candidates. To locate these two kinds of factors quickly on the compressed string without decompression, we design a variant of the suffix trie index, called SSLZ. In addition, we construct bitmaps for the factors of the regular expression to detect potential regions, and we propose block filtering to reduce the number of candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms existing algorithms.
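The factor-based pruning described in the abstract can be illustrated with a minimal sketch. This is not the RELZ algorithm: it works on an uncompressed string, uses no SSLZ index, bitmaps, or block filtering, and assumes the caller supplies the prefix, suffix, and negative factors by hand. The function names `find_all` and `match_with_factors` are hypothetical and chosen only for this illustration.

```python
import re


def find_all(text, factor):
    """Return every start position of `factor` in `text` (overlaps allowed)."""
    positions = []
    start = text.find(factor)
    while start != -1:
        positions.append(start)
        start = text.find(factor, start + 1)
    return positions


def match_with_factors(text, pattern, prefix, suffix, negatives):
    """Return (start, end) spans of substrings that fully match `pattern`.

    Assumption: every match begins with `prefix`, ends with `suffix`, and
    contains none of the strings in `negatives`. These factors are given by
    the caller here; the sketch does not derive them from the pattern.
    """
    regex = re.compile(pattern)
    starts = find_all(text, prefix)
    ends = [p + len(suffix) for p in find_all(text, suffix)]
    # Occurrences of negative factors, stored as half-open (begin, end) spans.
    neg_hits = [(p, p + len(f)) for f in negatives for p in find_all(text, f)]

    results = []
    for s in starts:
        for e in ends:
            # The window must be long enough to hold both positive factors.
            if e - s < max(len(prefix), len(suffix)):
                continue
            # Prune windows that fully contain a negative factor occurrence.
            if any(s <= a and b <= e for a, b in neg_hits):
                continue
            # Only surviving candidates are verified with the regex engine.
            if regex.fullmatch(text, s, e):
                results.append((s, e))
    return results


spans = match_with_factors("xxabcdefxxabxef", r"ab[cd]*ef", "ab", "ef", ["x"])
print(spans)
```

In this toy input, the windows that would span an `x` are discarded before the regular expression is ever evaluated, which is the pruning idea the abstract attributes to positive and negative factors.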
Pages: 219-234
Page count: 16