A partition-based efficient algorithm for large scale multiple-strings matching

Authors
Liu, Ping [1]
Liu, Yan-Bing [1]
Tan, Jian-Long [1]
Affiliation
[1] Chinese Acad Sci, Software Div, Inst Comp Technol, Beijing 100080, Peoples R China
Source
String Processing and Information Retrieval, Proceedings | 2005 / Vol. 3772
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Filtering plays an important role in Internet security and information retrieval, and usually employs a multiple-strings matching algorithm as its key component. All the classical matching algorithms, however, perform badly once the number of keywords exceeds a critical point, which makes large-scale multiple-strings matching a great challenge. Based on the observation that the speed of the classical algorithms depends mainly on the length of the shortest keyword, a partition strategy is proposed that decomposes the keyword set into a series of subsets, on each of which a classical algorithm is run. For the optimal partition, it is proved that keywords of the same length fall in one subset and that the keyword lengths of different subsets do not interleave. In this paper, we propose a shortest-path model for finding the optimal partition. Experiments on both random and real data demonstrate that our algorithms generally achieve a speed-up of about 100-300% over the classical ones.
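The shortest-path formulation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's cost function or graph construction, so everything below is an assumption: the names `toy_cost` and `optimal_partition` are hypothetical, the cost model is a made-up stand-in that only mimics "shorter minimum length and more keywords means slower scanning", and keywords are assumed to be non-empty. The sketch relies only on the two stated properties (equal lengths share a subset, subsets do not interleave), which let a partition be read as a path through the sorted distinct lengths.

```python
from collections import Counter

def toy_cost(min_len, count):
    # Hypothetical per-subset scan cost (NOT from the paper): grows with the
    # number of keywords and shrinks as the shortest keyword gets longer.
    return (1.0 + 0.01 * count) / min_len

def optimal_partition(keywords, cost=toy_cost):
    """Return (total_cost, groups); each group is a (min_len, max_len) range."""
    by_len = Counter(len(k) for k in keywords)
    lengths = sorted(by_len)              # distinct keyword lengths, ascending
    k = len(lengths)

    # prefix[j] = number of keywords whose length is among lengths[:j]
    prefix = [0]
    for length in lengths:
        prefix.append(prefix[-1] + by_len[length])

    # Nodes 0..k; edge i -> j (i < j) means "lengths[i:j] form one subset".
    # The graph is acyclic, so one pass in index order finds the shortest path.
    best = [float("inf")] * (k + 1)
    back = [0] * (k + 1)
    best[0] = 0.0
    for j in range(1, k + 1):
        for i in range(j):
            c = best[i] + cost(lengths[i], prefix[j] - prefix[i])
            if c < best[j]:
                best[j], back[j] = c, i

    # Walk the back-pointers to recover the chosen subsets.
    groups = []
    j = k
    while j > 0:
        i = back[j]
        groups.append((lengths[i], lengths[j - 1]))
        j = i
    groups.reverse()
    return best[k], groups
```

Under this toy cost, a single short keyword mixed with many long ones is split off into its own subset, while two keywords of different lengths stay together because splitting would only add a second scan. A real implementation would replace `toy_cost` with a measured or modelled cost for the chosen classical algorithm.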
Pages: 399-404
Page count: 6