Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework

Authors
Sona, C. P. [1]
Mulerikkal, Jaison Paul [1]
Affiliation
[1] Rajagiri Sch Engn & Technol, Sunya Labs, Kochi, Kerala, India
Keywords
Pattern matching; Hadoop; MapReduce; Big Data; Knuth Morris Pratt Algorithm; Boyer Moore Algorithm; Franek Jennings Smyth Algorithm;
DOI
10.1007/978-3-319-90775-8_4
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Extracting meaning from ever-growing Big Data is a formidable challenge for data scientists, and pattern matching algorithms are an effective means of extracting such meaning from large volumes of data. However, the available pattern matching algorithms have mostly been tested under sequential (single-machine) programming models; their adaptability and efficiency have not been tested under distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper describes the experience of parallelizing three such pattern matching algorithms, namely the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore (BM) algorithm, and the lesser-known Franek-Jennings-Smyth (FJS) algorithm, and porting them to the Hadoop MapReduce framework. All three algorithms were converted to MapReduce programs using key-value pairs and evaluated on both single-node and cluster Hadoop environments. Analysis of the results on the Project Gutenberg data set shows that all three parallel algorithms scale well on Hadoop as the data size increases. The experimental results show that KMP outperforms BM for shorter patterns, while BM outperforms KMP for longer patterns. However, FJS, a hybrid of KMP and the Boyer-Moore-Horspool algorithm (itself a variant of BM), outperforms both KMP and BM for shorter and longer patterns alike and emerges as the most suitable algorithm for pattern matching in a Hadoop environment.
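The abstract says the algorithms were converted to MapReduce programs using key-value pairs but does not give the conversion. The sketch below is one plausible shape for the KMP case, simulated in plain Python rather than on Hadoop. It is not the authors' implementation. The function names (`failure_table`, `kmp_search`, `map_phase`, `reduce_phase`) and the choice of emitting `(pattern, global_offset)` pairs are assumptions made for illustration, and matches that span two input lines are not detected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def failure_table(pattern: str) -> List[int]:
    """KMP failure function: table[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also its suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str, table: List[int]) -> Iterator[int]:
    """Yield the start index of every (possibly overlapping) match."""
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            yield i - k + 1
            k = table[k - 1]


def map_phase(records: Iterable[Tuple[int, str]],
              pattern: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical): each record is (offset_of_line, line).
    Emits (pattern, global_offset) for every match inside the line."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    table = failure_table(pattern)
    for line_offset, line in records:
        for pos in kmp_search(line, pattern, table):
            yield pattern, line_offset + pos


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reducer (hypothetical): group the emitted offsets by pattern."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}


# Two input lines keyed by their starting offset in an imagined file.
records = [(0, "abracadabra"), (12, "cadabra abra")]
matches = reduce_phase(map_phase(records, "abra"))
print(matches)
```

Because each mapper call depends only on its own records and the pattern, the map phase can run independently on each input split. Under the same assumptions, BM or FJS could be substituted for `kmp_search` without changing the key-value layout.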
Pages: 45-55
Page count: 11