Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework

Cited by: 0
Authors
Sona, C. P. [1]
Mulerikkal, Jaison Paul [1]
Affiliations
[1] Rajagiri Sch Engn & Technol, Sunya Labs, Kochi, Kerala, India
Keywords
Pattern matching; Hadoop; MapReduce; Big Data; Knuth-Morris-Pratt Algorithm; Boyer-Moore Algorithm; Franek-Jennings-Smyth Algorithm
DOI
10.1007/978-3-319-90775-8_4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Extracting meaning from ever-growing Big Data is a formidable challenge for data scientists, and pattern matching algorithms are an effective means of extracting that meaning from large volumes of data. However, the available pattern matching algorithms have mostly been tested with sequential (linear) programming models; their adaptability and efficiency have not been tested in distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper describes the experience of parallelizing three such pattern matching algorithms, namely the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore (BM) algorithm, and the lesser-known Franek-Jennings-Smyth (FJS) algorithm, and porting them to the Hadoop MapReduce framework. All three algorithms were converted to MapReduce programs using key-value pairs and run on both single-node and cluster Hadoop environments. Analysis of the results on the Project Gutenberg data set shows that all three parallel algorithms scale well on Hadoop as the data size increases. The experimental results show that KMP outperforms BM for shorter patterns, while BM outperforms KMP for longer patterns. However, the FJS algorithm, a hybrid of KMP and the Boyer-Moore-Horspool algorithm (a later variant of BM), outperforms both KMP and BM for shorter and longer patterns alike, and emerges as the most suitable of the three for pattern matching in a Hadoop environment.
Pages: 45-55
Page count: 11
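
Illustrative sketch
The abstract says the algorithms were expressed as MapReduce programs over key-value pairs but does not show the code. The following is a minimal sketch of how KMP could be cast in that shape, simulated locally in plain Python rather than on Hadoop. The function names (`map_line`, `reduce_matches`, `run_job`), the choice of (character offset, line) as the input record, and the choice of (pattern, position) as the emitted pair are all assumptions for illustration, not the authors' implementation. Because each line is searched independently, matches that span a line break are not found.

```python
from collections import defaultdict


def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern, fail):
    """Yield the start index of every occurrence (overlaps included)."""
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            yield i - k + 1
            k = fail[k - 1]  # keep going to catch overlapping matches


def map_line(offset, line, pattern, fail):
    """Hypothetical map step: one (offset, line) record in,
    zero or more (pattern, absolute position) pairs out."""
    for pos in kmp_search(line, pattern, fail):
        yield pattern, offset + pos


def reduce_matches(pattern, positions):
    """Hypothetical reduce step: collect all positions for one key."""
    ordered = sorted(positions)
    return pattern, len(ordered), ordered


def run_job(lines, pattern):
    """Local stand-in for the framework: map, group by key, reduce."""
    fail = build_failure(pattern)
    grouped = defaultdict(list)
    offset = 0
    for line in lines:
        for key, value in map_line(offset, line, pattern, fail):
            grouped[key].append(value)
        offset += len(line) + 1  # +1 assumes a single newline separator
    return [reduce_matches(key, values) for key, values in grouped.items()]


if __name__ == "__main__":
    print(run_job(["abababa", "xxaba"], "aba"))
```

Swapping `kmp_search` for a BM or FJS search function with the same interface would leave the map and reduce steps unchanged, which is one plausible way to compare the three algorithms under identical job structure.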