Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework

Cited by: 0
Authors
Sona, C. P. [1]
Mulerikkal, Jaison Paul [1]
Affiliation
[1] Rajagiri Sch Engn & Technol, Sunya Labs, Kochi, Kerala, India
Source
MOBILE NETWORKS AND MANAGEMENT (MONAMI 2017) | 2018 / Vol. 235
Keywords
Pattern matching; Hadoop; MapReduce; Big Data; Knuth Morris Pratt Algorithm; Boyer Moore Algorithm; Franek Jennings Smyth Algorithm
DOI
10.1007/978-3-319-90775-8_4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Creating meaning out of ever-growing Big Data is a formidable challenge for data scientists, and pattern matching algorithms are an effective means of extracting such meaning from heaps of data. However, the available pattern matching algorithms have mostly been tested with linear (sequential) programming models; their adaptability and efficiency have not been tested in distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper describes the experience of parallelizing three such pattern matching algorithms, namely the Knuth Morris Pratt (KMP) algorithm, the Boyer Moore (BM) algorithm, and the lesser-known Franek Jennings Smyth (FJS) algorithm, and porting them to the Hadoop MapReduce framework. All three algorithms were converted to MapReduce programs using key-value pairs and run on a single node as well as in a clustered Hadoop environment. Analysis of the results on the Project Gutenberg data set shows that all three parallel algorithms scale well on Hadoop as the data size increases. The experimental results show that KMP outperforms BM for shorter patterns, while BM outperforms KMP for longer patterns. However, the FJS algorithm, a hybrid of KMP and the Boyer-Moore-Horspool algorithm (an advanced version of BM), outperforms both KMP and BM for shorter and longer patterns alike, and emerges as the most suitable algorithm for pattern matching in a Hadoop environment.
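To illustrate what "converted to MapReduce programs using key-value pairs" can look like, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use any Hadoop API: the names `map_phase`, `reduce_phase`, and `run_job` are hypothetical, the shuffle step is simulated with an in-memory dictionary, and KMP is used as the per-line matcher. The key is assumed to be a document identifier and the value a `(line number, offset)` pair.

```python
from collections import defaultdict


def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Yield the start offset of every (possibly overlapping) match."""
    if not pattern:
        return
    fail = build_failure(pattern)
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            yield i - k + 1
            k = fail[k - 1]


def map_phase(doc_id, line_no, line, pattern):
    """Hypothetical mapper: emit (doc_id, (line_no, offset)) per match."""
    for offset in kmp_search(line, pattern):
        yield doc_id, (line_no, offset)


def reduce_phase(doc_id, positions):
    """Hypothetical reducer: count and sort the matches of one document."""
    positions = sorted(positions)
    return doc_id, len(positions), positions


def run_job(documents, pattern):
    """Simulate map -> shuffle -> reduce in memory (assumption: no Hadoop)."""
    grouped = defaultdict(list)  # stands in for the shuffle/sort step
    for doc_id, text in documents.items():
        for line_no, line in enumerate(text.splitlines()):
            for key, value in map_phase(doc_id, line_no, line, pattern):
                grouped[key].append(value)
    return {key: reduce_phase(key, values)[1:] for key, values in grouped.items()}
```

Because each mapper call sees a single line, this sketch cannot find matches that span a line or input-split boundary; a real distributed port would need to handle that case, and the abstract does not say how the authors did so.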
Pages: 45-55
Page count: 11