IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop

Cited by: 4

Authors:

Kavitha, C. [1]

Srividhya, S. R. [1]

Lai, Wen-Cheng [2,3]

Mani, Vinodhini [1]

Affiliations:

[1] Sathyabama Inst Sci & Technol, Dept Comp Sci & Engn, Chennai 600119, Tamil Nadu, India

[2] Natl Yunlin Univ Sci & Technol, Bachelor Program Ind Projects, Touliu 640301, Yunlin, Taiwan

[3] Natl Yunlin Univ Sci & Technol, Dept Elect Engn, Touliu 640301, Yunlin, Taiwan

Source:

ELECTRONICS | 2022 / Vol. 11 / Issue 10

Keywords:

big data; combiner; distributed storage; hadoop; mapreduce; sort; task failure resilience; wordcount

DOI:

10.3390/electronics11101599

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Hadoop is a framework for storing and processing huge amounts of data. With HDFS, large data sets can be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel, organized into a map phase and a reduce phase. A very large amount of data is transferred from the Mapper to the Reducer without any filtering or recursion, which consumes excessive bandwidth. In this paper, we introduce an algorithm for the map phase called Inner MAPping Combiner (IMapC). Running inside the Mapper, this algorithm combines the values of recurring keys. To evaluate its efficiency, different approaches were tested. According to the tests, MapReduce programs that are implemented with the Default Combiner (DC) of IMapC are 70% more efficient than those that are implemented without one. To make computations significantly faster, this work can be combined with MapReduce.
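
The idea of combining the values of recurring keys inside the Mapper can be illustrated with a small word-count sketch. This is not the authors' IMapC implementation, only a generic in-mapper combining pattern written in plain Python without Hadoop; the function names `plain_mapper`, `in_mapper_combining`, and `reduce_counts`, and the sample input, are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def plain_mapper(lines):
    """Emit one (word, 1) pair per token, with no map-side aggregation."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def in_mapper_combining(lines):
    """Sum the values of recurring keys inside the map task.

    Counts are held in memory for the whole input split and emitted once
    at the end, so each distinct key leaves the mapper a single time.
    """
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word.lower()] += 1
    yield from counts.items()


def reduce_counts(pairs):
    """Stand-in for the reduce phase: sum all values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input split, for illustration only.
split = ["big data big cluster", "data data cluster"]

plain = list(plain_mapper(split))
combined = list(in_mapper_combining(split))

print(len(plain), "pairs emitted without map-side combining")
print(len(combined), "pairs emitted with in-mapper combining")
print(reduce_counts(combined))
```

Both mappers lead to the same reduced totals; the difference is how many intermediate pairs would have to cross the network between the map and reduce phases. The trade-off in this sketch is memory: the in-mapper dictionary grows with the number of distinct keys in the split.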


Pages: 16
