IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop

Cited by: 4
Authors
Kavitha, C. [1]
Srividhya, S. R. [1]
Lai, Wen-Cheng [2, 3]
Mani, Vinodhini [1]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Comp Sci & Engn, Chennai 600119, Tamil Nadu, India
[2] Natl Yunlin Univ Sci & Technol, Bachelor Program Ind Projects, Touliu 640301, Yunlin, Taiwan
[3] Natl Yunlin Univ Sci & Technol, Dept Elect Engn, Touliu 640301, Yunlin, Taiwan
Keywords
big data; combiner; distributed storage; hadoop; mapreduce; sort; task failure resilience; wordcount
DOI
10.3390/electronics11101599
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Hadoop is a framework for storing and processing huge amounts of data. With HDFS, large data sets can be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel, in which computation is expressed as a map step followed by a reduce step. By default, a very large amount of data is transferred from the Mapper to the Reducer without any filtering or recursion, which consumes excessive bandwidth. In this paper, we introduce an algorithm for the map phase called Inner MAPping Combiner (IMapC). The algorithm runs inside the Mapper and combines the values of recurring keys. To assess its efficiency, we tested several approaches. According to the tests, MapReduce programs implemented with the Default Combiner (DC) of IMapC are 70% more efficient than those implemented without one. This work can be combined with MapReduce to make computations significantly faster.
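The idea of combining the values of recurring keys inside the Mapper can be illustrated with a small word-count simulation. This is a minimal sketch in plain Python, not the authors' IMapC implementation and not Hadoop API code; the function names (`plain_mapper`, `in_mapper_combining`, `reducer`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def plain_mapper(lines):
    """Baseline: emit one (word, 1) pair per token, with no combining."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def in_mapper_combining(lines):
    """Sum the counts of recurring keys locally before emitting anything."""
    partial = defaultdict(int)
    for line in lines:
        for word in line.split():
            partial[word.lower()] += 1
    # Each distinct key is emitted once, carrying its locally combined value.
    yield from partial.items()


def reducer(pairs):
    """Aggregate all values that share a key, as a reduce step would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input split handled by a single mapper.
split = ["big data big cluster", "data data cluster"]

baseline = list(plain_mapper(split))
combined = list(in_mapper_combining(split))

print(len(baseline), "pairs sent to the reducer without combining")
print(len(combined), "pairs sent to the reducer with in-mapper combining")
print(reducer(combined))
```

Both variants give the same reduced result, but the combining variant emits one pair per distinct key instead of one pair per token, so fewer intermediate records leave the mapper. The saving on a real cluster depends on how often keys repeat within each input split.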
Pages: 16