IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop

Cited by: 4
Authors
Kavitha, C. [1]
Srividhya, S. R. [1]
Lai, Wen-Cheng [2, 3]
Mani, Vinodhini [1]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Comp Sci & Engn, Chennai 600119, Tamil Nadu, India
[2] Natl Yunlin Univ Sci & Technol, Bachelor Program Ind Projects, Touliu 640301, Yunlin, Taiwan
[3] Natl Yunlin Univ Sci & Technol, Dept Elect Engn, Touliu 640301, Yunlin, Taiwan
Keywords
big data; combiner; distributed storage; Hadoop; MapReduce; sort; task failure resilience; wordcount
DOI
10.3390/electronics11101599
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Hadoop is a framework for storing and processing very large amounts of data. Its distributed file system, HDFS, allows large data sets to be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel, with the work split into a map phase and a reduce phase. A very large amount of data is transferred from the Mapper to the Reducer without any filtering or recursion, which consumes excessive bandwidth. In this paper, we introduce an algorithm for the map phase called the Inner MAPping Combiner (IMapC), which combines the values of recurring keys inside the Mapper. To assess the efficiency of the algorithm, different approaches were tested. According to the tests, MapReduce programs implemented with the Default Combiner (DC) of IMapC are 70% more efficient than those implemented without one. This work can be combined with MapReduce to make computations significantly faster.
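The idea of combining the values of recurring keys inside the Mapper can be illustrated with a small word-count sketch. This is not the authors' IMapC implementation, which the abstract does not spell out; it is a generic in-mapper combining pattern simulated in plain Python, without Hadoop. The function names (`naive_map`, `in_mapper_combining_map`, `reduce_counts`) and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def naive_map(lines):
    """Emit one (word, 1) pair per word occurrence, with no combining."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def in_mapper_combining_map(lines):
    """Sum the counts of recurring keys inside the mapper.

    Only one (word, partial_count) pair per distinct word is emitted,
    so fewer pairs have to travel from the mapper to the reducer.
    """
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word] += 1
    yield from counts.items()


def reduce_counts(pairs):
    """Sum the values for each key, as a word-count reducer would."""
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


# Hypothetical input split handled by a single mapper.
split = ["big data big cluster", "data data big"]

naive_pairs = list(naive_map(split))
combined_pairs = list(in_mapper_combining_map(split))

print(len(naive_pairs), "pairs emitted without combining")
print(len(combined_pairs), "pairs emitted with in-mapper combining")
print(reduce_counts(combined_pairs))
```

Both mappers lead to the same reduced totals; the difference is only in how many intermediate pairs are emitted. Any actual saving depends on how often keys repeat within a mapper's input, and the 70% figure reported in the abstract comes from the authors' own tests, not from this sketch.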
Pages: 16