Improving the Shuffle of Hadoop MapReduce

Cited by: 8
Authors
Li, Jingui [1]
Lin, Xuelian [1]
Cui, Xiaolong [1]
Ye, Yue [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Keywords
Hadoop; MapReduce; shuffle
DOI
10.1109/CloudCom.2013.42
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As an efficient parallel computing system based on the MapReduce model, Hadoop is widely used for large-scale data analysis such as data mining, machine learning, and scientific simulation. However, MapReduce still has performance problems, especially in the shuffle phase. To address these problems, this paper proposes a lightweight, standalone shuffle service component with a more efficient I/O policy to replace the existing shuffle phase in MapReduce. We also describe how to implement the shuffle service in three steps: extract the shuffle from the reduce task as a shuffle task, restructure the shuffle task as a service, and improve the I/O scheduling policy on the map side. Furthermore, both simulated experiments and comparative studies of MapReduce jobs are conducted to evaluate the performance of our improvements. The results show that our approach can decrease the overall job execution time and make full use of cluster resources.
Pages: 266-273
Page count: 8