Task failure resilience technique for improving the performance of MapReduce in Hadoop

Cited by: 10
Authors
Kavitha, C. [1]
Anita, X. [2]
Affiliations
[1] Anna Univ, Dept Informat & Commun Engn, Chennai, Tamil Nadu, India
[2] Jerusalem Coll Engn, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Hadoop; in-memory; key-value pair; MapReduce; recovery; Redis cache; resilience; task failure
DOI
10.4218/etrij.2018-0265
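Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract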
MapReduce is a framework for processing huge datasets in parallel, distributed computing environments. However, a single machine failure during the execution of MapReduce tasks can increase job completion time by 50%. MapReduce handles a task failure by restarting the failed task and re-processing all of its input data from scratch, regardless of how much of that data had already been processed. Avoiding this re-computation requires the key-value pairs computed before the failure to persist in a storage system, so that they can be reused when the task restarts. In this paper, we propose the task failure resilience (TFR) technique, which allows a failed task to continue executing from the point at which it was interrupted instead of redoing all of its work. Amazon ElastiCache for Redis is used as a non-volatile cache for the key-value pairs. TFR was implemented in the Hadoop software framework, and its performance was measured by running several Hadoop benchmarking suites. The experimental results showed significant performance improvements over the default Hadoop implementation.
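The resume-from-interruption idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' TFR implementation, which the abstract does not describe at code level. It assumes a map task that, after each input record, persists both its accumulated key-value output and the offset of the next unprocessed record. A plain in-process dictionary (the hypothetical `KVStore` class) stands in for Amazon ElastiCache for Redis, and the names `run_map_task`, `word_count_map`, and `SimulatedFailure` are illustrative only.

```python
class KVStore:
    """Hypothetical stand-in for Amazon ElastiCache for Redis.

    An in-process dict is used so the sketch runs without a Redis server.
    Unlike Redis, it is not durable and does not survive process crashes.
    """

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class SimulatedFailure(Exception):
    """Raised to mimic a task failure partway through its input."""


def word_count_map(record):
    """Example map function: emit (word, 1) for every word in a record."""
    return [(word, 1) for word in record.split()]


def run_map_task(task_id, records, store, fail_at=None):
    """Run a map task that checkpoints after each input record.

    On (re)start, the task reads its last committed offset and output from
    the store and resumes from there instead of starting at record 0.
    Returns (output, records_processed_in_this_attempt).
    """
    offset = store.get(f"{task_id}:offset", 0)
    output = store.get(f"{task_id}:output", [])
    processed = 0
    for i in range(offset, len(records)):
        if fail_at is not None and i == fail_at:
            raise SimulatedFailure(f"task {task_id} failed at record {i}")
        # Build a new list rather than mutating the list held by the store.
        output = output + word_count_map(records[i])
        # Commit the output and the next offset. With a real Redis client these
        # two writes would belong in one MULTI/EXEC transaction so they commit together.
        store.set(f"{task_id}:output", output)
        store.set(f"{task_id}:offset", i + 1)
        processed += 1
    return output, processed


if __name__ == "__main__":
    records = ["a b", "b c", "c d", "d e"]
    store = KVStore()
    try:
        run_map_task("map-0", records, store, fail_at=2)  # first attempt fails
    except SimulatedFailure as err:
        print(err)
    output, processed = run_map_task("map-0", records, store)  # restart
    print(processed)  # 2: only records 2 and 3 are processed after the restart
    print(output)
```

Checkpointing after every record keeps re-executed work to a minimum but costs one cache round trip per record; a practical system would likely batch these writes, accepting a small amount of re-computation in exchange for lower overhead.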
Pages: 751-763
Page count: 13