Task failure resilience technique for improving the performance of MapReduce in Hadoop

Cited by: 10
Authors
Kavitha, C. [1]
Anita, X. [2]
Affiliations
[1] Anna Univ, Dept Informat & Commun Engn, Chennai, Tamil Nadu, India
[2] Jerusalem Coll Engn, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Hadoop; in-memory; key-value pair; MapReduce; recovery; Redis cache; resilience; task failure
DOI
10.4218/etrij.2018-0265
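Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809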
Abstract
MapReduce is a framework that can process huge datasets in parallel and distributed computing environments. However, a single machine failure during the runtime of MapReduce tasks can increase completion time by 50%. MapReduce handles task failures by restarting the failed task and re-computing all input data from scratch, regardless of how much data had already been processed. To solve this issue, we need the computed key-value pairs to persist in a storage system to avoid re-computing them during the restarting process. In this paper, the task failure resilience (TFR) technique is proposed, which allows the execution of a failed task to continue from the point it was interrupted without having to redo all the work. Amazon ElastiCache for Redis is used as a non-volatile cache for the key-value pairs. We measured the performance of TFR by running different Hadoop benchmarking suites. TFR was implemented using the Hadoop software framework, and the experimental results showed significant performance improvements when compared with the performance of the default Hadoop implementation.
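
The resume-from-interruption idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `CheckpointStore` class, the `run_map_task` function, and the `fail_at` parameter are hypothetical names introduced here, and a plain in-memory dictionary stands in for the external Redis cache. The sketch assumes the task records its progress after every input record, so a restarted attempt can skip records whose key-value pairs were already saved.

```python
class CheckpointStore:
    """Hypothetical stand-in for an external key-value cache such as Redis."""

    def __init__(self):
        self._data = {}

    def save(self, task_id, next_index, pairs):
        # Store a copy so later mutation by the task cannot corrupt the checkpoint.
        self._data[task_id] = (next_index, list(pairs))

    def load(self, task_id):
        # Return the index of the next unprocessed record and the saved pairs.
        next_index, pairs = self._data.get(task_id, (0, []))
        return next_index, list(pairs)


def run_map_task(task_id, records, store, fail_at=None):
    """Word-count style map task that resumes from its last checkpoint.

    fail_at is an illustrative knob: the task raises an error just before
    processing the record at that index, simulating a task failure.
    Returns the emitted pairs and the number of records processed in this attempt.
    """
    next_index, pairs = store.load(task_id)
    processed = 0
    for i in range(next_index, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated task failure")
        for word in records[i].split():
            pairs.append((word, 1))
        processed += 1
        # Persist progress after each record (simple, but rewrites all pairs each time).
        store.save(task_id, i + 1, pairs)
    return pairs, processed


records = ["a b", "b c", "c d", "d e"]
store = CheckpointStore()

# First attempt fails before the third record; two records are already checkpointed.
try:
    run_map_task("map-0", records, store, fail_at=2)
except RuntimeError:
    pass

# The restarted attempt processes only the remaining two records.
pairs, reprocessed = run_map_task("map-0", records, store)
print(reprocessed, len(pairs))
```

Checkpointing after every record keeps the sketch easy to follow; a real system would more likely batch writes to the cache to limit overhead, which is a design choice this example does not explore.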
Pages: 751-763
Number of pages: 13