Replica-aware data recovery performance improvement for Hadoop system with NVM

Times cited: 0
Authors
Li, Xin [1 ]
Li, Huijie [1 ]
Lu, Youyou [2 ]
Zhao, Yanchao [1 ]
Qin, Xiaolin [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Data recovery; HDFS; MapReduce; Non-volatile memory; Performance tuning; CLUSTER; MEMORY;
DOI
10.1007/s42514-021-00066-9
CLC classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Non-volatile memory (NVM) is a promising device for storing data and accelerating big data analysis because of its excellent I/O performance. However, we find that simply replacing hard disk drives (HDDs) with NVM does not bring the expected performance improvement. In this paper, we take data recovery in the Hadoop Distributed File System (HDFS) as a case study to investigate how to exploit the performance of NVM. We analyze the data recovery mechanism in HDFS and find that the configuration of replication tasks in the DataNode significantly affects data recovery. We conduct extensive analysis and experiments on tuning this configuration and report several interesting findings. With the new configuration, data recovery performance improves by 17% to 71%. Optimized configuration also improves the execution performance of MapReduce jobs by 28% to 59%. We further find that sudden data recovery causes disordered competition for network resources, which degrades the performance of MapReduce jobs. We therefore present a priority-aware multi-stage data recovery method, which improves MapReduce job performance by an additional 32.5%.
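The abstract's central observation is that DataNode replication-task settings limit how fast HDFS can re-replicate lost blocks. The following is a deliberately simplified, hypothetical back-of-envelope model, not the authors' method and not HDFS's actual scheduler. It assumes the NameNode hands out at most `dfs.namenode.replication.work.multiplier.per.iteration` × (number of live DataNodes) blocks per redundancy-check round (assumed to be 3 s), and that each DataNode acts as the source of at most `dfs.namenode.replication.max-streams` concurrent copies; the higher `dfs.namenode.replication.max-streams-hard-limit` that HDFS applies to highest-priority blocks is ignored. The names `RecoveryConfig`, `scheduling_rounds`, and `estimate_recovery_seconds`, the block size, and the per-stream throughput are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RecoveryConfig:
    """Hypothetical model of the HDFS settings that throttle re-replication."""
    live_datanodes: int
    work_multiplier: int = 2   # dfs.namenode.replication.work.multiplier.per.iteration (default 2)
    max_streams: int = 2       # dfs.namenode.replication.max-streams (default 2)
    interval_s: float = 3.0    # NameNode redundancy-check interval, assumed 3 s


def scheduling_rounds(blocks: int, cfg: RecoveryConfig) -> int:
    """Rounds needed if each round hands out at most multiplier * live-node blocks."""
    per_round = cfg.work_multiplier * cfg.live_datanodes
    return math.ceil(blocks / per_round)


def estimate_recovery_seconds(blocks: int, block_mb: float,
                              stream_mb_per_s: float, cfg: RecoveryConfig) -> float:
    """Rough recovery time: the slower of the scheduling and transfer limits.

    Assumes source replicas are spread evenly and every stream runs at a fixed
    rate, i.e. no network or device contention (a strong simplification).
    """
    scheduling_s = scheduling_rounds(blocks, cfg) * cfg.interval_s
    # Each copy has one source node, and each node sources at most max_streams copies.
    concurrent_copies = cfg.max_streams * cfg.live_datanodes
    waves = math.ceil(blocks / concurrent_copies)
    transfer_s = waves * block_mb / stream_mb_per_s
    return max(scheduling_s, transfer_s)


if __name__ == "__main__":
    default = RecoveryConfig(live_datanodes=10)
    tuned = RecoveryConfig(live_datanodes=10, work_multiplier=10, max_streams=10)
    print(estimate_recovery_seconds(1000, 128, 64, default))  # 150.0
    print(estimate_recovery_seconds(1000, 128, 64, tuned))    # 30.0
```

In this toy setting, recovering 1,000 lost 128 MB blocks on 10 DataNodes is scheduling-bound at about 150 s with the default values, and raising both settings to 10 cuts the estimate to about 30 s. This only illustrates why replication-task configuration can dominate recovery time; it ignores the network contention that the abstract identifies as the motivation for priority-aware recovery.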
Pages: 144-156
Number of pages: 13
Related papers
50 in total
  • [1] Replica-aware data recovery performance improvement for Hadoop system with NVM
    Li, Xin
    Li, Huijie
    Lu, Youyou
    Zhao, Yanchao
    Qin, Xiaolin
    CCF Transactions on High Performance Computing, 2021, 3 : 144 - 156
  • [2] An Experimental Study on Data Recovery Performance Improvement for HDFS with NVM
    Li, Huijie
    Li, Xin
    Lu, Youyou
    Qin, Xiaolin
    2020 29TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2020), 2020,
  • [3] System Status Aware Hadoop Scheduling Methods for Job Performance Improvement
    Kawarasaki, Masatoshi
    Watanabe, Hyuma
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (07): : 1275 - 1285
  • [4] A Self-aware Data Compression System on FPGA in Hadoop
    Li, Yubin
    Sun, Yuliang
    Dai, Guohao
    Wang, Yuzhi
    Ni, Jiacai
    Wang, Yu
    Li, Guoliang
    Yang, Huazhong
    2015 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (FPT), 2015, : 196 - 199
  • [5] Multi-file Queries Performance Improvement through Data Placement in Hadoop
    Tang, Yu
    Abdulhay, Elham
    Fan, Aihua
    Su, Sheng
    Gebreselassie, Kidus
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 986 - 991
  • [6] Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy
    Benifa, J. V. Bibal
    Wireless Personal Communications, 2017, 95 : 2709 - 2733
  • [7] Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy
    Benifa, J. V. Bibal
    Dejey
    WIRELESS PERSONAL COMMUNICATIONS, 2017, 95 (03) : 2709 - 2733
  • [8] IDaPS - Improved data-locality aware data placement strategy based on Markov clustering to enhance MapReduce performance on Hadoop
    Vengadeswaran, S.
    Balasundaram, S. R.
    Dhavakumar, P.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (03)
  • [9] The Solution for Performance Improvement of Electric Distribution Network Line Loss Based on Hadoop Big Data Technology
    Sun, Lihua
    Hu, Mu
    Meng, Qingqiang
    Lin, Feng
    Qian, Yakang
    She, Yunbo
    Ma, Zheng
    Pei, Xuan
    Song, Shuting
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 500 - 506
  • [10] Big Data Performance Analysis on a Hadoop Distributed File System Based on Geometric Data Perturbation Technique
    Marichamy, V. Santhana
    Natarajan, V.
    2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ADVANCED COMPUTING ICRTAC -DISRUP - TIV INNOVATION , 2019, 2019, 165 : 415 - 420