HaRD: a heterogeneity-aware replica deletion for HDFS

Cited by: 6
Authors
Ciritoglu, Hilmi Egemen [1]
Murphy, John [1]
Thorpe, Christina [2]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Performance Engn Lab, Dublin, Ireland
[2] Technol Univ Dublin, Dublin, Ireland
Funding
Science Foundation Ireland
Keywords
Hadoop distributed file system (HDFS); Replication factor; Replica management framework; Software performance; DATA PLACEMENT; HADOOP; HIVE;
DOI
10.1186/s40537-019-0256-6
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets reliably on clusters of commodity machines. HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data becomes less popular, these schemes reduce the replication factor, which alters the data distribution and leaves it unbalanced. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop's default replica deletion scheme. Then, we show that even keeping the data distribution balanced with WBRD, the data-distribution-aware replica deletion scheme that we proposed in previous work, performs sub-optimally on heterogeneous clusters. To overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes' processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60% and 17% compared to Hadoop and WBRD, respectively.
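The difference between a balance-only deletion rule and a heterogeneity-aware one can be illustrated with a minimal Python sketch. This is not the HaRD or WBRD implementation: the abstract does not give either scheme's selection rule, so the function names, the capability scores, and the "blocks per unit of capability" metric are all assumptions chosen only to show how taking processing capability into account can change which replica is deleted.

```python
# Illustrative sketch only; names and the scoring metric are hypothetical,
# not taken from the HaRD paper or from HDFS.


def pick_by_block_count(holders, stored_blocks):
    """Balance-only rule: delete from the holder storing the most blocks.

    Ties are broken by node name so the choice is deterministic.
    """
    return max(holders, key=lambda node: (stored_blocks[node], node))


def pick_by_capability(holders, stored_blocks, capability):
    """Capability-aware rule: delete from the holder with the most blocks
    per unit of (assumed) processing capability.
    """
    return max(
        holders,
        key=lambda node: (stored_blocks[node] / capability[node], node),
    )


# Hypothetical cluster state: blocks stored and a relative capability score.
stored_blocks = {"fast": 120, "mid": 100, "slow": 80}
capability = {"fast": 2.0, "mid": 1.5, "slow": 1.0}
holders = ["fast", "mid", "slow"]  # nodes holding a replica of one block

baseline_choice = pick_by_block_count(holders, stored_blocks)
aware_choice = pick_by_capability(holders, stored_blocks, capability)
print(baseline_choice, aware_choice)
```

With these made-up numbers, the balance-only rule removes the replica from the node with the most blocks, while the capability-aware rule removes it from the weakest node, which leaves more replicas on the more powerful nodes.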
Pages: 21