HaRD: a heterogeneity-aware replica deletion for HDFS

Times cited: 6
Authors
Ciritoglu, Hilmi Egemen [1]
Murphy, John [1]
Thorpe, Christina [2]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Performance Engn Lab, Dublin, Ireland
[2] Technol Univ Dublin, Dublin, Ireland
Funding
Science Foundation Ireland
Keywords
Hadoop distributed file system (HDFS); Replication factor; Replica management framework; Software performance; Data placement; Hadoop; Hive
DOI
10.1186/s40537-019-0256-6
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The Hadoop distributed file system (HDFS) is responsible for storing very large datasets reliably on clusters of commodity machines. HDFS uses replication to serve client data requests with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose data replication management frameworks that alter the replication factor of files dynamically according to data popularity, keeping more replicas of in-demand data to improve overall system performance. When data becomes less popular, these schemes reduce the replication factor, which alters the data distribution and leads to an unbalanced distribution. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when Hadoop's default replica deletion scheme is used. Then, we show that WBRD (data-distribution-aware replica deletion scheme), which we proposed in previous work, performs sub-optimally on heterogeneous clusters even though it keeps the data distribution balanced. To overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes' processing capabilities when deleting replicas; hence, it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60% and 17% compared to Hadoop and WBRD, respectively.
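The abstract describes HaRD only at a high level: when replicas are deleted, the nodes' processing capabilities are taken into account so that more powerful nodes keep more replicas. The Python sketch here illustrates one way such a rule could work; it is not the authors' implementation. The `DataNode` class, the `capability` score, the `choose_replicas_to_delete` helper, and the blocks-per-unit-capability ratio used to rank deletion candidates are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare nodes by identity, not by field values
class DataNode:
    name: str
    capability: float  # HYPOTHETICAL relative processing power (e.g. a normalized benchmark score)
    block_count: int   # number of block replicas currently stored on this node


def choose_replicas_to_delete(replica_nodes, excess):
    """Pick `excess` nodes from which to remove one replica of a block.

    Hypothetical heterogeneity-aware rule (an assumption, not HaRD itself):
    delete from the nodes with the highest stored-blocks-per-unit-capability
    ratio, so that more powerful nodes keep proportionally more replicas.
    """
    if excess < 0 or excess > len(replica_nodes):
        raise ValueError("excess must be between 0 and the number of replicas")
    candidates = list(replica_nodes)
    victims = []
    for _ in range(excess):
        # The node most loaded relative to its capability loses its replica first.
        victim = max(candidates, key=lambda n: n.block_count / n.capability)
        candidates.remove(victim)  # a block has at most one replica per node
        victim.block_count -= 1    # record the deletion for later decisions
        victims.append(victim)
    return victims


# Usage: a block's replication factor drops from 3 to 1, so 2 replicas go.
nodes = [
    DataNode("fast", capability=4.0, block_count=40),
    DataNode("medium", capability=2.0, block_count=30),
    DataNode("slow", capability=1.0, block_count=20),
]
deleted = choose_replicas_to_delete(nodes, excess=2)
print([n.name for n in deleted])  # → ['slow', 'medium']
```

In this example the fast node keeps its replica even though it stores the most blocks in absolute terms, because its load relative to its capability (40 / 4.0 = 10) is the lowest. A purely balance-oriented rule that equalizes raw block counts would instead tend to delete from the fast node, which is the behaviour the abstract reports as sub-optimal on heterogeneous clusters.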
Pages: 21