HaRD: a heterogeneity-aware replica deletion for HDFS

Cited by: 6
Authors
Ciritoglu, Hilmi Egemen [1]
Murphy, John [1]
Thorpe, Christina [2]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Performance Engn Lab, Dublin, Ireland
[2] Technol Univ Dublin, Dublin, Ireland
Funding
Science Foundation Ireland
Keywords
Hadoop distributed file system (HDFS); Replication factor; Replica management framework; Software performance; DATA PLACEMENT; HADOOP; HIVE;
DOI
10.1186/s40537-019-0256-6
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The Hadoop distributed file system (HDFS) is responsible for storing very large data sets reliably on clusters of commodity machines. HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data becomes less popular, these schemes reduce the replication factor, which changes the data distribution and leaves it unbalanced. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop's default replica deletion scheme. Then, we show that even keeping the data distribution balanced with WBRD, the data-distribution-aware replica deletion scheme that we proposed in previous work, performs sub-optimally on heterogeneous clusters. To overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes' processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60% and 17% compared to Hadoop and WBRD, respectively.
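The idea in the abstract, that replica deletion should account for node capability and not only for block counts, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the selection rule, so the "blocks per unit of capability" score, the function names, and the capability numbers below are all assumptions made for illustration.

```python
from typing import Dict, List


def pick_by_block_count(replica_nodes: List[str],
                        blocks_on_node: Dict[str, int]) -> str:
    """Distribution-only rule (assumed baseline): delete the replica
    held by the node that currently stores the most blocks."""
    return max(replica_nodes, key=lambda n: blocks_on_node[n])


def pick_by_capability(replica_nodes: List[str],
                       blocks_on_node: Dict[str, int],
                       capability: Dict[str, float]) -> str:
    """Capability-weighted rule (hypothetical): delete the replica held
    by the node with the most blocks per unit of processing capability,
    so more powerful nodes tend to keep more replicas."""
    return max(replica_nodes,
               key=lambda n: blocks_on_node[n] / capability[n])


# Hypothetical two-node example: "fast" is assumed twice as capable.
replica_nodes = ["fast", "slow"]
blocks_on_node = {"fast": 120, "slow": 100}
capability = {"fast": 2.0, "slow": 1.0}

baseline_choice = pick_by_block_count(replica_nodes, blocks_on_node)
weighted_choice = pick_by_capability(replica_nodes, blocks_on_node, capability)
print(baseline_choice, weighted_choice)
```

In this toy input the distribution-only rule removes the replica from the faster node because it holds more blocks, while the capability-weighted rule removes it from the slower node, whose load relative to its capability is higher. How HaRD actually measures capability and ranks nodes is described in the paper itself, not here.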
Pages: 21
Related papers
50 in total
  • [1] HaRD: a heterogeneity-aware replica deletion for HDFS
    Hilmi Egemen Ciritoglu
    John Murphy
    Christina Thorpe
    Journal of Big Data, 6
  • [2] Hop: Heterogeneity-aware Decentralized Training
    Luo, Qinyi
    Lin, Jinkun
    Zhuo, Youwei
    Qian, Xuehai
    TWENTY-FOURTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXIV), 2019, : 893 - 907
  • [3] A Heterogeneity-Aware Task Scheduler for Spark
    Xu, Luna
    Butt, Ali R.
    Lim, Seung-Hwan
    Kannan, Ramakrishnan
    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 245 - 256
  • [4] Heterogeneity-aware Distributed Parameter Servers
    Jiang, Jiawei
    Cui, Bin
    Zhang, Ce
    Yu, Lele
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 463 - 478
  • [5] Heterogeneity-aware fair federated learning
    Li, Xiaoli
    Zhao, Siran
    Chen, Chuan
    Zheng, Zibin
    INFORMATION SCIENCES, 2023, 619 : 968 - 986
  • [6] HALO: Heterogeneity-Aware Load Balancing
    Gandhi, Anshul
    Zhang, Xi
    Mittal, Naman
    2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2015), 2015, : 242 - 251
  • [7] Heterogeneity-aware distributed access structure
    Beltrán, AG
    Milligan, P
    Sage, P
    FIFTH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING, PROCEEDINGS, 2005, : 152 - 153
  • [8] A Study of Effective Replica Reconstruction Schemes at Node Deletion for HDFS
    Higai, Asami
    Takefusa, Atsuko
    Nakada, Hidemoto
    Oguchi, Masato
    2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2014, : 512 - 521
  • [9] Heterogeneity-Aware Data Placement in Hybrid Clouds
    Marquez, Jack D.
    Gonzalez, Juan D.
    Mondragon, Oscar H.
    CLOUD COMPUTING - CLOUD 2019, 2019, 11513 : 177 - 191
  • [10] FLASH: Heterogeneity-Aware Federated Learning at Scale
    Yang, Chengxu
    Xu, Mengwei
    Wang, Qipeng
    Chen, Zhenpeng
    Huang, Kang
    Ma, Yun
    Bian, Kaigui
    Huang, Gang
    Liu, Yunxin
    Jin, Xin
    Liu, Xuanzhe
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (01) : 483 - 500