FullRepair: Towards Optimal Repair Pipelining in Erasure-Coded Clustered Storage Systems

Cited by: 0
Authors
Zhang, Yuzuo [1 ]
Tu, Xinyuan [1 ]
Wang, Lin [1 ]
Hu, Yuchong [1 ]
Wang, Fang [1 ]
Wang, Ye [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER | 2023
Funding
National Natural Science Foundation of China;
Keywords
distributed storage; erasure coding; parallelism; data reliability; recovery;
DOI
10.1109/CLUSTER52292.2023.00017
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Clustered storage systems often deploy erasure coding, which encodes data into coded chunks and distributes them across nodes to tolerate node failures. Erasure coding is a storage-efficient redundancy scheme but incurs a high repair penalty; thus, some state-of-the-art methods pipeline the repair process to improve repair performance. However, we observe that all existing repair pipelining methods use only a single pipeline, leaving the network bandwidth of storage nodes underutilized. In this paper, we propose FullRepair, a new repair pipelining mechanism based on multiple pipelines that aims to fully exploit all available bandwidth during repair. We construct four constraints to model the repair pipelining problem so that we can obtain the optimal pipelined repair throughput under full bandwidth utilization. We design a multi-pipeline scheduling scheme for FullRepair to achieve this optimality. Experiments on Amazon EC2 show that, compared with the state-of-the-art repair pipelining methods RP and PivotRepair, FullRepair reduces the repair time of a single chunk by up to 45.40% and 33.19%, respectively.
Pages: 107 - 117
Number of pages: 11
Related papers
50 in total
  • [21] Optimistic Erasure-Coded Distributed Storage
    Dutta, Partha
    Guerraoui, Rachid
    Levy, Ron R.
    DISTRIBUTED COMPUTING, PROCEEDINGS, 2008, 5218 : 182 - +
  • [22] Optimized Proactive Recovery in Erasure-Coded Cloud Storage Systems
    Nachiappan, Rekha
    Calheiros, Rodrigo N.
    Matawie, Kenan M.
    Javadi, Bahman
    IEEE ACCESS, 2023, 11 : 38226 - 38239
  • [23] Taming Tail Latency for Erasure-coded, Distributed Storage Systems
    Aggarwal, Vaneet
    Fan, Jingxian
    Lan, Tian
    IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2017,
  • [24] ClusterSR: Cluster-Aware Scattered Repair in Erasure-Coded Storage
    Shen, Zhirong
    Shu, Jiwu
    Huang, Zhijie
    Fu, Yingxun
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 42 - 51
  • [25] Reliability Evaluation of Erasure-coded Storage Systems with Latent Errors
    Iliadis, Ilias
    ACM TRANSACTIONS ON STORAGE, 2023, 19 (01)
  • [26] Boosting Degraded Reads in Heterogeneous Erasure-Coded Storage Systems
    Zhu, Yunfeng
    Lin, Jian
    Lee, Patrick P. C.
    Xu, Yinlong
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (08) : 2145 - 2157
  • [27] Mean Latency Optimization in Erasure-coded Distributed Storage Systems
    Al-Abbasi, Abubakr O.
    Aggarwal, Vaneet
    IEEE INFOCOM 2018 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2018, : 432 - 437
  • [28] An Efficient Parallel Coding Scheme in Erasure-Coded Storage Systems
    Dong, Wenrui
    Liu, Guangming
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (03) : 627 - 643
  • [29] PDL: A Data Layout towards Fast Failure Recovery for Erasure-coded Distributed Storage Systems
    Xu, Liangliang
    Lv, Min
    Li, Zhipeng
    Li, Cheng
    Xu, Yinlong
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020, : 736 - 745
  • [30] Erasure-Coded Byzantine Storage with Separate Metadata
    Androulaki, Elli
    Cachin, Christian
    Dobre, Dan
    Vukolic, Marko
    PRINCIPLES OF DISTRIBUTED SYSTEMS, OPODIS 2014, 2014, 8878 : 76 - 90