A comprehensive repair scheme for distributed storage systems

Cited by: 1
Authors
Chen, Junmei [1]
Li, Zongpeng [1, 2]
Fang, Guang [1]
Hou, Yeqiao [1]
Li, Xianglong [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Keywords
Distributed storage system; Erasure code; Data reliability; Heterogeneous; Cross-rack; Access skew; CODES;
DOI
10.1016/j.comnet.2023.109954
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Modern data storage systems apply erasure codes to provide data reliability efficiently. Previous studies have proposed a range of techniques to balance repair and storage costs, reduce codec complexity, minimize repair time, improve fault tolerance, and enforce system-level service-level agreements. These techniques were designed in isolation, which limits performance. We explore the potential advantages of combining them to better meet the requirements of data storage systems and deliver superior system performance. This work proposes a comprehensive repair scheme for faulty data in distributed storage systems. First, we tailor the design of erasure codes to the heterogeneity of storage devices. The core idea is to monitor device performance (e.g., access speed, reliability), compute two coefficients for each device, and use them to select appropriate devices on which to create erasure-coded stripes. Second, we leverage the system hierarchy to perform intermediary repair operations, further minimizing cross-rack repair bandwidth. Finally, we propose a new repair scheme adapted to data access skew. To demonstrate the effectiveness of the scheme, we evaluate various erasure codes via mathematical analysis and experiments in a Ceph cluster. Compared with traditional re-encoding methods and more recent adaptive erasure codes, our scheme achieves significant savings in recovery bandwidth, code-switching bandwidth, repair time, and code-switching time.
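The intermediary repair idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single XOR parity block in place of the erasure codes the paper evaluates, and the rack layout, block contents, and function names (`xor_blocks`, `repair_direct`, `repair_with_rack_aggregation`) are hypothetical. The point it shows is generic: because repair is a linear combination of surviving blocks, each rack can combine its local blocks first and send one partial result across racks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_direct(racks, repair_rack):
    """Ship every surviving block to the repair rack, then decode there."""
    survivors = [b for blocks in racks.values() for b in blocks]
    cross_rack = sum(len(blocks) for rack, blocks in racks.items()
                     if rack != repair_rack)
    return xor_blocks(survivors), cross_rack


def repair_with_rack_aggregation(racks, repair_rack):
    """Combine blocks inside each rack first; send one partial block per remote rack."""
    partials = []
    cross_rack = 0
    for rack, blocks in racks.items():
        partials.append(xor_blocks(blocks))  # intra-rack partial result
        if rack != repair_rack:
            cross_rack += 1                  # only the partial crosses racks
    return xor_blocks(partials), cross_rack


# Hypothetical stripe: six data blocks plus one XOR parity block (assumption).
data = [bytes([value] * 4) for value in (1, 2, 3, 4, 5, 6)]
parity = xor_blocks(data)
lost = data[0]  # the block to rebuild, originally stored in rack_a

# Hypothetical placement of the surviving blocks across three racks.
racks = {
    "rack_a": [data[1]],
    "rack_b": [data[2], data[3], data[4]],
    "rack_c": [data[5], parity],
}

rebuilt_direct, traffic_direct = repair_direct(racks, "rack_a")
rebuilt_agg, traffic_agg = repair_with_rack_aggregation(racks, "rack_a")
print(rebuilt_direct == lost, traffic_direct)
print(rebuilt_agg == lost, traffic_agg)
```

Under this toy placement, direct repair moves five blocks across racks, while aggregation moves two, one per remote rack. Both rebuild the same block.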
Pages: 12