Enabling Efficient Erasure Coding in Disaggregated Memory Systems

Cited by: 0
Authors
Li, Qiliang [1, 2]
Xu, Liangliang [1]
Li, Yongkun [1, 3]
Lyu, Min [1, 3]
Wang, Wei [1, 2]
Zuo, Pengfei [4]
Xu, Yinlong [1, 3]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
[2] Huawei Cloud, Sch Comp Sci & Technol, Shanghai 201206, Peoples R China
[3] Anhui Prov Key Lab High Performance Comp, Hefei 230026, Peoples R China
[4] Huawei Cloud, Shenzhen 518100, Peoples R China
Keywords
Encoding; Servers; Memory management; Random access memory; Fault tolerant systems; Fault tolerance; Throughput; Disaggregated memory; erasure coding; pipeline; reliability; STORE;
DOI
10.1109/TPDS.2023.3332782
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM at low memory cost. In DM with EC, objects are first coded on compute servers, then written directly to memory servers over high-speed networks such as one-sided RDMA. However, as one-sided RDMA latency drops to the microsecond level, coding overhead degrades the performance of DM with EC. To enable efficient EC in DM, we thoroughly analyze the coding stack from the perspectives of cache efficiency and RDMA transmission. We develop MicroEC, which optimizes the coding workflow by reusing the auxiliary coding data, coordinates coding and RDMA transmission with an exponential pipeline, and carefully adjusts the coding and transmission threads to minimize latency. We implement a prototype supporting the common basic operations: write, read, degraded read, and recovery. Experiments show that, for objects not smaller than 1 MB, MicroEC reduces the write latency by up to 44.35% and 42.14% and achieves up to 1.80x and 1.73x write throughput, compared with the state-of-the-art DM systems with EC and 3-way replication, respectively. For small objects, MicroEC also noticeably reduces the variation in latency; e.g., it reduces the P99 latency of writing 1 KB objects by 27.81%.
Pages: 154 - 168
Number of pages: 15
Related papers
50 in total
  • [1] Enabling Energy Efficient Hybrid Memory Cube Systems with Erasure Codes
    Wang, Shibo
    Song, Yanwei
    Bojnordi, Mahdi Nazm
    Ipek, Engin
    2015 IEEE/ACM INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED), 2015, : 67 - 72
  • [2] Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems
    Li, Runhui
    Hu, Yuchong
    Lee, Patrick P. C.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (09) : 2500 - 2513
  • [3] Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems
    Li, Runhui
    Hu, Yuchong
    Lee, Patrick P. C.
    2015 45TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, 2015, : 148 - 159
  • [4] Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems
    Wang, Zixuan
    Sim, Joonseop
    Lim, Euicheol
    Zhao, Jishen
    2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), 2022, : 126 - 140
  • [5] An Efficient Parallel Coding Scheme in Erasure-Coded Storage Systems
    Dong, Wenrui
    Liu, Guangming
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (03) : 627 - 643
  • [6] Efficient and Available In-memory KV-Store with Hybrid Erasure Coding and Replication
    Zhang, Heng
    Dong, Mingkai
    Chen, Haibo
    14TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST '16), 2016, : 167 - 180
  • [7] Efficient and Available In-Memory KV-Store with Hybrid Erasure Coding and Replication
    Chen, Haibo
    Zhang, Heng
    Dong, Mingkai
    Wang, Zhaoguo
    Xia, Yubin
    Guan, Haibing
    Zang, Binyu
    ACM TRANSACTIONS ON STORAGE, 2017, 13 (03) : 1 - 30
  • [8] UHUM: An Efficient Hybrid Update Mechanism in Distributed Storage Systems with Erasure Coding
    Luo, Qian
    Wang, Yun
    PROCEEDINGS OF THE 2019 IEEE 23RD INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2019, : 158 - 163
  • [9] Robot: An Efficient Model For Big Data Storage Systems Based On Erasure Coding
    Yin, Chao
    Wang, Jianzong
    Xie, Changsheng
    Wan, Jiguang
    Long, Changlin
    Bi, Wenjuan
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [10] Memory Erasure in Small Systems
    Dillenschneider, Raoul
    Lutz, Eric
    PHYSICAL REVIEW LETTERS, 2009, 102 (21)