TTLoC: Taming Tail Latency for Erasure-Coded Cloud Storage Systems

Cited by: 12
Authors
Al-Abbasi, Abubakr O. [1]
Aggarwal, Vaneet [1, 2]
Lan, Tian [3]
Affiliations
[1] Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA
[2] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[3] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
Funding
National Science Foundation (USA);
Keywords
Optimization; Servers; Probabilistic logic; Cloud computing; Indexes; Encoding; Queueing analysis; Tail latency; erasure coding; distributed storage systems; bipartite matching; alternating optimization; Laplace-Stieltjes transform; TRADE-OFF; OPTIMIZATION; QUEUE;
DOI
10.1109/TNSM.2019.2916877
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Distributed storage systems are known to be susceptible to long tails in response time. In modern online storage systems such as Bing, Facebook, and Amazon, the long tails of the service latency are of particular concern, with 99.9th percentile response times being orders of magnitude worse than the mean. As erasure codes emerge as a popular technique to achieve high data reliability in distributed storage while attaining space efficiency, taming tail latency remains an open problem due to the lack of mathematical models for analyzing such systems. To this end, we propose a framework for quantifying and optimizing tail latency in erasure-coded storage systems. In particular, we derive closed-form upper bounds on tail latency for arbitrary service time distributions and heterogeneous files. Based on the model, we formulate an optimization problem to jointly minimize the weighted latency tail probability of all files over the placement of files on the servers and the choice of servers to access the requested files. The non-convex problem is solved using an efficient alternating optimization algorithm. Further, we mathematically quantify, in closed form, the tail index, i.e., the exponent at which the latency tail probability diminishes to zero, of the service latency for arbitrary erasure-coded storage, by characterizing the asymptotic behavior of latency distribution tails. We further show that probabilistic scheduling-based algorithms are (asymptotically) optimal since they are able to achieve the exact tail index. Evaluation results show a significant reduction of tail latency for erasure-coded storage systems under realistic workloads. Based on the offline algorithm, an online version is developed, and its superiority over state-of-the-art algorithms, e.g., join-shortest-queue (JSQ), power-of-d [Pof(d)], and least-load [LL(d)], is shown. Finally, a cloud storage system is implemented in a real cloud environment to show the superiority of our approach as compared to the considered baselines.
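As a rough illustration of the quantities named in the abstract, the sketch below shows an empirical latency tail probability and two of the baseline server-selection policies (JSQ and power-of-d). It is not the paper's algorithm: the function names, the use of queue length as the load measure, and the adaptation of each policy to pick k servers for a k-chunk read are all assumptions made for illustration.

```python
import random

def tail_probability(latencies, threshold):
    """Empirical tail probability: fraction of samples strictly above threshold."""
    return sum(1 for x in latencies if x > threshold) / len(latencies)

def join_shortest_queue(queue_lengths, k):
    """JSQ-style choice (assumed form): the k servers with the shortest queues.
    Ties are broken by server index so the result is deterministic."""
    order = sorted(range(len(queue_lengths)), key=lambda s: (queue_lengths[s], s))
    return order[:k]

def power_of_d(queue_lengths, k, d, rng):
    """Power-of-d-style choice (assumed form): sample d distinct servers
    uniformly at random, then keep the k least loaded among them (k <= d)."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    candidates.sort(key=lambda s: (queue_lengths[s], s))
    return candidates[:k]

# Hypothetical queue lengths for five servers; a request needs k = 2 chunks.
queues = [3, 1, 4, 1, 5]
jsq_choice = join_shortest_queue(queues, k=2)
pod_choice = power_of_d(queues, k=2, d=5, rng=random.Random(0))
tail = tail_probability([1.0, 2.0, 3.0, 4.0], threshold=2.5)
```

With d equal to the number of servers, the power-of-d choice coincides with the JSQ choice; smaller d trades load information for lower probing cost.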
Pages: 1609-1623
Page count: 15