TTLoC: Taming Tail Latency for Erasure-Coded Cloud Storage Systems

Cited by: 12
Authors
Al-Abbasi, Abubakr O. [1]
Aggarwal, Vaneet [1, 2]
Lan, Tian [3 ]
Affiliations
[1] Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA
[2] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[3] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
Funding
U.S. National Science Foundation
Keywords
Optimization; Servers; Probabilistic logic; Cloud computing; Indexes; Encoding; Queueing analysis; Tail latency; erasure coding; distributed storage systems; bipartite matching; alternating optimization; Laplace-Stieltjes transform; TRADE-OFF; OPTIMIZATION; QUEUE
DOI
10.1109/TNSM.2019.2916877
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Distributed storage systems are known to be susceptible to long tails in response time. In modern online storage systems such as Bing, Facebook, and Amazon, the long tails of the service latency are of particular concern, with 99.9th percentile response times being orders of magnitude worse than the mean. As erasure codes emerge as a popular technique to achieve high data reliability in distributed storage while attaining space efficiency, taming tail latency remains an open problem due to the lack of mathematical models for analyzing such systems. To this end, we propose a framework for quantifying and optimizing tail latency in erasure-coded storage systems. In particular, we derive upper bounds on tail latency in closed form for arbitrary service time distributions and heterogeneous files. Based on the model, we formulate an optimization problem to jointly minimize the weighted latency tail probability of all files over the placement of files on the servers and the choice of servers to access the requested files. The non-convex problem is solved using an efficient alternating optimization algorithm. Further, we mathematically quantify, in closed form, the tail index, i.e., the exponent at which the latency tail probability diminishes to zero, of the service latency for arbitrary erasure-coded storage, by characterizing the asymptotic behavior of latency distribution tails. We further show that probabilistic scheduling-based algorithms are (asymptotically) optimal since they are able to achieve the exact tail index. Evaluation results show a significant reduction of tail latency for erasure-coded storage systems with realistic workloads. Based on the offline algorithm, an online version is developed and its superiority over state-of-the-art algorithms, e.g., join-shortest-queue (JSQ), power-of-d [Pof(d)], and least-load [LL(d)], is shown. Finally, a cloud storage system is implemented in a real cloud environment to show the superiority of our approach as compared to the considered baselines.
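For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of how JSQ and power-of-d dispatch could look when a read needs k chunks from the servers holding a file. It is not the authors' code: the function names, the dictionary of queue lengths, and the way the policies are extended from one server to k servers are all assumptions made for illustration.

```python
import random

def jsq(queue_lengths, candidates, k):
    """Join-shortest-queue (sketch): pick the k candidates with the shortest queues.

    Ties are broken by server id so the result is deterministic.
    Extending JSQ from one server to k servers is an assumption of this sketch.
    """
    return sorted(candidates, key=lambda s: (queue_lengths[s], s))[:k]

def power_of_d(queue_lengths, candidates, k, d, rng=random):
    """Power-of-d (sketch): sample d candidates uniformly, keep the k shortest."""
    candidates = list(candidates)
    if not k <= d <= len(candidates):
        raise ValueError("need k <= d <= number of candidate servers")
    sampled = rng.sample(candidates, d)
    return jsq(queue_lengths, sampled, k)

# Hypothetical state: five servers hold chunks of one file, and a read needs k = 2.
queues = {0: 5, 1: 2, 2: 7, 3: 0, 4: 3}
print(jsq(queues, range(5), 2))            # [3, 1]
print(power_of_d(queues, range(5), 2, 3))  # depends on the random sample
```

Power-of-d inspects only d queues per request, which trades some load-balancing quality for lower probing cost; with d equal to the number of candidates it reduces to the JSQ sketch above.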
Pages: 1609-1623
Page count: 15
Related papers
50 in total
  • [21] An Efficient Parallel Coding Scheme in Erasure-Coded Storage Systems
    Dong, Wenrui
    Liu, Guangming
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (03): : 627 - 643
  • [22] Erasure-Coded Byzantine Storage with Separate Metadata
    Androulaki, Elli
    Cachin, Christian
    Dobre, Dan
    Vukolic, Marko
    PRINCIPLES OF DISTRIBUTED SYSTEMS, OPODIS 2014, 2014, 8878 : 76 - 90
  • [23] An Erasure-Coded Storage System for Edge Computing
    Liang, Lixin
    He, Huan
    Zhao, Jian
    Liu, Chengjian
    Luo, Qiuming
    Chu, Xiaowen
    IEEE ACCESS, 2020, 8 (08): : 96271 - 96283
  • [24] Reliability analysis of deduplicated and erasure-coded storage
    Li, Xiaozhou
    Lillibridge, Mark
    Uysal, Mustafa
    HP Laboratories Technical Report, 2010, (146):
  • [25] Adaptive Bandwidth-Efficient Recovery Techniques in Erasure-Coded Cloud Storage
    Nachiappan, Rekha
    Javadi, Bahman
    Calheiros, Rodrigo N.
    Matawie, Kenan M.
    EURO-PAR 2018: PARALLEL PROCESSING, 2018, 11014 : 325 - 338
  • [26] Efficient Updates in Cross-Object Erasure-Coded Storage Systems
    Esmaili, Kyumars Sheykh
    Chiniah, Aatish
    Datta, Anwitaman
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [27] A popularity-aware reconstruction technique in erasure-coded storage systems
    Cao, Ting
    Peng, Xiaopu
    Zhang, Chaowei
    Al Tekreeti, Taha Khalid
    Mao, Jianzhou
    Qin, Xiao
    Huang, Jianzhong
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 146 : 122 - 138
  • [28] Fast Predictive Repair in Erasure-Coded Storage
    Shen, Zhirong
    Li, Xiaolu
    Lee, Patrick P. C.
    2019 49TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2019), 2019, : 556 - 567
  • [29] SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems
    Xu, Liangliang
    Lyu, Min
    Li, Qiliang
    Xie, Lingjiang
    Li, Cheng
    Xu, Yinlong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (10) : 2386 - 2400
  • [30] Fast Repair for Single Failure in Erasure-coded Distributed Storage Systems
    Zhang, Huayu
    Li, Hui
    Zhu, Bing
    Chen, Jun
    2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS), 2014, : 146 - 151