Speeding up Distributed Request-Response Workflows

Cited by: 74

Authors:

Jalaparti, Virajith

Bodik, Peter [1]

Kandula, Srikanth [1]

Menache, Ishai [1]

Rybalkin, Mikhail

Yan, Chenyu [1]

Affiliations:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

ACM SIGCOMM COMPUTER COMMUNICATION REVIEW | 2013 / Vol. 43 / No. 04

Keywords:

Interactive services; Tail latency; Optimization; Reissues; Partial results

DOI:

10.1145/2534169.2486028

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.
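The reissue building block mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Kwiken's optimization and not based on the production traces: the latency distribution (95% fast requests, 5% laggards), the fixed timeout, and the names `percentile`, `reissue_latency`, and `simulate_stage` are all hypothetical and chosen only for illustration. It models a single stage in which a second copy of a request is launched once the first has been outstanding longer than a timeout, and the earlier response wins.

```python
import math
import random


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]


def reissue_latency(primary, backup, timeout):
    """Latency when a backup copy is launched `timeout` after the primary and the earlier response is used."""
    return min(primary, timeout + backup)


def simulate_stage(n, timeout, seed=0):
    """Return (p99 without reissues, p99 with reissues, fraction of requests reissued)."""
    rng = random.Random(seed)

    def draw():
        # Assumed (hypothetical) latency model in milliseconds:
        # 5% of requests are laggards, the rest are fast.
        if rng.random() < 0.05:
            return rng.uniform(150.0, 250.0)
        return rng.uniform(5.0, 15.0)

    baseline, with_reissue, reissued = [], [], 0
    for _ in range(n):
        primary, backup = draw(), draw()
        baseline.append(primary)
        if primary > timeout:
            # Only requests still outstanding at the timeout consume extra resources.
            reissued += 1
        with_reissue.append(reissue_latency(primary, backup, timeout))
    return percentile(baseline, 99), percentile(with_reissue, 99), reissued / n


if __name__ == "__main__":
    p99_base, p99_reissue, extra = simulate_stage(20000, timeout=20.0, seed=1)
    print(f"p99 without reissues: {p99_base:.1f} ms")
    print(f"p99 with reissues:    {p99_reissue:.1f} ms")
    print(f"requests reissued:    {extra:.1%}")
```

In this toy model the gain depends heavily on the shape of the latency distribution and on the timeout, which is consistent with the abstract's observation that different stages improve by different amounts for the same reissue budget.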

Citation

Pages: 219-230

Page count: 12

