Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

Cited by: 5
Authors
Esfahanizadeh, Homa [1]
Cohen, Alejandro [2]
Medard, Muriel [1]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Technion Israel Inst Technol, Haifa, Israel
Keywords
distributed systems; coded computation; heterogeneous; straggler; scheduling
DOI
10.1109/INFOCOM48880.2022.9796977
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end in-order execution delay of a job, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for the computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communication powers, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times and is critical for maximizing resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of a uniform split, which is oblivious to the system's heterogeneity, and, in fact, is very close to an ideal lower bound with only a small percentage of redundant computations.
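The contrast the abstract draws between a uniform split and a heterogeneity-aware split can be illustrated with a toy sketch. This is not the paper's framework: it assumes deterministic, known worker rates (the paper treats a stochastic system), and the function names, the example rates, and the rule that any set of coded results totalling at least the original load suffices to recover the job are all assumptions made here for illustration.

```python
def uniform_split(total_load, rates):
    """Give every worker the same share, ignoring how fast each one is."""
    n = len(rates)
    return [total_load / n] * n


def proportional_split(total_load, rates, redundancy=0.0):
    """Split the coded load, (1 + redundancy) * total_load, in proportion
    to each worker's rate so that all workers finish at the same time.
    Assumption: rates are deterministic and known in advance."""
    coded_load = total_load * (1.0 + redundancy)
    total_rate = sum(rates)
    return [coded_load * r / total_rate for r in rates]


def finish_times(loads, rates):
    """Time each worker needs to process its assigned load."""
    return [load / rate for load, rate in zip(loads, rates)]


def recoverable_without(loads, missing_worker, total_load):
    """Assumption (idealized code): the job can be recovered from any set
    of coded results whose sizes add up to at least total_load."""
    received = sum(l for i, l in enumerate(loads) if i != missing_worker)
    return received >= total_load


# Hypothetical example: three workers with rates 1, 2 and 5 units per second.
rates = [1.0, 2.0, 5.0]
total_load = 80.0

uniform_delay = max(finish_times(uniform_split(total_load, rates), rates))
balanced_delay = max(finish_times(proportional_split(total_load, rates), rates))

# With 25% redundant coded computations, losing the slowest worker is tolerable.
coded_loads = proportional_split(total_load, rates, redundancy=0.25)
survives_slowest_failure = recoverable_without(coded_loads, 0, total_load)
```

Under these assumptions the uniform split is held back by the slowest worker, while the proportional split equalizes the workers' finish times; the redundant share is what lets the job complete when a worker's results never arrive.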
Pages: 230-239
Number of pages: 10