Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

Times cited: 5
Authors
Esfahanizadeh, Homa [1]
Cohen, Alejandro [2]
Medard, Muriel [1]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Technion - Israel Institute of Technology, Haifa, Israel
Keywords
distributed systems; coded computation; heterogeneous; straggler; scheduling
DOI
10.1109/INFOCOM48880.2022.9796977
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end job in-order execution delay, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for the computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communication powers, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times, which is critical to maximizing resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of the uniform split, which is oblivious to the system's heterogeneity, and in fact comes very close to an ideal lower bound with only a small percentage of redundant computations.
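The effect of a heterogeneity-aware load split, as opposed to a uniform one, can be illustrated with a toy model. The sketch below assumes, purely for illustration, a deterministic worker model in which worker i returns its result after a fixed latency a_i plus l_i / s_i seconds for a load of l_i units. The names `Worker`, `balanced_split`, `uniform_split`, and `makespan` are hypothetical. This is not the authors' framework, which optimizes over stochastic delays and adds coded redundancy; it only shows why equalizing the workers' finish times beats a split that ignores heterogeneity.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical deterministic worker model (an assumption, not the paper's stochastic model)."""
    latency: float  # fixed per-iteration delay a_i (e.g. communication), in seconds
    speed: float    # processing throughput s_i, in load units per second


def finish_time(worker: Worker, load: float) -> float:
    """Time for a worker to return its result: a_i + l_i / s_i (idle workers are not waited for)."""
    return worker.latency + load / worker.speed if load > 0 else 0.0


def uniform_split(workers: list[Worker], total: float) -> list[float]:
    """Heterogeneity-oblivious baseline: every worker gets total / n."""
    return [total / len(workers)] * len(workers)


def balanced_split(workers: list[Worker], total: float) -> list[float]:
    """Split `total` so that all active workers finish at the same time T.

    Solves sum_i s_i * (T - a_i) = total over the active set, repeatedly
    dropping any worker whose fixed latency a_i is already >= T (it would
    get a non-positive load). Loads are fractional; rounding is ignored.
    """
    active = set(range(len(workers)))
    while True:
        speed_sum = sum(workers[i].speed for i in active)
        t = (total + sum(workers[i].speed * workers[i].latency for i in active)) / speed_sum
        too_slow = {i for i in active if workers[i].latency >= t}
        if not too_slow:
            break
        active -= too_slow  # T only decreases, so dropped workers stay excluded
    return [workers[i].speed * (t - workers[i].latency) if i in active else 0.0
            for i in range(len(workers))]


def makespan(workers: list[Worker], loads: list[float]) -> float:
    """Iteration delay = time until the slowest loaded worker responds."""
    return max(finish_time(w, l) for w, l in zip(workers, loads))


if __name__ == "__main__":
    # Three hypothetical workers: fast, medium, and slow with high latency.
    workers = [Worker(0.1, 10.0), Worker(0.2, 5.0), Worker(0.5, 1.0)]
    total = 12.0
    print(f"uniform:  {makespan(workers, uniform_split(workers, total)):.2f} s")   # uniform:  4.50 s
    print(f"balanced: {makespan(workers, balanced_split(workers, total)):.2f} s")  # balanced: 0.91 s
```

With these example workers, the uniform split is held up by the slow third worker (0.5 + 4 / 1 = 4.5 s), while the balanced split equalizes all finish times at T = 14.5 / 16 ≈ 0.906 s. A worker whose fixed latency already reaches T receives no load at all, which is why the loop removes such workers and recomputes T.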
Pages: 230-239
Number of pages: 10