Communication-Constrained Distributed Learning: TSI-Aided Asynchronous Optimization with Stale Gradient

Cited by: 0
Authors
Yu, Siyuan [1 ,2 ]
Chen, Wei [1 ,2 ]
Poor, H. Vincent [3 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol (BNRist), Beijing, Peoples R China
[3] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
Asynchronous optimization; stochastic gradient descent; timing side information; gradient staleness; federated learning;
DOI
10.1109/GLOBECOM54140.2023.10437351
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract
Distributed machine learning, including federated learning, has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy. As one of the key enablers of distributed learning, asynchronous optimization allows multiple workers to process data simultaneously without paying the cost of synchronization delay. However, given limited communication bandwidth, asynchronous optimization can be hampered by gradient staleness, which severely hinders the learning process. In this paper, we present a communication-constrained distributed learning scheme in which asynchronous stochastic gradients generated by parallel workers are transmitted over a shared medium or link. Our aim is to minimize the average training time by striking the optimal tradeoff between the number of parallel workers and their gradient staleness. To this end, a queueing-theoretic model is formulated, which allows us to find the optimal number of workers participating in the asynchronous optimization. Furthermore, we also leverage the packet arrival times at the parameter server, also referred to as Timing Side Information (TSI), to compress the staleness information for staleness-aware Asynchronous Stochastic Gradient Descent (Asyn-SGD). Numerical results demonstrate a substantial reduction in training time owing to both the worker selection and the TSI-aided compression of staleness information.
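For context on the staleness-aware Asyn-SGD mentioned in the abstract, the minimal Python sketch below illustrates the generic mechanism of a parameter server applying asynchronous updates whose step size is damped by gradient staleness. It is not the authors' algorithm: the class name StalenessAwareServer, the 1/(1 + staleness) damping rule, and the toy least-squares task are illustrative assumptions, and the paper's TSI idea of inferring and compressing staleness information from packet arrival times is not reproduced here.

```python
import numpy as np


class StalenessAwareServer:
    """Parameter server applying asynchronous SGD updates with staleness-damped steps."""

    def __init__(self, dim, base_lr=0.1):
        self.w = np.zeros(dim)   # global model parameters
        self.base_lr = base_lr   # nominal step size for a fresh (staleness-0) gradient
        self.t = 0               # global model version (number of updates applied)

    def pull(self):
        # A worker takes a copy of the current model together with its version,
        # so the staleness of its gradient can be measured when it returns.
        return self.w.copy(), self.t

    def push(self, grad, version):
        # Staleness = global updates applied since this worker last pulled the model.
        staleness = self.t - version
        lr = self.base_lr / (1.0 + staleness)   # assumed staleness-damping rule
        self.w -= lr * grad
        self.t += 1


def simulate(num_workers=4, steps=500, dim=10, seed=0):
    # Toy least-squares objective 0.5 * ||A w - b||^2 / m, optimized by
    # num_workers asynchronous workers whose gradients arrive in random order.
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(200, dim))
    b = rng.normal(size=200)
    server = StalenessAwareServer(dim)
    workers = [server.pull() for _ in range(num_workers)]  # (local model, version)

    for _ in range(steps):
        k = int(rng.integers(num_workers))        # worker whose packet arrives next
        w_local, version = workers[k]
        idx = rng.choice(len(b), size=20, replace=False)
        grad = A[idx].T @ (A[idx] @ w_local - b[idx]) / len(idx)
        server.push(grad, version)                # staleness-aware update
        workers[k] = server.pull()                # worker starts its next round
    return 0.5 * np.mean((A @ server.w - b) ** 2)


if __name__ == "__main__":
    print("final training loss:", simulate())
```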
Pages: 1495-1500
Number of pages: 6
Related Papers
50 records in total
  • [41] Zhang, Lin; Zhang, Longteng; Shi, Shaohuai; Chu, Xiaowen; Li, Bo. Evaluation and Optimization of Gradient Compression for Distributed Deep Learning. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 2023: 361-371.
  • [42] Zhou, Zixian; Huang, Mengda; Pan, Feiyang; He, Jia; Ao, Xiang; Tu, Dandan; He, Qing. Gradient-Adaptive Pareto Optimization for Constrained Reinforcement Learning. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 9, 2023: 11443-11451.
  • [43] Beidas, B. F.; Papavassilopoulos, G. P. Distributed Asynchronous Algorithms with Stochastic Delays for Constrained Optimization Problems with Conditions of Time Drift. Parallel Computing, 1995, 21(9): 1431-1450.
  • [44] Wai, Hoi-To; Freris, Nikolaos M.; Nedic, Angelia; Scaglione, Anna. SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization. 2018 IEEE Conference on Decision and Control (CDC), 2018: 1751-1756.
  • [45] Peng, Jing; Shi, Shaohuai; Li, Zihan; Li, Bo. Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning. 53rd International Conference on Parallel Processing (ICPP 2024), 2024: 148-157.
  • [46] Chen, Yicheng; Sadler, Brian M.; Blum, Rick S. Ordered Gradient Approach for Communication-Efficient Distributed Learning. Proceedings of the 21st IEEE International Workshop on Signal Processing Advances in Wireless Communications (IEEE SPAWC 2020), 2020.
  • [47] Chen, Tianyi; Sun, Yuejiao; Yin, Wotao. Communication-Adaptive Stochastic Gradient Methods for Distributed Learning. IEEE Transactions on Signal Processing, 2021, 69: 4637-4651.
  • [48] Zhang, Kunpeng; Yi, Xinlei; Li, Yuzhe; Cao, Ming; Chai, Tianyou; Yang, Tao. Distributed online constrained convex optimization with event-triggered communication. European Journal of Control, 2024, 80.
  • [49] Nelson, Max. Joint learning of constraint weights and gradient inputs in Gradient Symbolic Computation with constrained optimization. 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (SIGMORPHON 2020), 2020: 224-232.
  • [50] Kaushik, Harshal D.; Yousefian, Farzad. An Incremental Gradient Method for Large-scale Distributed Nonlinearly Constrained Optimization. 2021 American Control Conference (ACC), 2021: 953-958.