Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

Cited by: 0

Authors:

Yuan, Kun [1]

Alghunaim, Sulaiman A. [2]

Huang, Xinmeng [3]

Affiliations:

[1] Peking Univ, AI Sci Inst, Ctr Machine Learning Res, Beijing 100871, Peoples R China

[2] Kuwait Univ, Dept Elect Engn, Safat 13060, Kuwait

[3] Univ Penn, Grad Grp Appl Math & Computat Sci, Philadelphia, PA 19104 USA

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2023 / Vol. 24

Keywords:

Decentralized optimization; stochastic optimization; transient stage; DISTRIBUTED OPTIMIZATION; LINEAR CONVERGENCE; DIFFUSION; COMMUNICATION; ALGORITHMS;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We consider decentralized stochastic optimization problems, where a network of $n$ nodes cooperates to find a minimizer of the globally-averaged cost. A widely studied decentralized algorithm for this problem is the decentralized SGD (D-SGD), in which each node averages only with its neighbors. D-SGD is efficient in single-iteration communication, but it is very sensitive to the network topology. For smooth objective functions, the transient stage (which measures the number of iterations the algorithm has to experience before achieving the linear speedup stage) of D-SGD is on the order of $O(n/(1-\beta)^2)$ and $O(n^3/(1-\beta)^4)$ for strongly and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches 0 for a large and sparse network. Hence, D-SGD suffers from slow convergence for large and sparse networks. In this work, we revisit the convergence property of the D$^2$/Exact-Diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D$^2$/Exact-Diffusion is shown to have an enhanced transient stage that is on the order of $\tilde{O}(n/(1-\beta))$ and $O(n^3/(1-\beta)^2)$ for strongly and generally convex cost functions (where $\tilde{O}(\cdot)$ hides all logarithm factors), respectively. Moreover, when D$^2$/Exact-Diffusion is implemented with both gradient accumulation and multi-round gossip communications, its transient stage can be further improved to $\tilde{O}(1/(1-\beta)^{1/2})$ and $\tilde{O}(n/(1-\beta))$ for strongly and generally convex cost functions, respectively. To our knowledge, these established results for D$^2$/Exact-Diffusion have the best (i.e., weakest) dependence on network topology compared to existing decentralized algorithms. Numerical simulations are conducted to validate our theories.
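
The effect of data heterogeneity described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes a ring of 8 nodes, scalar quadratic costs f_i(x) = 0.5 (x - b_i)^2 with different b_i per node (the heterogeneity), noise-free gradients in place of stochastic ones, uniform ring weights of 1/3 for D-SGD, and the commonly used Exact-Diffusion recursion with the averaged weights (W + I)/2. All function names are hypothetical.

```python
# Minimal sketch (assumptions: ring topology, quadratic costs, exact gradients).
# Each node i holds f_i(x) = 0.5 * (x - b_i)^2, so the global minimizer is mean(b).

def ring_mix(values, self_weight, neighbor_weight):
    """Average each node's value with its two ring neighbors."""
    n = len(values)
    return [
        self_weight * values[i]
        + neighbor_weight * (values[i - 1] + values[(i + 1) % n])
        for i in range(n)
    ]

def grad(x, b):
    """Local gradients of the quadratic costs, one per node."""
    return [xi - bi for xi, bi in zip(x, b)]

def run_dsgd(b, step, iters):
    """D-SGD style update: local gradient step, then neighbor averaging with W."""
    x = [0.0] * len(b)
    for _ in range(iters):
        g = grad(x, b)
        half = [xi - step * gi for xi, gi in zip(x, g)]
        x = ring_mix(half, 1 / 3, 1 / 3)
    return x

def run_exact_diffusion(b, step, iters):
    """Exact-Diffusion style update: adapt, correct, then combine with (W + I)/2."""
    x = [0.0] * len(b)
    psi_prev = list(x)  # psi^0 = x^0
    for _ in range(iters):
        g = grad(x, b)
        psi = [xi - step * gi for xi, gi in zip(x, g)]             # adapt
        phi = [p + xi - pp for p, xi, pp in zip(psi, x, psi_prev)]  # correct
        x = ring_mix(phi, 2 / 3, 1 / 6)                             # combine
        psi_prev = psi
    return x

def max_error(x, b):
    """Largest distance of any node's iterate from the global minimizer."""
    target = sum(b) / len(b)
    return max(abs(xi - target) for xi in x)

if __name__ == "__main__":
    b = [float(i) for i in range(8)]  # heterogeneous local minimizers
    print("D-SGD error:          ", max_error(run_dsgd(b, 0.1, 1000), b))
    print("Exact-Diffusion error:", max_error(run_exact_diffusion(b, 0.1, 1000), b))
```

With a constant step size, the D-SGD style iterates in this toy setting settle at a point that is biased by the spread of the b_i, while the corrected recursion reaches the global minimizer up to floating-point precision. The sketch only illustrates the heterogeneity bias; it does not reproduce the transient-stage bounds, which concern stochastic gradients.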


Pages: 53

