Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

Times cited: 0
Authors
Yuan, Kun [1 ]
Alghunaim, Sulaiman A. [2 ]
Huang, Xinmeng [3 ]
Affiliations
[1] Peking Univ, AI Sci Inst, Ctr Machine Learning Res, Beijing 100871, Peoples R China
[2] Kuwait Univ, Dept Elect Engn, Safat 13060, Kuwait
[3] Univ Penn, Grad Grp Appl Math & Computat Sci, Philadelphia, PA 19104 USA
Keywords
Decentralized optimization; stochastic optimization; transient stage; distributed optimization; linear convergence; diffusion; communication; algorithms
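DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract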
We consider decentralized stochastic optimization problems, in which a network of $n$ nodes cooperates to find a minimizer of the globally averaged cost. A widely studied algorithm for this problem is decentralized SGD (D-SGD), in which each node averages only with its neighbors. D-SGD is communication-efficient per iteration, but it is very sensitive to the network topology. For smooth objective functions, the transient stage of D-SGD (the number of iterations the algorithm must run before it reaches the linear-speedup stage) is on the order of $O(n/(1-\beta)^2)$ and $O(n^3/(1-\beta)^4)$ for strongly convex and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches $0$ for large and sparse networks. Hence, D-SGD converges slowly on large and sparse networks. In this work, we revisit the convergence properties of the D²/Exact-Diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D²/Exact-Diffusion is shown to have an enhanced transient stage on the order of $\tilde{O}(n/(1-\beta))$ and $O(n^3/(1-\beta)^2)$ for strongly convex and generally convex cost functions, respectively, where $\tilde{O}(\cdot)$ hides all logarithmic factors. Moreover, when D²/Exact-Diffusion is implemented with both gradient accumulation and multi-round gossip communication, its transient stage improves further to $\tilde{O}(1/(1-\beta)^{1/2})$ and $\tilde{O}(n/(1-\beta))$ for strongly convex and generally convex cost functions, respectively. To our knowledge, these results for D²/Exact-Diffusion have the best (i.e., weakest) dependence on network topology among existing decentralized algorithms. Numerical simulations are conducted to validate the theory.
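The abstract's central point is that D-SGD carries a bias caused by data heterogeneity between nodes, while D²/Exact-Diffusion removes it. The following is a minimal illustrative sketch of that effect, not the authors' code or experiments. It assumes a ring of 8 nodes, a Metropolis mixing matrix W, hypothetical local costs f_i(x) = 0.5 (x - b_i)^2 with a different b_i at each node, exact gradients (sampling noise is deliberately omitted to isolate the heterogeneity effect), and a constant step size. The update rules are the adapt-then-combine D-SGD form x ← W(x - α∇f(x)) and the adapt/correct/combine form of Exact-Diffusion with W̄ = (I + W)/2, as commonly stated in the literature. All function names are hypothetical.

```python
"""Illustrative sketch: D-SGD vs. Exact-Diffusion on heterogeneous quadratics.

Assumptions (not from the record above): a ring of n >= 3 nodes, Metropolis
weights, local costs f_i(x) = 0.5 * (x - b_i)**2, exact gradients, and a
constant step size alpha. All helper names are hypothetical.
"""


def ring_mixing_matrix(n):
    # Metropolis weights on a ring: each node keeps 1/3 and gives 1/3 to each neighbor.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 / 3.0
        W[i][(i - 1) % n] = 1.0 / 3.0
        W[i][(i + 1) % n] = 1.0 / 3.0
    return W


def mix(W, v):
    # One gossip round: node i replaces its value with sum_j W[i][j] * v[j].
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def grad(x, b):
    # Gradient of f_i(x) = 0.5 * (x - b_i)^2 at each node's own iterate.
    return [xi - bi for xi, bi in zip(x, b)]


def d_sgd(W, b, alpha, iters):
    x = [0.0] * len(b)
    for _ in range(iters):
        g = grad(x, b)
        # Local gradient step, then average with neighbors.
        x = mix(W, [xi - alpha * gi for xi, gi in zip(x, g)])
    return x


def exact_diffusion(W, b, alpha, iters):
    n = len(b)
    # W_bar = (I + W) / 2, the mixing matrix used by Exact-Diffusion.
    W_bar = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)]
             for i in range(n)]
    x = [0.0] * n
    psi_prev = x[:]  # psi^0 = x^0, so the first correction step is a no-op
    for _ in range(iters):
        g = grad(x, b)
        psi = [xi - alpha * gi for xi, gi in zip(x, g)]             # adapt
        phi = [p + xi - pp for p, xi, pp in zip(psi, x, psi_prev)]  # correct
        x = mix(W_bar, phi)                                         # combine
        psi_prev = psi
    return x


if __name__ == "__main__":
    n = 8
    W = ring_mixing_matrix(n)
    b = [float(i) for i in range(n)]  # each node has a different local optimum
    target = sum(b) / n               # minimizer of the globally averaged cost
    for name, algo in [("D-SGD", d_sgd), ("Exact-Diffusion", exact_diffusion)]:
        x = algo(W, b, alpha=0.5, iters=500)
        print(name, "max |x_i - x*| =", max(abs(xi - target) for xi in x))
```

With a constant step size, D-SGD keeps the network average on target, but individual nodes settle at different points because their local optima differ. The correction step in Exact-Diffusion cancels this disagreement, so every node converges to the mean of the b_i. This toy run only illustrates the bias; it does not reproduce the transient-stage bounds, which concern stochastic gradients and convergence rates.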
Pages: 53
Related papers (6 in total)
  • [1] Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology
    Huang, Yan
    Sun, Ying
    Zhu, Zehan
    Yan, Changzhi
    Xu, Jinming
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Refined Convergence and Topology Learning for Decentralized SGD with Heterogeneous Data
    Le Bars, Batiste
    Bellet, Aurelien
    Tommasi, Marc
    Lavoie, Erick
    Kermarrec, Anne-Marie
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [3] D-Cliques: Compensating for Data Heterogeneity with Topology in Decentralized Federated Learning
    Bellet, Aurelien
    Kermarrec, Anne-Marie
    Lavoie, Erick
    2022 41ST INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2022), 2022: 1-11
  • [4] The influence of dependence on data network models
    D'Auria, Bernardo
    Resnick, Sidney I.
    ADVANCES IN APPLIED PROBABILITY, 2008, 40(1): 60-94
  • [5] The influence of road network topology on street flooding in New York City-A social media data approach
    Zuo, Chen
    Wang, Runzi
    Hong, Yi
    Zhou, Yuhan
    He, Yiyi
    Gronewold, Andrew D.
    JOURNAL OF HYDROLOGY, 2024, 638
  • [6] Urban rail transit network topology evolutionary stage has influence on rail ridership: Insights from linear mixed-effects models with heterogeneity in variances
    Xin, Mengwei
    Feng, Shumin
    TRANSPORTATION RESEARCH PART A-POLICY AND PRACTICE, 2024, 180