Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Cited by: 1
Authors
Cong, Guojing [1 ]
Liu, Tianyi [2 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Ossining, NY 10562 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
DOI
10.1109/MLHPCAI4S51975.2020.00011
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Momentum methods have been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model-averaging approaches. At the individual-learner level, traditional stochastic gradient descent is applied. At the meta level (the global learner), a single momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
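A minimal sketch of the scheme the abstract describes may help make the structure concrete: each learner runs K plain SGD steps starting from the current global model, the server averages the resulting local models, and a momentum buffer is maintained on the resulting global (block) update. The toy least-squares objective (a convex stand-in for the paper's nonconvex training problems), the helper name sgd_k_steps, and all hyperparameter values are illustrative assumptions; the heavy-ball form of the meta-level update is only one plausible reading of the abstract, not the authors' exact algorithm.

```python
# Sketch: K-step model averaging with a meta-level ("block") momentum term.
# All names, values, and the toy objective are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem split across learners (stand-in for a nonconvex loss).
num_learners, dim, samples = 4, 10, 256
A = rng.normal(size=(samples, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=samples)
shards = np.array_split(np.arange(samples), num_learners)

def sgd_k_steps(x, idx, k=8, lr=0.05, batch=16):
    """Plain SGD at the individual-learner level (no local momentum)."""
    for _ in range(k):
        mb = rng.choice(idx, size=batch, replace=False)
        grad = A[mb].T @ (A[mb] @ x - b[mb]) / batch
        x = x - lr * grad
    return x

x_global = np.zeros(dim)   # global (meta-level) model
velocity = np.zeros(dim)   # block-momentum buffer
beta = 0.9                 # meta-level momentum coefficient (assumed value)

for rnd in range(50):
    # Each learner starts from the global model and runs K local SGD steps.
    locals_ = [sgd_k_steps(x_global.copy(), idx) for idx in shards]
    block_update = np.mean(locals_, axis=0) - x_global  # averaged K-step progress
    velocity = beta * velocity + block_update           # momentum on the block update
    x_global = x_global + velocity                      # meta-level step
    loss = 0.5 * np.mean((A @ x_global - b) ** 2)
    if rnd % 10 == 0:
        print(f"round {rnd:02d}  loss {loss:.4f}")
```

With beta set to 0 this sketch reduces to ordinary K-step averaging, which is one way to see block momentum as a drop-in modification of the global averaging step.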
Pages: 29-39
Page count: 11