Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Cited by: 1
Authors
Cong, Guojing [1 ]
Liu, Tianyi [2 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Ossining, NY 10562 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
DOI
10.1109/MLHPCAI4S51975.2020.00011
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The momentum method is used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model-averaging approaches. At the individual-learner level, traditional stochastic gradient descent is applied; at the meta level (the global learner), a single momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of this momentum method. Our experimental results show that block momentum not only accelerates training but also achieves better final results.
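The abstract does not spell out the meta-level update rule. The sketch below is a minimal illustration of K-step averaging SGD with a block momentum term, assuming a heavy-ball-style update applied to the averaged model at the global level; the toy objective, function names, and hyperparameters are illustrative choices, not taken from the paper.

    # Illustrative sketch of K-step averaging SGD with a global "block momentum" term.
    # The heavy-ball form of the meta-level update and all hyperparameters are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_grad(w):
        # Toy nonconvex objective f(w) = sum(w^2 + 0.5*sin(3w)) with Gaussian gradient noise.
        return 2 * w + 1.5 * np.cos(3 * w) + 0.1 * rng.standard_normal(w.shape)

    def block_momentum_sgd(w0, num_learners=4, rounds=50, k=10, lr=0.05, beta=0.9):
        w_global = w0.copy()
        v = np.zeros_like(w0)            # meta-level (block) momentum buffer
        for _ in range(rounds):
            local_models = []
            for _ in range(num_learners):
                w = w_global.copy()      # each learner starts from the current global model
                for _ in range(k):       # K local steps of plain SGD at the learner level
                    w -= lr * stochastic_grad(w)
                local_models.append(w)
            w_avg = np.mean(local_models, axis=0)
            # Block momentum: treat the averaged displacement as the meta-level "gradient"
            # and apply a heavy-ball step to the global model (assumed form).
            v = beta * v + (w_avg - w_global)
            w_global = w_global + v
        return w_global

    w_final = block_momentum_sgd(np.full(5, 3.0))
    print("final global model:", w_final)

With beta = 0, the loop reduces to plain K-step averaging; the momentum buffer v accumulates the per-round averaged updates, which is the acceleration effect the abstract attributes to block momentum.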
Pages: 29-39
Number of pages: 11