Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Cited by: 0
Authors
Nicolas Loizou
Peter Richtárik
Affiliations
[1] Université de Montréal, Mila and DIRO
[2] King Abdullah University of Science and Technology (KAUST)
Keywords
Stochastic methods; Heavy ball momentum; Linear systems; Randomized coordinate descent; Randomized Kaczmarz; Stochastic gradient descent; Stochastic Newton; Quadratic optimization; Convex optimization; 68Q25; 68W20; 68W40; 65Y20; 90C15; 90C20; 90C25; 15A06; 15B52; 65F10;
DOI: not available
Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
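To make the heavy ball momentum idea from the abstract concrete, below is a minimal sketch (not the authors' code) of one special case covered by the paper's framework: randomized Kaczmarz with heavy ball momentum for a consistent linear system Ax = b, which is stochastic gradient descent with momentum on a convex quadratic. The function name `kaczmarz_momentum` and the parameter values `omega` and `beta` are illustrative assumptions, not the tuned choices analyzed in the paper.

```python
# Sketch only: randomized Kaczmarz with heavy ball momentum on a consistent
# system A x = b. Step size `omega` and momentum `beta` are illustrative
# choices, not the values prescribed by the paper's theory.
import numpy as np

def kaczmarz_momentum(A, b, omega=1.0, beta=0.4, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_prev = np.zeros(n)
    x = np.zeros(n)
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()   # sample rows with probability proportional to ||a_i||^2
    for _ in range(iters):
        i = rng.choice(m, p=probs)              # pick one equation at random
        residual = A[i] @ x - b[i]              # stochastic gradient information from row i
        # Kaczmarz projection step plus a heavy ball term beta * (x_k - x_{k-1})
        x_new = x - omega * (residual / row_norms_sq[i]) * A[i] + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x

# Toy usage on a random consistent system: the iterate should approach x_star.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_star = rng.standard_normal(50)
b = A @ x_star
x_hat = kaczmarz_momentum(A, b)
print("error:", np.linalg.norm(x_hat - x_star))
```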
Pages: 653-710
Page count: 57
Related papers (50 in total)
  • [41] Damped Newton Stochastic Gradient Descent Method for Neural Networks Training
    Zhou, Jingcheng
    Wei, Wei
    Zhang, Ruizhi
    Zheng, Zhiming
    MATHEMATICS, 2021, 9 (13)
  • [42] A Stochastic Momentum Accelerated Quasi-Newton Method for Neural Networks
    Indrapriyadarsini, S.
    Mahboubi, Shahrzad
    Ninomiya, Hiroshi
    Kamio, Takeshi
    Asai, Hideki
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 12973 - 12974
  • [43] Multi-stage stochastic gradient method with momentum acceleration
    Luo, Zhijian
    Chen, Siyu
    Qian, Yuntao
    Hou, Yueen
    SIGNAL PROCESSING, 2021, 188
  • [44] A stochastic gradient tracking algorithm with adaptive momentum for distributed optimization
    Li, Yantao
    Hu, Hanqing
    Zhang, Keke
    Lu, Qingguo
    Deng, Shaojiang
    Li, Huaqing
    NEUROCOMPUTING, 2025, 637
  • [45] A Unified Analysis of Stochastic Momentum Methods for Deep Learning
    Yan, Yan
    Yang, Tianbao
    Li, Zhe
    Lin, Qihang
    Yang, Yi
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2955 - 2961
  • [46] HSB-GDM: a Hybrid Stochastic-Binary Circuit for Gradient Descent with Momentum in the Training of Neural Networks
    Li, Han
    Shi, Heng
    Jiang, Honglan
    Liu, Siting
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES, NANOARCH 2022, 2022,
  • [47] A SEMISMOOTH NEWTON STOCHASTIC PROXIMAL POINT ALGORITHM WITH VARIANCE REDUCTION
    Milzarek, Andre
    Schaipp, Fabian
    Ulbrich, Michael
    SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01) : 1157 - 1185
  • [48] AUTOMATIC AND SIMULTANEOUS ADJUSTMENT OF LEARNING RATE AND MOMENTUM FOR STOCHASTIC GRADIENT-BASED OPTIMIZATION METHODS
    Lancewicki, Tomer
    Kopru, Selcuk
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3127 - 3131
  • [49] Minibatch Stochastic Approximate Proximal Point Methods
    Asi, Hilal
    Chadha, Karan
    Cheng, Gary
    Duchi, John C.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [50] Stochastic Subspace Cubic Newton Method
    Hanzely, Filip
    Doikov, Nikita
    Richtárik, Peter
    Nesterov, Yurii
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119