Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Cited by: 0
Authors
Nicolas Loizou
Peter Richtárik
Affiliations
[1] Université de Montréal, Mila and DIRO
[2] King Abdullah University of Science and Technology (KAUST)
Keywords
Stochastic methods; Heavy ball momentum; Linear systems; Randomized coordinate descent; Randomized Kaczmarz; Stochastic gradient descent; Stochastic Newton; Quadratic optimization; Convex optimization; 68Q25; 68W20; 68W40; 65Y20; 90C15; 90C20; 90C25; 15A06; 15B52; 65F10;
DOI: Not available
Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
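On a consistent linear system, the stochastic heavy ball method described in the abstract reduces to the randomized Kaczmarz method with a momentum term. The following is a minimal sketch of that special case; the step size omega, momentum parameter beta, uniform row sampling, and all variable names are illustrative assumptions rather than parameter choices taken from the paper.

```python
import numpy as np

def kaczmarz_momentum(A, b, omega=1.0, beta=0.4, iters=5000, seed=0):
    """Randomized Kaczmarz with heavy-ball momentum (sketch).

    Update: x_{k+1} = x_k - omega * (a_i^T x_k - b_i) / ||a_i||^2 * a_i
                          + beta * (x_k - x_{k-1}),
    with row i sampled uniformly at random at each step (an assumption;
    other sampling distributions are possible).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_prev = np.zeros(n)
    x = np.zeros(n)
    row_norms = np.einsum("ij,ij->i", A, A)  # squared row norms ||a_i||^2
    for _ in range(iters):
        i = rng.integers(m)                  # sample a row uniformly
        residual = A[i] @ x - b[i]
        step = omega * residual / row_norms[i] * A[i]
        x_new = x - step + beta * (x - x_prev)   # heavy-ball update
        x_prev, x = x, x_new
    return x

# Usage on a synthetic consistent system with a known solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_star = rng.standard_normal(50)
b = A @ x_star
x_hat = kaczmarz_momentum(A, b)
print(np.linalg.norm(x_hat - x_star))  # error should be small
```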
Pages: 653-710
Number of pages: 57
Related papers
50 articles in total
  • [31] Xu, Yangyang; Xu, Yibo. Momentum-Based Variance-Reduced Proximal Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization. Journal of Optimization Theory and Applications, 2023, 196: 266-297.
  • [32] Liao, Shichen; Liu, Yan; Han, Congying; Guo, Tiande. Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization. Expert Systems with Applications, 2025, 266.
  • [33] Masood, Sarfaraz; Doja, M. N.; Chandra, Pravin. Analysis of Weight Initialization Methods for Gradient Descent with Momentum. 2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI), 2015.
  • [34] Zhou, Xingwen; You, Zhenghao; Sun, Weiguo; Zhao, Dongdong; Yan, Shi. Fractional-order stochastic gradient descent method with momentum and energy for deep neural networks. Neural Networks, 2025, 181.
  • [35] Kan, T.; Gao, Z.; Yang, C. Stochastic Gradient Descent Method of Convolutional Neural Network Using Fractional-Order Momentum. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33(6): 559-567.
  • [36] Cherian, A. K.; Vaidhehi, M.; Arshey, M.; Briskilal, J.; Simpson, S. V. Generative adversarial networks with stochastic gradient descent with momentum algorithm for video-based facial expression. International Journal of Information Technology, 2024, 16(6): 3703-3722.
  • [37] Zhang, Jiawei; Jin, Cheng; Gu, Yuantao. Adaptive Polyak Step-Size for Momentum Accelerated Stochastic Gradient Descent With General Convergence Guarantee. IEEE Transactions on Signal Processing, 2025, 73: 462-476.
  • [38] Ilboudo, Wendyam Eric Lionel; Kobayashi, Taisuke; Sugimoto, Kenji. Robust Stochastic Gradient Descent With Student-t Distribution Based First-Order Momentum. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(3): 1324-1337.
  • [39] Gao, Juan; Liu, Xin-Wei; Dai, Yu-Hong; Huang, Yakui; Gu, Junhua. Distributed stochastic gradient tracking methods with momentum acceleration for non-convex optimization. Computational Optimization and Applications, 2023, 84(2): 531-572.