Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Cited by: 0
Authors
Nicolas Loizou
Peter Richtárik
Affiliations
[1] Mila and DIRO, Université de Montréal
[2] King Abdullah University of Science and Technology (KAUST)
Keywords
Stochastic methods; Heavy ball momentum; Linear systems; Randomized coordinate descent; Randomized Kaczmarz; Stochastic gradient descent; Stochastic Newton; Quadratic optimization; Convex optimization
Mathematics Subject Classification
68Q25; 68W20; 68W40; 65Y20; 90C15; 90C20; 90C25; 15A06; 15B52; 65F10
DOI
Not available
Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
Published in
Computational Optimization and Applications, 2020, 77(3): 653–710 (57 pages)
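To make the abstract's central construction concrete, here is a minimal NumPy sketch of one special case the paper covers: randomized Kaczmarz with heavy ball momentum for a consistent linear system Ax = b, with an optional stochastic-momentum branch that touches a single coordinate per iteration. The function name, the parameter values, and the exact single-coordinate estimator (scaling one sampled entry of x^k − x^{k−1} by n so the momentum term stays unbiased) are illustrative assumptions, not the authors' reference implementation.

    import numpy as np

    def momentum_kaczmarz(A, b, omega=1.0, beta=0.3, iters=5000,
                          stochastic_momentum=False, seed=0):
        # Sketch: x^{k+1} = x^k - omega*(A_i x^k - b_i)/||A_i||^2 * A_i + m^k,
        # where m^k is the heavy ball term beta*(x^k - x^{k-1}) or, if
        # stochastic_momentum=True, an unbiased one-coordinate estimate of it
        # (an illustrative reading of the paper's "stochastic momentum" idea).
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms2 = np.einsum('ij,ij->i', A, A)   # squared row norms ||A_i||^2
        probs = row_norms2 / row_norms2.sum()      # sample rows prop. to ||A_i||^2
        x_prev = x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            step = omega * ((A[i] @ x - b[i]) / row_norms2[i]) * A[i]
            if stochastic_momentum:
                j = rng.integers(n)                # momentum touches one coordinate
                mom = np.zeros(n)
                mom[j] = beta * n * (x[j] - x_prev[j])  # unbiased for beta*(x - x_prev)
            else:
                mom = beta * (x - x_prev)          # classical heavy ball momentum
            x_prev, x = x, x - step + mom
        return x

    # Usage: consistent random system; keep beta small for the stochastic variant
    rng = np.random.default_rng(1)
    A = rng.standard_normal((300, 50))
    x_star = rng.standard_normal(50)
    b = A @ x_star
    x_hb = momentum_kaczmarz(A, b, beta=0.3)
    x_sm = momentum_kaczmarz(A, b, beta=0.01, stochastic_momentum=True)
    print(np.linalg.norm(x_hb - x_star), np.linalg.norm(x_sm - x_star))

The momentum branch is the point of contrast: the classical heavy ball term costs O(n) per iteration, while the stochastic variant updates one coordinate, which is where the abstract's claimed savings in sparse regimes with sufficiently small β would come from.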