Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Cited by: 0
Authors
Nicolas Loizou
Peter Richtárik
Affiliations
[1] Mila and DIRO, Université de Montréal
[2] King Abdullah University of Science and Technology (KAUST)
Keywords
Stochastic methods; Heavy ball momentum; Linear systems; Randomized coordinate descent; Randomized Kaczmarz; Stochastic gradient descent; Stochastic Newton; Quadratic optimization; Convex optimization
Mathematics Subject Classification
68Q25; 68W20; 68W40; 65Y20; 90C15; 90C20; 90C25; 15A06; 15B52; 65F10
DOI
Not available
Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
Published in
Computational Optimization and Applications, 2020, 77(3): 653–710 (57 pages)
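To make the abstract's central construction concrete, here is a minimal NumPy sketch of one special case the paper covers: randomized Kaczmarz with heavy ball momentum for a consistent linear system Ax = b, with an optional stochastic-momentum branch that touches a single coordinate per iteration. The function name, the parameter values, and the exact single-coordinate estimator (scaling one sampled entry of x^k − x^{k−1} by n so the momentum term stays unbiased) are illustrative assumptions, not the authors' reference implementation.

    import numpy as np

    def momentum_kaczmarz(A, b, omega=1.0, beta=0.3, iters=5000,
                          stochastic_momentum=False, seed=0):
        # Sketch: x^{k+1} = x^k - omega*(A_i x^k - b_i)/||A_i||^2 * A_i + m^k,
        # where m^k is the heavy ball term beta*(x^k - x^{k-1}) or, if
        # stochastic_momentum=True, an unbiased one-coordinate estimate of it
        # (an illustrative reading of the paper's "stochastic momentum" idea).
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms2 = np.einsum('ij,ij->i', A, A)   # squared row norms ||A_i||^2
        probs = row_norms2 / row_norms2.sum()      # sample rows prop. to ||A_i||^2
        x_prev = x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            step = omega * ((A[i] @ x - b[i]) / row_norms2[i]) * A[i]
            if stochastic_momentum:
                j = rng.integers(n)                # momentum touches one coordinate
                mom = np.zeros(n)
                mom[j] = beta * n * (x[j] - x_prev[j])  # unbiased for beta*(x - x_prev)
            else:
                mom = beta * (x - x_prev)          # classical heavy ball momentum
            x_prev, x = x, x - step + mom
        return x

    # Usage: consistent random system; keep beta small for the stochastic variant
    rng = np.random.default_rng(1)
    A = rng.standard_normal((300, 50))
    x_star = rng.standard_normal(50)
    b = A @ x_star
    x_hb = momentum_kaczmarz(A, b, beta=0.3)
    x_sm = momentum_kaczmarz(A, b, beta=0.01, stochastic_momentum=True)
    print(np.linalg.norm(x_hb - x_star), np.linalg.norm(x_sm - x_star))

The momentum branch is the point of contrast: the classical heavy ball term costs O(n) per iteration, while the stochastic variant updates one coordinate, which is where the abstract's claimed savings in sparse regimes with sufficiently small β would come from.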