CONVERGENCE AND DYNAMICAL BEHAVIOR OF THE ADAM ALGORITHM FOR NONCONVEX STOCHASTIC OPTIMIZATION

Cited by: 52

Authors:

Barakat, Anas [1]

Bianchi, Pascal [1]

Affiliations:

[1] Inst Polytech Paris, Telecom Paris, LTCI, F-91120 Palaiseau, France

Source:

SIAM JOURNAL ON OPTIMIZATION | 2021 / Vol. 31 / No. 01

Keywords:

stochastic approximation; dynamical systems; adaptive gradient methods

DOI:

10.1137/19M1263443

Chinese Library Classification number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. In the constant stepsize regime, assuming that the objective function is differentiable and nonconvex, we establish the convergence in the long run of the iterates to a stationary point under a stability condition. The key ingredient is the introduction of a continuous-time version of Adam, under the form of a nonautonomous ordinary differential equation. This continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly toward the solution to the ODE. The existence and the uniqueness of the solution are established. We further show the convergence of the solution toward the critical points of the objective function and quantify its convergence rate under a Łojasiewicz assumption. Then, we introduce a novel decreasing stepsize version of Adam. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
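
For orientation, the constant stepsize regime mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the widely known Adam update with bias correction, which may differ in detail from the formulation analyzed in the paper. The objective f(x) = x^4/4 - x^2/2, the Gaussian gradient noise, and all names and parameter values (`grad_f`, `adam`, `gamma`, `noise_std`, `seed`) are assumptions chosen for illustration only; they do not come from the record above.

```python
import math
import random


def grad_f(x):
    """Gradient of the illustrative nonconvex objective f(x) = x**4 / 4 - x**2 / 2.

    Its critical points are x = -1, 0 and 1 (an assumption for this sketch).
    """
    return x ** 3 - x


def adam(x0, n_steps=5000, gamma=0.01, beta1=0.9, beta2=0.999,
         eps=1e-8, noise_std=0.1, seed=0):
    """Run constant stepsize Adam on noisy gradients of f, starting from x0."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x, m, v = x0, 0.0, 0.0
    for n in range(1, n_steps + 1):
        # Stochastic gradient: true gradient plus zero-mean Gaussian noise.
        g = grad_f(x) + rng.gauss(0.0, noise_std)
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias-corrected estimates.
        m_hat = m / (1 - beta1 ** n)
        v_hat = v / (1 - beta2 ** n)
        # Constant stepsize update.
        x -= gamma * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_final = adam(2.0)
print(x_final, grad_f(x_final))
```

Started from x0 = 2.0, the iterate in this toy setting should end up fluctuating near the critical point x = 1; with a constant stepsize and persistent noise it is not expected to settle exactly on it.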

Pages: 244-274

Number of pages: 31
