Convergence and Dynamical Behavior of the Adam Algorithm for Nonconvex Stochastic Optimization

Cited by: 52
Authors
Barakat, Anas [1]
Bianchi, Pascal [1]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, F-91120 Palaiseau, France
Keywords
stochastic approximation; dynamical systems; adaptive gradient methods
DOI
10.1137/19M1263443
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. In the constant stepsize regime, assuming that the objective function is differentiable and nonconvex, we establish the long-run convergence of the iterates to a stationary point under a stability condition. The key ingredient is the introduction of a continuous-time version of Adam, in the form of a nonautonomous ordinary differential equation. This continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly toward the solution to the ODE. The existence and uniqueness of the solution are established. We further show the convergence of the solution toward the critical points of the objective function and quantify its convergence rate under a Łojasiewicz assumption. Then, we introduce a novel decreasing stepsize version of Adam. Under mild assumptions, the iterates are shown to be almost surely bounded and to converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
Pages: 244-274
Number of pages: 31