Strong error analysis for stochastic gradient descent optimization algorithms

Cited by: 13
Authors
Jentzen, Arnulf [1 ]
Kuckuck, Benno [1 ]
Neufeld, Ariel [2 ]
von Wurstemberger, Philippe [3 ]
Affiliations
[1] Univ Munster, Fac Math & Comp Sci, D-48149 Munster, Germany
[2] NTU Singapore, Div Math Sci, Singapore 637371, Singapore
[3] Swiss Fed Inst Technol, Dept Math, CH-8092 Zurich, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Stochastic gradient descent; Stochastic approximation algorithms; Strong error analysis; CONVERGENCE RATE; ROBBINS-MONRO; APPROXIMATION; MOMENTS; RATES;
DOI
10.1093/imanum/drz055
Chinese Library Classification (CLC) number
O29 [Applied Mathematics];
Subject classification code
070104;
Abstract
Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove, for every arbitrarily small ε ∈ (0, ∞) and every arbitrarily large p ∈ (0, ∞), that the considered SGD optimization algorithm converges in the strong L^p-sense with order 1/2 - ε to the global minimum of the objective function of the considered stochastic optimization problem, under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve strong L^p-convergence rates for every arbitrarily large p ∈ (0, ∞).
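As an illustrative aside (not taken from the article), the following minimal Python sketch shows the kind of setting the result addresses: SGD with Robbins-Monro step sizes c/n applied to a simple strongly convex stochastic quadratic objective, with the strong L^p error estimated by Monte Carlo. The objective, the step-size constant, the sample sizes and all function names below are assumptions made purely for illustration; the article's theorem covers a far more general class of objectives and stochastic errors.

# Illustrative sketch only (not the article's method): SGD with step sizes
# gamma_n = c / n on the stochastic quadratic objective
# f(theta) = E[(theta - Z)^2 / 2] with Z ~ N(0, 1), whose global minimum is
# theta* = 0.  We estimate the strong L^p error E[|Theta_n - theta*|^p]^(1/p),
# which under convexity-type and moment assumptions decays with order 1/2 - eps.
# All parameters here are assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sgd_paths(n_steps, n_paths, c=1.0, theta0=5.0):
    """Run n_paths independent SGD trajectories for n_steps iterations."""
    theta = np.full(n_paths, theta0)
    for n in range(1, n_steps + 1):
        z = rng.standard_normal(n_paths)    # i.i.d. samples Z_n ~ N(0, 1)
        grad = theta - z                    # unbiased estimate of f'(theta)
        theta = theta - (c / n) * grad      # Robbins-Monro step size c / n
    return theta

def strong_lp_error(theta, p, theta_star=0.0):
    """Monte Carlo estimate of E[|Theta_n - theta*|^p]^(1/p)."""
    return np.mean(np.abs(theta - theta_star) ** p) ** (1.0 / p)

if __name__ == "__main__":
    p = 4.0
    for n_steps in (100, 1000, 10000):
        err = strong_lp_error(sgd_paths(n_steps, n_paths=20000), p)
        print(f"n = {n_steps:6d}   L^{p:.0f} error ~ {err:.4f}")

Under these illustrative assumptions, multiplying the number of steps by 100 should shrink the reported L^p error by roughly a factor of 10, which is the behaviour an order-(1/2 - ε) strong rate predicts.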
Pages: 455-492
Number of pages: 38
Related papers
50 records in total
  • [22] Comparison of the Stochastic Gradient Descent Based Optimization Techniques
    Yazan, Ersan
    Talu, M. Fatih
    2017 INTERNATIONAL ARTIFICIAL INTELLIGENCE AND DATA PROCESSING SYMPOSIUM (IDAP), 2017,
  • [23] Optimization of Gradient Descent Parameters in Attitude Estimation Algorithms
    Sever, Karla
    Golusin, Leonardo Max
    Loncar, Josip
    SENSORS, 2023, 23 (04)
  • [24] BAYESIAN STOCHASTIC GRADIENT DESCENT FOR STOCHASTIC OPTIMIZATION WITH STREAMING INPUT DATA
    Liu, Tianyi
    Lin, Yifan
    Zhou, Enlu
    SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01) : 389 - 418
  • [25] Analysis of stochastic gradient descent in continuous time
    Latz, Jonas
    STATISTICS AND COMPUTING, 2021, 31 (04)
  • [27] STRONG CONSISTENCY OF A CLASS OF RECURSIVE STOCHASTIC GRADIENT ALGORITHMS
    HERSH, MA
    ZARROP, MB
    INTERNATIONAL JOURNAL OF CONTROL, 1986, 43 (04) : 1115 - 1123
  • [28] Analysis of the class of complex-valued error adaptive normalised nonlinear gradient descent algorithms
    Hanna, AI
    Yates, I
    Mandic, DP
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 705 - 708
  • [29] Study on Stochastic Gradient Descent Without Explicit Error Backpropagation with Momentum
    Mahboubi, Shahrzad
    Ninomiya, Hiroshi
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 1357 - 1358
  • [30] Adaptive Alternating Stochastic Gradient Descent Algorithms for Large-Scale Latent Factor Analysis
    Qin, Wen
    Luo, Xin
    Zhou, MengChu
    2021 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2021), 2021, : 285 - 290