First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Cited by: 0
Authors
Nguyen, Thanh Huy [1]
Simsekli, Umut [1,2]
Gurbuzbalaban, Mert [3]
Richard, Gael [1]
Affiliations
[1] Telecom Paris, Inst Polytech Paris, LTCI, Paris, France
[2] Univ Oxford, Dept Stat, Oxford, England
[3] Rutgers Business Sch, Dept Management Sci & Informat Syst, New Brunswick, NJ USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Keywords
SDEs driven; differential equations; Lévy; uniqueness
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) is widely used in machine learning owing to its computational efficiency and favorable generalization properties. It has recently been demonstrated empirically that the gradient noise in several deep learning settings exhibits non-Gaussian, heavy-tailed behavior. This suggests modeling the gradient noise with α-stable distributions, a family of heavy-tailed distributions that arise in the generalized central limit theorem. In this view, SGD can be regarded as a discretization of a stochastic differential equation (SDE) driven by a Lévy motion, and metastability results for this SDE can then be used to illuminate the behavior of SGD, especially its tendency to 'prefer wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited: owing to the time discretization, SGD may behave significantly differently from its continuous-time limit. Intuitively, the two systems are expected to behave similarly only when the discretization step is sufficiently small; however, to the best of our knowledge, there has been no theoretical understanding of how small the step-size must be for the discretized system to inherit the properties of the continuous-time system. In this study, we provide a formal theoretical analysis in which we derive explicit conditions on the step-size under which the metastability behavior of the discrete-time system is similar to that of its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes, and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.
Pages: 11
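The abstract describes SGD with α-stable gradient noise as an Euler discretization of a Lévy-driven SDE and studies first exit times from a neighborhood of a minimum. The following is a minimal, self-contained sketch (not the authors' code) of that kind of experiment: the double-well objective, the tail index alpha = 1.7, the noise scale sigma, the exit radius, and the function names are all illustrative assumptions; the eta**(1/alpha) noise scaling is what makes the iteration a discretization of the α-stable SDE.

```python
# Minimal sketch (not the authors' code): first exit times of SGD whose
# gradient noise is symmetric alpha-stable, as in the heavy-tailed model
# described in the abstract. Objective, noise scale, and exit radius are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def grad(x):
    # Gradient of the double-well f(x) = (x^2 - 1)^2 / 4, minima at x = +/-1.
    return x ** 3 - x

def first_exit_time(eta, alpha=1.7, sigma=0.5, radius=0.75,
                    max_iter=10**6, rng=rng):
    """Continuous time elapsed until SGD leaves a ball around x = 1.

    The noise is scaled by eta**(1/alpha) so that the iteration is an
    Euler discretization of an SDE driven by an alpha-stable Levy motion.
    """
    x = 1.0
    for k in range(max_iter):
        noise = symmetric_alpha_stable(alpha, 1, rng)[0]
        x = x - eta * grad(x) + sigma * eta ** (1 / alpha) * noise
        if abs(x - 1.0) > radius:
            return (k + 1) * eta  # iterations converted to SDE time
    return max_iter * eta

for eta in (0.1, 0.01):
    times = [first_exit_time(eta) for _ in range(200)]
    print(f"eta={eta:5.2f}  mean exit time ~ {np.mean(times):.2f}")
```

Under this model, the exit times simulated for small eta should approach those of the driving SDE; the paper's contribution is an explicit condition on how small the step-size must be for that agreement to hold.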
Related Papers
50 records in total
  • [21] Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
    Puchkin, Nikita
    Gorbunov, Eduard
    Kutuzov, Nikolay
    Gasnikov, Alexander
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [22] Asymptotic First Exit Times of the Chafee-Infante Equation with Small Heavy-Tailed Lévy Noise
    Debussche, Arnaud
    Hoegele, Michael
    Imkeller, Peter
    ELECTRONIC COMMUNICATIONS IN PROBABILITY, 2011, 16 : 213 - 225
  • [23] Generalization Bounds for Label Noise Stochastic Gradient Descent
    Huh, Jung Eun
    Rebeschini, Patrick
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [24] Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points
    Dixit, Rishabh
    Gurbuzbalaban, Mert
    Bajwa, Waheed U.
    INFORMATION AND INFERENCE-A JOURNAL OF THE IMA, 2023, 12 (02) : 714 - 786
  • [25] Modeling and linearization of systems under heavy-tailed stochastic noise with application to renewable energy assessment
    Kashima, Kenji
    Aoyama, Hiroki
    Ohta, Yoshito
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 1852 - 1857
  • [26] Stochastic resonance with multiplicative heavy-tailed Lévy noise: Optimal tuning on an algebraic time scale
    Kuhwald, Isabelle
    Pavlyukevich, Ilya
    STOCHASTICS AND DYNAMICS, 2017, 17 (04)
  • [27] Detecting multifractal stochastic processes under heavy-tailed effects
    Grahovac, Danijel
    Leonenko, Nikolai N.
    CHAOS SOLITONS & FRACTALS, 2014, 65 : 78 - 89
  • [28] Convergence Analysis of Accelerated Stochastic Gradient Descent Under the Growth Condition
    Chen, You-Lin
    Na, Sen
    Kolar, Mladen
    MATHEMATICS OF OPERATIONS RESEARCH, 2024, 49 (04) : 2492 - 2526
  • [29] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454
  • [30] Error Analysis of Stochastic Gradient Descent Ranking
    Chen, Hong
    Tang, Yi
    Li, Luoqing
    Yuan, Yuan
    Li, Xuelong
    Tang, Yuanyan
    IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (03) : 898 - 909