Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers

Cited by: 6
Authors:
Paquin, Alexandre Lemire [1 ]
Chaib-draa, Brahim [1 ]
Giguere, Philippe [1 ]
Affiliations:
[1] Laval Univ, Dept Comp Sci & Software Engn, Pavillon Adrien Pouliot 1065, Ave Med, Quebec City, PQ G1V 0A6, Canada
Keywords:
Generalization; Deep learning; Stochastic gradient descent; Stability
DOI: 10.1016/j.neunet.2023.04.028
CLC Classification:
TP18 [Artificial Intelligence Theory]
Subject Classification:
081104; 0812; 0835; 1405
Abstract:
We prove new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis is based on the stability framework and covers both the convex case of linear classifiers and the non-convex case of homogeneous neural networks. We analyze stability with respect to the normalized version of the loss function used for training. This leads to investigating a form of angle-wise stability instead of Euclidean stability in the weights. For neural networks, the measure of distance we consider is invariant to rescaling the weights of each layer. Furthermore, we exploit the notion of on-average stability in order to obtain a data-dependent quantity in the bound. This data-dependent quantity is seen to be more favorable when training with larger learning rates in our numerical experiments. This might help to shed some light on why larger learning rates can lead to better generalization in some practical scenarios. (c) 2023 Elsevier Ltd. All rights reserved.
Pages: 382-394
Page count: 13
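
A note on the rescaling invariance described in the abstract: for homogeneous networks (e.g., bias-free ReLU networks), multiplying the weights of any layer by c > 0 only scales the output, so quantities normalized by the layer norms depend only on the directions of the weights. The following minimal numpy sketch illustrates the idea with an angle-based, per-layer distance; the function name layerwise_angle_distance and the sum-of-angles form are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def layerwise_angle_distance(ws, vs):
    """Sum of angles between corresponding (flattened) layer weights.

    Illustrative stand-in for an angle-wise distance: it is invariant
    to positive rescaling of any individual layer, unlike the plain
    Euclidean distance on the weights.
    """
    total = 0.0
    for w, v in zip(ws, vs):
        w, v = w.ravel(), v.ravel()
        cos = np.dot(w, v) / (np.linalg.norm(w) * np.linalg.norm(v))
        total += np.arccos(np.clip(cos, -1.0, 1.0))
    return total

rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # two "layers"
V = [w + 0.01 * rng.standard_normal(w.shape) for w in W]        # a nearby iterate

# Rescaling each layer of V by an arbitrary positive constant changes the
# Euclidean distance to W drastically, but leaves the angle-wise distance
# unchanged, since cos(angle) is scale-free in each argument.
V_scaled = [3.7 * V[0], 0.2 * V[1]]
print(layerwise_angle_distance(W, V))         # small angle-wise distance
print(layerwise_angle_distance(W, V_scaled))  # same value up to float error
```

Because of this invariance, stability measured through such a distance is unaffected by the layer-wise scale ambiguity of homogeneous networks, which is the property the abstract exploits when moving from Euclidean to angle-wise stability.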