Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers

Cited by: 6
Authors
Paquin, Alexandre Lemire [1]
Chaib-draa, Brahim [1]
Giguere, Philippe [1]
Affiliations
[1] Laval Univ, Dept Comp Sci & Software Engn, Pavillon Adrien Pouliot, 1065 Ave Med, Quebec City, PQ G1V 0A6, Canada
Keywords
Generalization; Deep learning; Stochastic gradient descent; Stability
DOI
10.1016/j.neunet.2023.04.028
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We prove new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis is based on the stability framework and covers both the convex case of linear classifiers and the non-convex case of homogeneous neural networks. We analyze stability with respect to the normalized version of the loss function used for training. This leads to investigating a form of angle-wise stability instead of Euclidean stability in weights. For neural networks, the measure of distance we consider is invariant to rescaling the weights of each layer. Furthermore, we exploit the notion of on-average stability in order to obtain a data-dependent quantity in the bound. This data-dependent quantity is seen to be more favorable when training with larger learning rates in our numerical experiments. This might help to shed some light on why larger learning rates can lead to better generalization in some practical scenarios. © 2023 Elsevier Ltd. All rights reserved.
Pages: 382-394
Number of pages: 13
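
The two quantities highlighted in the abstract, a loss evaluated at normalized weights and an angle-wise distance that is unaffected by rescaling each layer, can be illustrated with a minimal sketch. The logistic loss, the per-layer angle sum, and all function names below are assumptions chosen for exposition only; they are not the definitions used in the paper.

```python
import numpy as np

def normalized_loss(w, x, y):
    """Logistic loss of a linear classifier evaluated at w / ||w||.
    Illustrative stand-in for a loss computed on normalized weights."""
    w_hat = w / np.linalg.norm(w)
    return np.log1p(np.exp(-y * np.dot(w_hat, x)))

def angle(u, v):
    """Angle between two weight vectors; unchanged if either is rescaled."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def layerwise_angle_distance(layers_a, layers_b):
    """Sum of per-layer angles between two networks; invariant to rescaling
    the weights of each layer independently (hypothetical aggregation,
    used only to illustrate the rescaling invariance)."""
    return sum(angle(a.ravel(), b.ravel()) for a, b in zip(layers_a, layers_b))
```

For instance, `layerwise_angle_distance([2.0 * W1, W2], [W1, 0.5 * W2])` evaluates to zero, reflecting the per-layer rescaling invariance that the abstract attributes to its distance measure; a Euclidean distance in weights would not have this property.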