Generalization Bounds for Label Noise Stochastic Gradient Descent

Cited: 0
Authors
Huh, Jung Eun [1]
Rebeschini, Patrick [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford, England
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024
Keywords
STABILITY;
DOI
None available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension d. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with d and decays at the rate n^(-2/3), where n is the sample size. This rate is better than the best-known rate of n^(-1/2) established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
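To make the two noise models concrete, the following is a minimal Python sketch for linear least squares; it is not the paper's construction (which treats general non-convex, dissipative losses). The function names, the constant learning rate lr, the label noise scale sigma, and the inverse temperature beta are illustrative assumptions. Label noise SGD perturbs the labels before the gradient step, so the noise injected into the parameters is data-dependent; SGLD instead adds parameter-independent isotropic Gaussian noise.

    import numpy as np

    def label_noise_sgd_step(w, X, y, lr, sigma, rng):
        # Illustrative label noise SGD step (hypothetical helper):
        # perturb the labels with Gaussian noise, then take a plain
        # gradient step on the squared loss. The injected parameter
        # noise is lr * X.T @ eps / n, whose covariance depends on the
        # data (and on w itself for non-linear models).
        y_noisy = y + sigma * rng.standard_normal(y.shape)
        grad = X.T @ (X @ w - y_noisy) / len(y)
        return w - lr * grad

    def sgld_step(w, X, y, lr, beta, rng):
        # SGLD step for comparison: gradient step plus isotropic
        # Gaussian noise of scale sqrt(2 * lr / beta), independent of
        # w and of the data.
        grad = X.T @ (X @ w - y) / len(y)
        noise = np.sqrt(2.0 * lr / beta) * rng.standard_normal(w.shape)
        return w - lr * grad + noise

    # Toy run: d = 5 parameters, n = 200 samples, constant learning rate.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
    w = np.zeros(5)
    for _ in range(1000):
        w = label_noise_sgd_step(w, X, y, lr=0.01, sigma=0.5, rng=rng)

The data-dependent noise covariance in the first update is the structural contrast the abstract draws between label noise SGD (rate n^(-2/3)) and SGLD's parameter-independent Gaussian noise (rate n^(-1/2)).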
Pages: 26
Related Papers
50 in total
  • [31] Stochastic gradient descent tricks
    Bottou, Léon
    Lecture Notes in Computer Science, 2012, 7700: 421 - 436
  • [32] Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization
    Han, X. Y.
    Lewis, Adrian S.
    SIAM JOURNAL ON OPTIMIZATION, 2023, 33 (01) : 36 - 62
  • [33] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Noise Reduction by Gradient Descent
    Davies, Mike
    INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 1993, 3 (01): 113 - 118
  • [35] Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
    Li, Mingchen
    Soltanolkotabi, Mahdi
    Oymak, Samet
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 4313 - 4324
  • [36] First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
    Nguyen, Thanh Huy
    Simsekli, Umut
    Gurbuzbalaban, Mert
    Richard, Gael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [37] Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
    Xie, Zeke
    Yuan, Li
    Zhu, Zhanxing
    Sugiyama, Masashi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Path Length Bounds for Gradient Descent and Flow
    Gupta, Chirag
    Balakrishnan, Sivaraman
    Ramdas, Aaditya
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [40] Convergence of Stochastic Gradient Descent for PCA
    Shamir, Ohad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48