Generalization Bounds for Label Noise Stochastic Gradient Descent

Cited: 0
Authors
Huh, Jung Eun [1]
Rebeschini, Patrick [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford, England
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024
Keywords
STABILITY;
DOI
None available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension d. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially with d and decays at the rate n^(-2/3), where n is the sample size. This rate is better than the best-known rate of n^(-1/2) established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
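To make the two noise models concrete, the following is a minimal Python sketch for linear least squares; it is not the paper's construction (which treats general non-convex, dissipative losses). The function names, the constant learning rate lr, the label noise scale sigma, and the inverse temperature beta are illustrative assumptions. Label noise SGD perturbs the labels before the gradient step, so the noise injected into the parameters is data-dependent; SGLD instead adds parameter-independent isotropic Gaussian noise.

    import numpy as np

    def label_noise_sgd_step(w, X, y, lr, sigma, rng):
        # Illustrative label noise SGD step (hypothetical helper):
        # perturb the labels with Gaussian noise, then take a plain
        # gradient step on the squared loss. The injected parameter
        # noise is lr * X.T @ eps / n, whose covariance depends on the
        # data (and on w itself for non-linear models).
        y_noisy = y + sigma * rng.standard_normal(y.shape)
        grad = X.T @ (X @ w - y_noisy) / len(y)
        return w - lr * grad

    def sgld_step(w, X, y, lr, beta, rng):
        # SGLD step for comparison: gradient step plus isotropic
        # Gaussian noise of scale sqrt(2 * lr / beta), independent of
        # w and of the data.
        grad = X.T @ (X @ w - y) / len(y)
        noise = np.sqrt(2.0 * lr / beta) * rng.standard_normal(w.shape)
        return w - lr * grad + noise

    # Toy run: d = 5 parameters, n = 200 samples, constant learning rate.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
    w = np.zeros(5)
    for _ in range(1000):
        w = label_noise_sgd_step(w, X, y, lr=0.01, sigma=0.5, rng=rng)

The data-dependent noise covariance in the first update is the structural contrast the abstract draws between label noise SGD (rate n^(-2/3)) and SGLD's parameter-independent Gaussian noise (rate n^(-1/2)).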
Pages: 26
Related Papers
50 in total
  • [31] Stochastic gradient descent tricks
    Bottou, Léon
    Lecture Notes in Computer Science, 2012, 7700: 421 - 436
  • [32] Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization
    Han, X. Y.
    Lewis, Adrian S.
    SIAM JOURNAL ON OPTIMIZATION, 2023, 33 (01) : 36 - 62
  • [33] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Noise Reduction by Gradient Descent
    Davies, Mike
    INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 1993, 3 (01): 113 - 118
  • [35] Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
    Li, Mingchen
    Soltanolkotabi, Mahdi
    Oymak, Samet
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 4313 - 4324
  • [36] First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
    Nguyen, Thanh Huy
    Simsekli, Umut
    Gurbuzbalaban, Mert
    Richard, Gael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [37] Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
    Xie, Zeke
    Yuan, Li
    Zhu, Zhanxing
    Sugiyama, Masashi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Path Length Bounds for Gradient Descent and Flow
    Gupta, Chirag
    Balakrishnan, Sivaraman
    Ramdas, Aaditya
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [40] Convergence of Stochastic Gradient Descent for PCA
    Shamir, Ohad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48