On Early Stopping in Gradient Descent Learning

Cited by: 1

Authors
Yuan Yao
Lorenzo Rosasco
Andrea Caponnetto
Institutions
Department of Mathematics, University of California
C.B.C.L., Massachusetts Institute of Technology, Bldg. E25-201, 45 Carleton St.
DISI, Universita di Genova, Via Dodecaneso 35
Source
CONSTRUCTIVE APPROXIMATION, 2007, 26 (02) : 289 - 315
Keywords
Convergence Rate; Gradient Descent; Tikhonov Regularization; Reproducing Kernel Hilbert Space; Gradient Descent Method
DOI
Not available
Abstract
In this paper we study a family of gradient descent algorithms for approximating the regression function from reproducing kernel Hilbert spaces (RKHSs), the family being characterized by a polynomially decreasing rate of step sizes (the learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and probabilistic upper bounds on the convergence of the algorithms. We also discuss the implications of these results for classification, where fast convergence rates can be achieved for plug-in classifiers. Finally, we address connections with boosting, Landweber iterations, and online learning algorithms viewed as stochastic approximations of the gradient descent method.
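As a rough illustration of the setup described in the abstract, the sketch below runs kernel gradient descent on the empirical least-squares risk with a polynomially decaying step size eta_t = eta0 * (t+1)^(-theta) and stops at the iteration with the lowest held-out error. This is a minimal sketch only: the paper derives a data-dependent stopping rule from the bias-variance trade-off rather than using a validation set, and the RBF kernel, parameter values, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X1 and X2."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_gd_early_stopping(X, y, X_val, y_val,
                             gamma=1.0, eta0=1.0, theta=0.5, max_iter=500):
    """Gradient descent on the empirical least-squares risk in the RKHS
    of the kernel, with step sizes eta_t = eta0 * (t+1)**(-theta).
    The iterate f_t = sum_j c[j] * K(x_j, .) is tracked through its
    coefficient vector c; we keep the iterate with the smallest
    validation error as the early-stopped solution."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)           # n x n training kernel matrix
    K_val = rbf_kernel(X_val, X, gamma)   # validation-vs-training kernel
    c = np.zeros(n)                       # coefficients of f_0 = 0
    best = (np.inf, c.copy(), 0)          # (val error, coefficients, t*)
    for t in range(max_iter):
        eta = eta0 * (t + 1) ** (-theta)  # polynomially decaying step size
        c = c - (eta / n) * (K @ c - y)   # gradient step on the coefficients
        val_err = np.mean((K_val @ c - y_val) ** 2)
        if val_err < best[0]:
            best = (val_err, c.copy(), t + 1)
    return best

# Toy usage: noisy sine regression with a train/validation split.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
err, c, t_star = kernel_gd_early_stopping(X[:150], y[:150], X[150:], y[150:])
print(f"stopped at t* = {t_star} with validation MSE {err:.4f}")
```

Running longer would keep shrinking the training error while the validation error eventually rises again; the stopping rule in the paper balances this bias-variance trade-off analytically instead of empirically.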
Pages: 289 - 315
Number of pages: 26
Related Papers
50 records in total (10 shown)
  • [1] On early stopping in gradient descent learning
    Yao, Yuan
    Rosasco, Lorenzo
    Caponnetto, Andrea
    CONSTRUCTIVE APPROXIMATION, 2007, 26 (02) : 289 - 315
  • [2] Learning gradients via an early stopping gradient descent method
    Guo, Xin
    JOURNAL OF APPROXIMATION THEORY, 2010, 162 (11) : 1919 - 1944
  • [3] Estimation in linear models using gradient descent with early stopping
    Skouras, K.
    Goutis, C.
    Bramson, M. J.
    STATISTICS AND COMPUTING, 1994, 4 (04) : 271 - 278
  • [4] Fast robust kernel regression through sign gradient descent with early stopping
    Allerbo, Oskar
    ELECTRONIC JOURNAL OF STATISTICS, 2025, 19 (01) : 1231 - 1285
  • [5] Learning to learn by gradient descent by gradient descent
    Andrychowicz, Marcin
    Denil, Misha
    Colmenarejo, Sergio Gomez
    Hoffman, Matthew W.
    Pfau, David
    Schaul, Tom
    Shillingford, Brendan
    de Freitas, Nando
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [6] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [7] Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
    Li, Mingchen
    Soltanolkotabi, Mahdi
    Oymak, Samet
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 4313 - 4324
  • [8] Learning by online gradient descent
    Biehl, M.
    Schwarze, H.
    JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1995, 28 (03) : 643 - 656
  • [9] Learning Fractals by Gradient Descent
    Tu, Cheng-Hao
    Chen, Hong-You
    Carlyn, David
    Chao, Wei-Lun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023 : 2456 - 2464
  • [10] Gradient Descent Learning With Floats
    Sun, Tao
    Tang, Ke
    Li, Dongsheng
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (03) : 1763 - 1771