Gradient methods never overfit on separable data

Cited by: 0
Authors: Shamir, Ohad [1]
Affiliation: [1] Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
Funding: European Research Council
Keywords: Optimization; Stochastic systems; Large dataset
DOI: not available
Abstract
A line of recent works established that when training linear predictors over separable data, using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor. As a consequence, the predictors asymptotically do not overfit. However, this does not address the question of whether overfitting might occur non-asymptotically, after some bounded number of iterations. In this paper, we formally show that standard gradient methods (in particular, gradient flow, gradient descent and stochastic gradient descent) never overfit on separable data: If we run these methods for T iterations on a dataset of size m, both the empirical risk and the generalization error decrease at an essentially optimal rate of Õ(1/γ²T) up till T ≈ m, at which point the generalization error remains fixed at an essentially optimal level of Õ(1/γ²m) regardless of how large T is. Along the way, we present non-asymptotic bounds on the number of margin violations over the dataset, and prove their tightness. © 2021 Ohad Shamir.
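As a concrete illustration of the setting the abstract describes (not an experiment from the paper), the sketch below runs plain gradient descent with the logistic loss — an exponentially-tailed loss — on a toy linearly separable dataset. The dataset, step size, and iteration count are all invented for illustration; the point is that every training point ends up classified with positive margin by the normalized predictor, consistent with convergence in direction toward the max-margin solution.

```python
import numpy as np

# Toy linearly separable dataset; labels y are in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_loss_grad(w, X, y):
    """Gradient of the empirical logistic loss (1/m) * sum log(1 + exp(-y <w, x>))."""
    margins = y * (X @ w)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), evaluated at z = y <w, x>
    coeff = -y / (1.0 + np.exp(margins))
    return (coeff[:, None] * X).mean(axis=0)

w = np.zeros(2)
eta = 0.5  # illustrative step size
for t in range(5000):
    w -= eta * logistic_loss_grad(w, X, y)

# The norm of w grows without bound, but the *direction* stabilizes;
# the normalized predictor attains a positive margin on every point.
direction = w / np.linalg.norm(w)
min_margin = np.min(y * (X @ direction))
```

After training, `min_margin > 0` (all points are separated by the normalized predictor), while `np.linalg.norm(w)` keeps growing with more iterations — matching the asymptotic picture the abstract summarizes, in which only the direction of `w` converges.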
Related papers
(50 items total)
  • [22] Linear Speedup of Incremental Aggregated Gradient Methods on Streaming Data
    Wang, Xiaolu
    Jin, Cheng
    Wai, Hoi-To
    Gu, Yuantao
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4314 - 4319
  • [23] Data Never Lie
    Kataria, Yachana
    CLINICAL CHEMISTRY, 2018, 64 (08) : 1268 - 1269
  • [24] RAYS IN GRADIENT-INDEX MEDIA - SEPARABLE SYSTEMS
    BUCHDAHL, HA
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA, 1973, 63 (01) : 46 - 49
  • [25] A coordinate gradient descent method for nonsmooth separable minimization
    Tseng, Paul
    Yun, Sangwoon
    MATHEMATICAL PROGRAMMING, 2009, 117 (1-2) : 387 - 423
  • [27] Comparison of methods for extracting linear solvent strength gradient parameters from gradient chromatographic data
    Ford, JC
    Ko, J
    JOURNAL OF CHROMATOGRAPHY A, 1996, 727 (01) : 1 - 11
  • [28] Regularization methods for separable nonlinear models
    Chen, Guang-Yong
    Wang, Shu-Qiang
    Wang, Dong-Qing
    Gan, Min
    NONLINEAR DYNAMICS, 2019, 98 (02) : 1287 - 1298
  • [29] MULTIPOINT METHODS FOR SEPARABLE NONLINEAR NETWORKS
    KAMESAM, PV
    MEYER, RR
    MATHEMATICAL PROGRAMMING STUDY, 1984, 22 (DEC): : 185 - 205