On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science (JSPS); Japan Science and Technology Agency (JST);
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; Generalization error; Uniform convergence; Non-convex optimization; Regression
DOI
10.1109/TIT.2022.3215088
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, which induces another form of implicit regularization and yields tighter bounds on the generalization error of deep neural networks. To derive such bounds, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds on the probability that SGD remains in these stagnation sets are derived. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and on local uniform convergence of the empirical loss functions, which is controlled via the entropy of suitable neighborhoods around local minima.
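As a minimal illustration of the noise model discussed in the abstract (and not the authors' construction), the sketch below runs gradient descent with isotropic Gaussian gradient noise, a common model for SGD's stochasticity, on a toy two-layer ReLU network. It reports whether the iterates stay within a fixed ball around the initialization, a crude stand-in for the stagnation sets around local minima, and the spectral norms of the weight matrices, the quantities the stated bound depends on instead of the parameter count. The architecture and the constants eta, sigma, and radius are arbitrary choices for illustration.

```python
# Illustrative sketch only: noisy gradient descent on a toy two-layer ReLU network.
# It tracks the distance of the parameters from their initialization (a proxy for
# remaining in a "stagnation set") and the spectral norms of the weight matrices.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
n, d, width = 200, 5, 16
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))

# Two-layer ReLU network parameters
W1 = rng.normal(scale=0.5, size=(width, d))
W2 = rng.normal(scale=0.5, size=(1, width))

def forward(W1, W2, X):
    H = np.maximum(X @ W1.T, 0.0)           # hidden ReLU activations, shape (n, width)
    return (H @ W2.T).ravel(), H

def gradients(W1, W2, X, y):
    pred, H = forward(W1, W2, X)
    r = (pred - y) / len(y)                  # scaled residuals of the squared loss
    gW2 = (r @ H).reshape(1, -1)             # gradient w.r.t. the output layer
    gH = np.outer(r, W2.ravel()) * (H > 0)   # backprop through the ReLU
    gW1 = gH.T @ X                           # gradient w.r.t. the hidden layer
    return gW1, gW2

eta, sigma, radius = 0.05, 0.01, 1.0         # step size, noise level, ball radius (arbitrary)
theta0 = np.concatenate([W1.ravel(), W2.ravel()])

for t in range(500):
    gW1, gW2 = gradients(W1, W2, X, y)
    # Gradient step perturbed by isotropic Gaussian gradient noise
    W1 = W1 - eta * (gW1 + sigma * rng.normal(size=W1.shape))
    W2 = W2 - eta * (gW2 + sigma * rng.normal(size=W2.shape))

theta = np.concatenate([W1.ravel(), W2.ravel()])
dist = np.linalg.norm(theta - theta0)
print(f"distance from initialization: {dist:.3f} (stayed in radius-{radius} ball: {dist <= radius})")
print(f"spectral norms: ||W1||_2 = {np.linalg.norm(W1, 2):.3f}, ||W2||_2 = {np.linalg.norm(W2, 2):.3f}")
```

In the paper's setting, it is precisely this kind of stagnation of the iterates near a local minimum that enables the local uniform convergence argument and, in turn, the generalization bound expressed through spectral norms rather than the number of parameters.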
Pages: 1203-1223
Number of pages: 21