On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; generalization error; uniform convergence; non-convex optimization; NEURAL-NETWORKS; REGRESSION;
DOI
10.1109/TIT.2022.3215088
CLC Number
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima.
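The abstract refers to two concrete ingredients: SGD driven by additive Gaussian gradient noise, and a generalization bound that depends on the spectral norms of the weight matrices rather than on the parameter count. As a rough, hedged illustration of these ingredients only (this is not the authors' construction; the update rule, step size, noise scale, and the use of a plain product of spectral norms are assumptions made here for the sketch), a minimal NumPy example:

import numpy as np

def noisy_sgd(theta0, grad, n_steps=1000, lr=1e-2, noise_std=1e-2, seed=0):
    # SGD with additive Gaussian gradient noise on a flat parameter vector.
    # grad(theta) should return a stochastic gradient of the empirical loss.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad(theta) + noise_std * rng.standard_normal(theta.shape)
        theta -= lr * g
    return theta

def spectral_norm_product(weight_matrices):
    # Product of the largest singular values of the layer weight matrices:
    # the layer-wise quantity that norm-based generalization bounds depend on,
    # independently of how many parameters the network has.
    return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

# Toy usage with a quadratic empirical loss 0.5 * ||theta - t||^2.
t = np.array([1.0, -2.0, 0.5])
theta_hat = noisy_sgd(np.zeros(3), grad=lambda th: th - t, n_steps=500)
print(theta_hat)                                            # close to t
print(spectral_norm_product([np.eye(3), 2 * np.eye(3)]))    # 2.0

In this toy setting the noisy iterates fluctuate around the minimizer t, which is the kind of localized behavior the stagnation-set argument formalizes; the actual results in the paper concern deep networks and a much more careful control of the SGD iterates.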
Pages: 1203 - 1223
Number of pages: 21
Related Papers
50 records in total
  • [21] Dropout Training, Data-dependent Regularization, and Generalization Bounds
    Mou, Wenlong
    Zhou, Yuchen
    Gao, Jun
    Wang, Liwei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [22] Kernelized Elastic Net Regularization: Generalization Bounds, and Sparse Recovery
    Feng, Yunlong
    Lv, Shao-Gao
    Hang, Hanyuan
    Suykens, Johan A. K.
    NEURAL COMPUTATION, 2016, 28 (03) : 525 - 562
  • [23] Generalization of stochastic-resonance-based threshold networks with Tikhonov regularization
    Bai, Saiya
    Duan, Fabing
Chapeau-Blondeau, François
    Abbott, Derek
    PHYSICAL REVIEW E, 2022, 106 (01)
  • [24] Generalization bounds for pairwise learning with the Huber loss
    Huang, Shouyou
    Zeng, Zhiyi
    Jiang, Siying
    NEUROCOMPUTING, 2025, 622
  • [25] The Loss Surface of Deep and Wide Neural Networks
    Quynh Nguyen
    Hein, Matthias
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [26] Norm Loss: An efficient yet effective regularization method for deep neural networks
    Georgiou, Theodoros
    Schmitt, Sebastian
Bäck, Thomas
    Chen, Wei
    Lew, Michael
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8812 - 8818
  • [27] Implicit surface reconstruction with total variation regularization
    Liu, Yuan
    Song, Yanzhi
    Yang, Zhouwang
    Deng, Jiansong
    COMPUTER AIDED GEOMETRIC DESIGN, 2017, 52-53 : 135 - 153
  • [28] On Generalization Bounds of a Family of Recurrent Neural Networks
    Chen, Minshuo
    Li, Xingguo
    Zhao, Tuo
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1233 - 1242
  • [29] Generalization and risk bounds for recurrent neural networks
    Cheng, Xuewei
    Huang, Ke
    Ma, Shujie
    NEUROCOMPUTING, 2025, 616
  • [30] Implicit Regularization in Deep Learning May Not Be Explainable by Norms
    Razin, Noam
    Cohen, Nadav
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33