On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; generalization error; uniform convergence; non-convex optimization; NEURAL-NETWORKS; REGRESSION;
DOI
10.1109/TIT.2022.3215088
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. The fact that modern deep neural networks generalize well despite having a large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, and that this induces another form of implicit regularization and results in tighter bounds on the generalization error of deep neural networks. To derive these generalization error bounds, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property on the population risk. Under these conditions, we derive lower bounds on the probability that SGD remains in these stagnation sets. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values across the SGD iterates and on local uniform convergence of the empirical loss functions, which relies on the entropy of suitable neighborhoods around local minima.
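To make the two ingredients of the abstract concrete, the sketch below (not the authors' code; a minimal PyTorch illustration with assumed toy data, network sizes, step size, and noise scale) runs SGD whose gradients are perturbed by isotropic Gaussian noise, the noise model studied in the paper, and then evaluates the product of the spectral norms of the weight matrices, the kind of parameter-count-free capacity term that appears in the generalization bound.

```python
# Minimal sketch, assuming a small fully connected regression network.
# All sizes and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)                                 # toy inputs (assumed)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(256, 1)   # toy targets (assumed)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
loss_fn = nn.MSELoss()
lr, noise_std = 1e-2, 1e-3                               # step size and Gaussian noise scale (assumed)

for step in range(500):
    loss = loss_fn(model(X), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            # Gradient step perturbed by isotropic Gaussian noise:
            # SGD with Gaussian gradient noise as considered in the paper.
            p -= lr * (p.grad + noise_std * torch.randn_like(p))

# Product of spectral norms of the weight matrices (biases excluded):
# a norm-based capacity measure of the kind the bound depends on,
# rather than the raw number of parameters.
with torch.no_grad():
    spec_prod = 1.0
    for m in model:
        if isinstance(m, nn.Linear):
            spec_prod *= torch.linalg.matrix_norm(m.weight, ord=2).item()

print(f"final training loss: {loss.item():.4f}, "
      f"product of spectral norms: {spec_prod:.2f}")
```

In this reading, the noise scale plays the role of the Gaussian gradient noise in the paper's SGD model, and the spectral-norm product is the quantity one would track in place of the parameter count when the iterates stagnate near a local minimum.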
Citation
Pages: 1203-1223
Number of pages: 21