On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; generalization error; uniform convergence; non-convex optimization; NEURAL-NETWORKS; REGRESSION;
DOI
10.1109/TIT.2022.3215088
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. The fact that modern deep neural networks generalize well despite having a large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, and that this induces another form of implicit regularization and results in tighter bounds on the generalization error of deep neural networks. To derive these generalization error bounds, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property on the population risk. Under these conditions, we derive lower bounds on the probability that SGD remains in these stagnation sets. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values across the SGD iterates and on local uniform convergence of the empirical loss functions, which relies on the entropy of suitable neighborhoods around local minima.
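To make the two ingredients of the abstract concrete, the sketch below (not the authors' code; a minimal PyTorch illustration with assumed toy data, network sizes, step size, and noise scale) runs SGD whose gradients are perturbed by isotropic Gaussian noise, the noise model studied in the paper, and then evaluates the product of the spectral norms of the weight matrices, the kind of parameter-count-free capacity term that appears in the generalization bound.

```python
# Minimal sketch, assuming a small fully connected regression network.
# All sizes and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)                                 # toy inputs (assumed)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(256, 1)   # toy targets (assumed)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
loss_fn = nn.MSELoss()
lr, noise_std = 1e-2, 1e-3                               # step size and Gaussian noise scale (assumed)

for step in range(500):
    loss = loss_fn(model(X), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            # Gradient step perturbed by isotropic Gaussian noise:
            # SGD with Gaussian gradient noise as considered in the paper.
            p -= lr * (p.grad + noise_std * torch.randn_like(p))

# Product of spectral norms of the weight matrices (biases excluded):
# a norm-based capacity measure of the kind the bound depends on,
# rather than the raw number of parameters.
with torch.no_grad():
    spec_prod = 1.0
    for m in model:
        if isinstance(m, nn.Linear):
            spec_prod *= torch.linalg.matrix_norm(m.weight, ord=2).item()

print(f"final training loss: {loss.item():.4f}, "
      f"product of spectral norms: {spec_prod:.2f}")
```

In this reading, the noise scale plays the role of the Gaussian gradient noise in the paper's SGD model, and the spectral-norm product is the quantity one would track in place of the parameter count when the iterates stagnate near a local minimum.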
Citation
Pages: 1203-1223
Number of pages: 21