On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; generalization error; uniform convergence; non-convex optimization; NEURAL-NETWORKS; REGRESSION;
DOI
10.1109/TIT.2022.3215088
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. The observation that modern deep neural networks generalize well despite their large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, which induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for the probability that SGD remains in these stagnation sets are derived. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and on local uniform convergence of the empirical loss functions, using the entropy of suitable neighborhoods around local minima.
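To make the noisy-SGD model described in the abstract concrete, the following is a minimal sketch, not the paper's construction: SGD with additive isotropic Gaussian gradient noise on a one-hidden-layer ReLU network, with the spectral norms of the weight matrices (the quantities the stated bound depends on, rather than the parameter count) reported at the end. The architecture, step size, noise level, and synthetic data here are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): SGD with Gaussian gradient noise
# on a one-hidden-layer ReLU regression network, tracking spectral norms.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (any bounded-input regression task would do).
n, d, width = 200, 10, 32
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# One-hidden-layer ReLU network f(x) = W2 @ relu(W1 @ x).
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
W2 = rng.normal(scale=1.0 / np.sqrt(width), size=(1, width))

def forward(Xb, W1, W2):
    H = np.maximum(Xb @ W1.T, 0.0)           # hidden activations
    return H, (H @ W2.T).ravel()             # predictions

def grads(Xb, yb, W1, W2):
    """Gradients of the empirical squared loss on a batch."""
    H, pred = forward(Xb, W1, W2)
    err = (pred - yb)[:, None]                # (batch, 1)
    gW2 = (err * H).mean(axis=0, keepdims=True)
    dH = (err @ W2) * (H > 0)                 # backprop through the ReLU
    gW1 = dH.T @ Xb / len(yb)
    return gW1, gW2

eta, sigma, batch = 0.05, 0.01, 32            # step size, noise level, batch size
for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    gW1, gW2 = grads(X[idx], y[idx], W1, W2)
    # SGD update with additive Gaussian gradient noise; isotropic noise is a
    # simplification of the Gaussian-noise model referred to in the abstract.
    W1 -= eta * (gW1 + sigma * rng.normal(size=gW1.shape))
    W2 -= eta * (gW2 + sigma * rng.normal(size=gW2.shape))

# The bound discussed in the abstract involves the spectral norms of the
# weight matrices rather than the number of parameters; report them here.
print("spectral norms:", np.linalg.norm(W1, 2), np.linalg.norm(W2, 2))
```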
Pages: 1203-1223
Number of pages: 21
Related Papers
50 records in total
  • [31] Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
    Hariz, Kais
    Kadri, Hachem
    Ayache, Stephane
    Moakher, Maher
    Artieres, Thierry
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [32] Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks
    Cao, Yuan
    Gu, Quanquan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3349 - 3356
  • [33] Loss Function Learning for Domain Generalization by Implicit Gradient
    Gao, Boyan
    Gouk, Henry
    Yang, Yongxin
    Hospedales, Timothy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [34] An adaptive Drop method for deep neural networks regularization: Estimation of DropConnect hyperparameter using generalization gap
    Hssayni, El Houssaine
    Joudar, Nour-Eddine
    Ettaouil, Mohamed
    KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [35] Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds
    Xu, Mengji
    Rangamani, Akshay
    Liao, Qianli
    Galanti, Tomer
    Poggio, Tomaso
    RESEARCH, 2023, 6
  • [36] Implicit Regularization Towards Rank Minimization in ReLU Networks
    Timor, Nadav
    Vardi, Gal
    Shamir, Ohad
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 1429 - 1459
  • [37] Generalization Error Bounds on Deep Learning with Markov Datasets
    Truong, Lan V.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [38] Deep Heterogeneous Graph Neural Networks via Similarity Regularization Loss and Hierarchical Fusion
    Xiong, Zhilong
    Cai, Jia
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 759 - 768
  • [39] ADAPTIVE BOUNDS FOR QUADRIC BASED GENERALIZATION
    Dayal, Abhinav
    2009 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-5, 2009, : 3489 - 3492
  • [40] Threshout Regularization for Deep Neural Networks
    Williams, Travis
    Li, Robert
    SOUTHEASTCON 2021, 2021, : 728 - 735