On Generalization Bounds for Deep Networks Based on Loss Surface Implicit Regularization

Cited by: 1
Authors
Imaizumi, Masaaki [1 ]
Schmidt-Hieber, Johannes [2 ]
Affiliations
[1] Univ Tokyo, Komaba Inst Sci, Tokyo 1530021, Japan
[2] Univ Twente, Dept Appl Math, NL-7522 NB Enschede, Netherlands
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Neural networks; Deep learning; Statistics; Sociology; Convergence; Complexity theory; Training data; Deep neural network; generalization error; uniform convergence; non-convex optimization; NEURAL-NETWORKS; REGRESSION;
DOI
10.1109/TIT.2022.3215088
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. The observation that modern deep neural networks generalize well despite their large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, which induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for the probability that SGD remains in these stagnation sets are derived. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and on local uniform convergence of the empirical loss functions, using the entropy of suitable neighborhoods around local minima.
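To make the noisy-SGD model described in the abstract concrete, the following is a minimal sketch, not the paper's construction: SGD with additive isotropic Gaussian gradient noise on a one-hidden-layer ReLU network, with the spectral norms of the weight matrices (the quantities the stated bound depends on, rather than the parameter count) reported at the end. The architecture, step size, noise level, and synthetic data here are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): SGD with Gaussian gradient noise
# on a one-hidden-layer ReLU regression network, tracking spectral norms.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (any bounded-input regression task would do).
n, d, width = 200, 10, 32
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# One-hidden-layer ReLU network f(x) = W2 @ relu(W1 @ x).
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
W2 = rng.normal(scale=1.0 / np.sqrt(width), size=(1, width))

def forward(Xb, W1, W2):
    H = np.maximum(Xb @ W1.T, 0.0)           # hidden activations
    return H, (H @ W2.T).ravel()             # predictions

def grads(Xb, yb, W1, W2):
    """Gradients of the empirical squared loss on a batch."""
    H, pred = forward(Xb, W1, W2)
    err = (pred - yb)[:, None]                # (batch, 1)
    gW2 = (err * H).mean(axis=0, keepdims=True)
    dH = (err @ W2) * (H > 0)                 # backprop through the ReLU
    gW1 = dH.T @ Xb / len(yb)
    return gW1, gW2

eta, sigma, batch = 0.05, 0.01, 32            # step size, noise level, batch size
for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    gW1, gW2 = grads(X[idx], y[idx], W1, W2)
    # SGD update with additive Gaussian gradient noise; isotropic noise is a
    # simplification of the Gaussian-noise model referred to in the abstract.
    W1 -= eta * (gW1 + sigma * rng.normal(size=gW1.shape))
    W2 -= eta * (gW2 + sigma * rng.normal(size=gW2.shape))

# The bound discussed in the abstract involves the spectral norms of the
# weight matrices rather than the number of parameters; report them here.
print("spectral norms:", np.linalg.norm(W1, 2), np.linalg.norm(W2, 2))
```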
Pages: 1203-1223
Number of pages: 21
Related Papers
50 records in total
  • [31] Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
    Hariz, Kais
    Kadri, Hachem
    Ayache, Stephane
    Moakher, Maher
    Artieres, Thierry
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [32] Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks
    Cao, Yuan
    Gu, Quanquan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3349 - 3356
  • [33] Loss Function Learning for Domain Generalization by Implicit Gradient
    Gao, Boyan
    Gouk, Henry
    Yang, Yongxin
    Hospedales, Timothy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [34] An adaptive Drop method for deep neural networks regularization: Estimation of DropConnect hyperparameter using generalization gap
    Hssayni, El Houssaine
    Joudar, Nour-Eddine
    Ettaouil, Mohamed
    KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [35] Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds
    Xu, Mengji
    Rangamani, Akshay
    Liao, Qianli
    Galanti, Tomer
    Poggio, Tomaso
    RESEARCH, 2023, 6
  • [36] Implicit Regularization Towards Rank Minimization in ReLU Networks
    Timor, Nadav
    Vardi, Gal
    Shamir, Ohad
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 1429 - 1459
  • [37] Generalization Error Bounds on Deep Learning with Markov Datasets
    Truong, Lan V.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [38] Deep Heterogeneous Graph Neural Networks via Similarity Regularization Loss and Hierarchical Fusion
    Xiong, Zhilong
    Cai, Jia
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 759 - 768
  • [39] ADAPTIVE BOUNDS FOR QUADRIC BASED GENERALIZATION
    Dayal, Abhinav
    2009 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-5, 2009, : 3489 - 3492
  • [40] Threshout Regularization for Deep Neural Networks
    Williams, Travis
    Li, Robert
    SOUTHEASTCON 2021, 2021, : 728 - 735