THE INTERPOLATION PHASE TRANSITION IN NEURAL NETWORKS: MEMORIZATION AND GENERALIZATION UNDER LAZY TRAINING

Times Cited: 19
Authors
Montanari, Andrea [1]
Zhong, Yiqiao
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
ANNALS OF STATISTICS | 2022, Vol. 50, No. 5
Keywords
Neural tangent kernel; memorization; overfitting; overparametrization; kernel ridge regression
DOI
10.1214/22-AOS2211
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime Nd >> n. This characterization implies, as a corollary, that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as Nd >> n, and, therefore, the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression, including, as a special case, min-ℓ2 norm interpolation. We prove that, as soon as Nd >> n, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular, on log n / log d).
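As a rough illustration of the objects described in the abstract, the minimal sketch below (not taken from the paper; the ReLU activation, the problem sizes, and the synthetic data are all illustrative assumptions) builds the empirical NT kernel of a two-layer network with respect to its first-layer weights, checks that the smallest eigenvalue stays away from zero in the overparametrized regime Nd >> n, and runs NT kernel ridge regression, whose λ → 0 limit corresponds to min-ℓ2 norm interpolation.

# Minimal sketch (illustrative, not the paper's code): empirical NT kernel of a
# two-layer ReLU network, its minimum eigenvalue, and NT kernel ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 50, 2000        # sample size, input dimension, hidden neurons (Nd >> n)
lam = 1e-3                     # ridge parameter; lam -> 0 recovers min-norm interpolation

X = rng.standard_normal((n, d)) / np.sqrt(d)    # isotropic covariate vectors
y = rng.standard_normal(n)                      # arbitrary labels (memorization test)

W = rng.standard_normal((N, d)) / np.sqrt(d)    # first-layer weights at initialization
a = rng.choice([-1.0, 1.0], size=N)             # fixed second-layer signs

def nt_kernel(X1, X2, W, a):
    # K(x, x') = <x, x'> * (1/N) * sum_k a_k^2 * relu'(<w_k, x>) * relu'(<w_k, x'>)
    S1 = (X1 @ W.T > 0).astype(float)           # ReLU derivative indicators
    S2 = (X2 @ W.T > 0).astype(float)
    return (X1 @ X2.T) * ((S1 * a**2) @ S2.T) / W.shape[0]

K = nt_kernel(X, X, W, a)
print("min eigenvalue of empirical NT kernel:", np.linalg.eigvalsh(K).min())

# NT kernel ridge regression: fit on (X, y), predict at fresh points.
alpha = np.linalg.solve(K + lam * np.eye(n), y)
X_test = rng.standard_normal((20, d)) / np.sqrt(d)
y_hat = nt_kernel(X_test, X, W, a) @ alpha

With Nd much larger than n, the printed minimum eigenvalue is strictly positive, which is exactly what licenses exact interpolation of the arbitrary labels y in the paper's first main result.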
Pages: 2816-2847
Number of Pages: 32