THE INTERPOLATION PHASE TRANSITION IN NEURAL NETWORKS: MEMORIZATION AND GENERALIZATION UNDER LAZY TRAINING

Cited by: 19
Authors
Montanari, Andrea [1 ]
Zhong, Yiqiao
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
ANNALS OF STATISTICS | 2022, Vol. 50, No. 5
Keywords
Neural tangent kernel; memorization; overfitting; overparametrization; kernel ridge regression; DESCENT; MODELS
DOI
10.1214/22-AOS2211
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime Nd >> n. This characterization implies, as a corollary, that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as Nd >> n and, therefore, that the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression, including, as a special case, minimum ℓ2-norm interpolation. We prove that, as soon as Nd >> n, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular, on log n / log d).
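For the two-layer model f(x) = N^(-1/2) * sum_i a_i sigma(<w_i, x>), the empirical NT kernel built from first-layer gradients is K_jk = (1/N) sum_i a_i^2 sigma'(<w_i, x_j>) sigma'(<w_i, x_k>) <x_j, x_k>. The following minimal numpy sketch illustrates the two claims of the abstract: the minimum eigenvalue of this kernel staying bounded away from zero when Nd >> n, and NT (kernel) ridge regression interpolating arbitrary labels as the ridge vanishes. The ReLU activation, the fixed ±1 second layer, and the specific sizes n, d, N are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes chosen so that N*d >> n, the overparametrized regime of the paper.
n, d, N = 200, 50, 400

# Isotropic covariates, normalized to the unit sphere.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights at random init
a = rng.choice([-1.0, 1.0], size=N)            # fixed +/-1 second-layer weights

relu_prime = lambda z: (z > 0).astype(float)   # sigma'(z) for sigma = ReLU

# Empirical NT kernel from first-layer gradients:
# K[j, k] = (1/N) * sum_i a_i^2 sigma'(<w_i,x_j>) sigma'(<w_i,x_k>) <x_j,x_k>
S = relu_prime(X @ W.T) * a        # n x N, entries a_i * sigma'(<w_i, x_j>)
K = (S @ S.T) * (X @ X.T) / N      # Hadamard product with the Gram matrix of X

lam_min = np.linalg.eigvalsh(K)[0]
print(f"min eigenvalue of empirical NT kernel: {lam_min:.4f}")

# NT ridge regression on purely random labels; as ridge -> 0 this approaches
# minimum-l2-norm interpolation, and a positive lam_min keeps the solve stable.
y = rng.choice([-1.0, 1.0], size=n)
for ridge in (1e-1, 1e-6):
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    print(f"ridge={ridge:g}: max train residual = {np.max(np.abs(K @ alpha - y)):.2e}")
```

With ridge = 1e-6 the residual is essentially zero, i.e., the linearized network fits purely random labels exactly, matching the interpolation corollary; the ridge = 1e-1 run shows how an explicit regularizer (playing the role of the paper's "self-induced" term) pulls the fit away from interpolation.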
Pages: 2816-2847
Page count: 32
Related papers (50 total)
  • [21] Detection of Phase Transition via Convolutional Neural Networks
    Tanaka, Akinori
    Tomiya, Akio
    JOURNAL OF THE PHYSICAL SOCIETY OF JAPAN, 2017, 86 (06)
  • [22] Generalization-Based Acquisition of Training Data for Motor Primitive Learning by Neural Networks
    Loncarevic, Zvezdan
    Pahic, Rok
    Ude, Ales
    Gams, Andrej
    APPLIED SCIENCES-BASEL, 2021, 11 (03) : 1 - 17
  • [23] Achieving Robust Generalization for Wireless Channel Estimation Neural Networks by Designed Training Data
    Luan, Dianxin
    Thompson, John
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 3462 - 3467
  • [24] Adaptive training of neural networks for automatic seismic phase identification
    Wang, J.
    PURE AND APPLIED GEOPHYSICS, 2002, 159 (05) : 1021 - 1041
  • [26] Parameter diagnostics of phases and phase transition learning by neural networks
    Suchsland, Philippe
    Wessel, Stefan
    PHYSICAL REVIEW B, 2018, 97 (17)
  • [27] A new method to improve the generalization ability of neural networks: A case study of nuclear mass training
    Zhao, Tianliang
    Zhang, Hongfei
    NUCLEAR PHYSICS A, 2022, 1021
  • [28] Seismic Velocity Model Building Using Neural Networks: Training Data Design and Learning Generalization
    Alzahrani, H.
    Shragge, J.
    GEOPHYSICS, 2021, 87 (02)
  • [29] Train longer, generalize better: closing the generalization gap in large batch training of neural networks
    Hoffer, Elad
    Hubara, Itay
    Soudry, Daniel
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [30] REIN: A Robust Training Method for Enhancing Generalization Ability of Neural Networks in Autonomous Driving Systems
    Yu, Fuxun
    Liu, Chenchen
    Chen, Xian
    24TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC 2019), 2019, : 456 - 461