THE INTERPOLATION PHASE TRANSITION IN NEURAL NETWORKS: MEMORIZATION AND GENERALIZATION UNDER LAZY TRAINING

Cited by: 19

Authors:

Montanari, Andrea [1]

Zhong, Yiqiao

Affiliations:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

Source:

ANNALS OF STATISTICS | 2022 / Vol. 50 / No. 05

Keywords:

Neural tangent kernel; memorization; overfitting; overparametrization; kernel ridge regression; DESCENT; MODELS;

DOI:

10.1214/22-AOS2211

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime Nd ≫ n. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as Nd ≫ n and, therefore, the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-ℓ2-norm interpolation. We prove that, as soon as Nd ≫ n, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on log n / log d).
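
The first result in the abstract can be illustrated numerically with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from this record: the ReLU activation, data drawn uniformly from the sphere of radius sqrt(d), first-layer weights drawn uniformly from the unit sphere, the 1/(Nd) kernel scaling, and the hypothetical helper names `sample_sphere`, `empirical_ntk`, `min_ntk_eigenvalue`, and `interpolation_residual`. The kernel is the Gram matrix of the gradients of a two-layer network with respect to its first-layer weights, so its rank is at most Nd.

```python
import numpy as np


def sample_sphere(rng, m, d, radius):
    """Draw m points uniformly from the sphere of the given radius in R^d."""
    g = rng.standard_normal((m, d))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)


def empirical_ntk(X, W):
    """Empirical NT kernel of a two-layer ReLU net w.r.t. first-layer weights.

    K[j, k] = <x_j, x_k> / (N d) * sum_i relu'(<w_i, x_j>) * relu'(<w_i, x_k>)
    The 1 / (N d) scaling is an illustrative assumption.
    """
    n, d = X.shape
    N = W.shape[0]
    D = (X @ W.T > 0).astype(float)  # relu'(<w_i, x_j>), shape (n, N)
    return (D @ D.T) * (X @ X.T) / (N * d)


def make_kernel(n, d, N, rng):
    """Sample data and weights (assumed distributions) and build the kernel."""
    X = sample_sphere(rng, n, d, np.sqrt(d))  # isotropic covariates
    W = sample_sphere(rng, N, d, 1.0)         # random first-layer weights
    return empirical_ntk(X, W)


def min_ntk_eigenvalue(n, d, N, seed=0):
    """Smallest eigenvalue of the n x n empirical NT kernel matrix."""
    rng = np.random.default_rng(seed)
    K = make_kernel(n, d, N, rng)
    return float(np.linalg.eigvalsh(K)[0])  # eigvalsh sorts ascending


def interpolation_residual(n, d, N, seed=0):
    """Fit pure-noise labels by solving K alpha = y; return the max training error."""
    rng = np.random.default_rng(seed)
    K = make_kernel(n, d, N, rng)
    y = rng.standard_normal(n)      # labels carrying no signal at all
    alpha = np.linalg.solve(K, y)   # coefficients of the interpolant
    return float(np.max(np.abs(K @ alpha - y)))


if __name__ == "__main__":
    n, d = 100, 20
    print("Nd < n :", min_ntk_eigenvalue(n, d, N=2))
    print("Nd >> n:", min_ntk_eigenvalue(n, d, N=500))
    print("residual on random labels:", interpolation_residual(n, d, N=500))
```

With n = 100 and d = 20, the choice N = 2 gives Nd = 40 < n, so the kernel matrix is rank-deficient and its smallest eigenvalue is zero up to rounding. With N = 500 (Nd = 10000), the smallest eigenvalue should be clearly positive, and the linear system for random labels is then solvable with negligible training residual, which is the sense in which the network interpolates arbitrary labels.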


Pages: 2816-2847

Page count: 32

