We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activation using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution rho only. Consequently, the GD method can be viewed as approximately applying powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by f* and rho only, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Omega(n log n), the empirical risk decreases to Theta(1/sqrt(n)). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where rho is the uniform distribution on the sphere and f* is a polynomial. Throughout this paper, we consider the scenario where the input dimension d is fixed.
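To make the setup concrete, the following is a minimal sketch, assuming the formulation common in this line of work: a width-m two-layer ReLU network with fixed random second-layer signs, whose first-layer weights are trained by full-batch GD on the squared empirical risk over n samples drawn from rho, with responses generated by f*. The choice of f*, the scaling, and all names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 5, 2000            # sample size, input dimension, network width
lr, steps = 0.5, 500              # GD step size and number of iterations

# Synthetic data: feature vectors x_i drawn uniformly on the unit sphere
# (an assumed choice of rho), responses y_i = f*(x_i) for an illustrative f*.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] ** 2 - 1.0 / d        # a low-degree polynomial f*, for illustration

# Two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x);
# only the first-layer weights W are trained, the signs a_r stay fixed.
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def predict(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

for t in range(steps):
    pre = X @ W.T                                     # (n, m) pre-activations w_r^T x_i
    err = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y   # prediction errors on the sample
    # Gradient of the empirical risk (1/(2n)) * sum_i err_i^2 with respect to W.
    grad_W = ((err[:, None] * (pre > 0) * a[None, :]).T @ X) / (np.sqrt(m) * n)
    W -= lr * grad_W

# Empirical Gram matrix whose (i, j) entry, for large m, concentrates around the
# kernel of the integral operator determined by the feature distribution.
S = (X @ W.T > 0).astype(float)   # (n, m) activation pattern
H = (X @ X.T) * (S @ S.T) / m

print("empirical risk:", 0.5 * np.mean((predict(W) - y) ** 2))
```

In this sketch, H is the matrix form of the prediction-error dynamics referred to above: one GD step multiplies the error vector by roughly (I - (lr/n) H), so running GD corresponds to applying powers of this matrix, the finite-sample analogue of applying powers of the integral operator to f*.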