On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

Cited: 0
Authors
Su, Lili [1 ]
Yang, Pengkun [2 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords: not provided
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory];
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We consider training over-parameterized two-layer neural networks with Rectified Linear Units (ReLU) using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution ρ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by f* and ρ only, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Ω(n log n), the empirical risk decreases to Θ(1/√n). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where ρ is the uniform distribution on the sphere and f* is a polynomial. Throughout this paper, we consider the scenario where the input dimension d is fixed.
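To make the setup concrete, below is a minimal numpy sketch (not the authors' code) of the training dynamics the abstract describes: full-batch GD on an over-parameterized two-layer ReLU network, with features drawn uniformly from the sphere and responses generated by an underlying polynomial f*. The specific width m, step size, target f*, and the choice to train only the hidden-layer weights are illustrative assumptions, not details taken from the paper.

    # Sketch (illustrative assumptions throughout): full-batch GD on an
    # over-parameterized two-layer ReLU network, tracking how the
    # empirical risk decays across iterations.
    import numpy as np

    rng = np.random.default_rng(0)

    d, n = 5, 200                 # fixed input dimension, sample size
    m = int(4 * n * np.log(n))    # width on the order of n log n (assumed constant)
    eta = 0.1                     # step size (assumption)

    # Features drawn from rho = uniform distribution on the unit sphere
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    # Responses generated by an underlying f*; a low-degree polynomial,
    # matching the paper's example application (this f* is made up).
    def f_star(x):
        return x[:, 0] ** 2 - x[:, 1] * x[:, 2]

    y = f_star(X)

    # Two-layer ReLU network: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).
    # Output signs a_r are fixed at random; only W is trained (an assumed
    # simplification common in this line of work).
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=m)

    def predict(W):
        return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

    for t in range(501):
        u = predict(W)                     # network predictions
        err = u - y                        # prediction errors across samples
        risk = 0.5 * np.mean(err ** 2)     # empirical risk
        if t % 100 == 0:
            print(f"iter {t:4d}  empirical risk {risk:.6f}")
        # Gradient of the empirical risk with respect to the hidden weights W
        act = (X @ W.T > 0).astype(float)  # ReLU activation pattern, n x m
        grad = ((err[:, None] * act) * a).T @ X / (np.sqrt(m) * n)
        W -= eta * grad

Printing the risk every 100 iterations makes the linear decay phase visible; under the abstract's result, the level the risk plateaus at reflects the low-rank approximation error of f* plus an O(1/√n) term.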
Pages: 10