On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

Cited by: 0
Authors
Su, Lili [1 ]
Yang, Pengkun [2 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activations using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution ρ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by f* and ρ only, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Ω(n log n), the empirical risk decreases to Θ(1/√n). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We apply our general results to the setting where ρ is the uniform distribution on the sphere and f* is a polynomial. Throughout this paper, we consider the scenario where the input dimension d is fixed.
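The matrix-form error dynamics described in the abstract can be made concrete with a short simulation. Below is a minimal sketch (our illustration, not the authors' code; n, m, d, η, and the choice of f* are arbitrary): it trains a two-layer ReLU network by GD with the output weights frozen at ±1, so only the first layer moves, and compares the risk decay with the eigenvalues of the limiting Gram matrix H_ij = x_i·x_j (π − θ_ij)/(2π), the finite-sample counterpart of the integral operator determined by ρ. This kernel form is the standard one for this parameterization and is an assumption here, not a formula quoted from the paper.

```python
# Minimal sketch (illustrative, not the authors' code): GD on an
# over-parameterized two-layer ReLU network, tracking the empirical risk
# and the spectrum of the limiting Gram matrix.  All sizes below are
# arbitrary choices for a quick experiment.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 8000      # samples, fixed input dimension, network width
eta, T = 2.0, 500           # GD step size and number of iterations

# Features drawn uniformly from the unit sphere (the rho of the paper's
# application section); responses generated by a degree-2 polynomial f*.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = (X @ np.ones(d)) ** 2 - 1.0 / d

# Two-layer ReLU network f(x) = (1/sqrt(m)) * sum_k a_k * relu(w_k . x),
# with output weights a_k = +/-1 frozen; GD updates the first layer only.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

# Limiting Gram matrix H_ij = x_i.x_j * (pi - theta_ij) / (2*pi),
# determined by rho alone; its eigenvalues set the convergence rates.
G = np.clip(X @ X.T, -1.0, 1.0)
H = G * (np.pi - np.arccos(G)) / (2.0 * np.pi)
eigvals = np.linalg.eigvalsh(H)          # ascending order

risks = []
for _ in range(T):
    pre = X @ W.T                        # n x m pre-activations
    u = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y   # prediction errors
    risks.append(0.5 * np.mean(u ** 2))
    act = (pre > 0).astype(float)        # ReLU gates
    # Gradient of the empirical risk (1/2n) * sum_i u_i^2 w.r.t. W.
    W -= eta * ((u[:, None] * act) * a).T @ X / (n * np.sqrt(m))

# For large m the errors evolve as u_{t+1} ~ (I - (eta/n) H) u_t, so each
# eigen-component decays geometrically at rate 1 - eta * lambda_i / n.
print(f"empirical risk: {risks[0]:.4f} -> {risks[-1]:.6f}")
print("top eigenvalues of H:", np.round(eigvals[-3:], 3))
```

In this linearized picture, the components of f* lying in the top eigenspaces of H should be fit quickly at a rate depending only on f* and ρ, while any residual floor corresponds to the low-rank approximation error in the theorem.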
Pages: 10
Related papers
50 records in total
  • [31] Over-Parameterized Variational Optical Flow
    Nir, Tal
    Bruckstein, Alfred M.
    Kimmel, Ron
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2008, 76 : 205 - 216
  • [32] OPS-NET: OVER-PARAMETERIZED SHARING NETWORKS FOR VIDEO FRAME INTERPOLATION
    Wang, Zhen-Fang
    Wang, Yan-Jiang
    Shao, Shuai
    Liu, Bao-Di
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1974 - 1978
  • [33] OVER-PARAMETERIZED NETWORK SOLVES PHASE RETRIEVAL EFFECTIVELY
    Li, Ji
    Wang, Chao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4133 - 4137
  • [34] On over-parameterized model based TV-denoising
    Nir, Tal
Bruckstein, Alfred M.
    ISSCS 2007: INTERNATIONAL SYMPOSIUM ON SIGNALS, CIRCUITS AND SYSTEMS, VOLS 1 AND 2, 2007, : 279 - +
  • [35] Sparse optimization on measures with over-parameterized gradient descent
    Chizat, Lénaïc
    MATHEMATICAL PROGRAMMING, 2022, 194 (1-2) : 487 - 532
  • [36] Implicit Regularization in Over-Parameterized Support Vector Machine
    Sui, Yang
    He, Xin
    Bai, Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [37] Over-Parameterized Optical Flow Using a Stereoscopic Constraint
    Rosman, Guy
    Shem-Tov, Shachar
    Bitton, David
    Nir, Tal
    Adiv, Gilad
    Kimmel, Ron
    Feuer, Arie
    Bruckstein, Alfred M.
    SCALE SPACE AND VARIATIONAL METHODS IN COMPUTER VISION, 2012, 6667 : 761 - +
  • [38] On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing
    Zhuo, Jiacheng
    Kwon, Jeongyeol
    Ho, Nhat
    Caramanis, Constantine
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 47
  • [39] Global Convergence of Over-parameterized Deep Equilibrium Models
    Ling, Zenan
    Xie, Xingyu
    Wang, Qiuhao
    Zhang, Zongpeng
    Lin, Zhouchen
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206 : 767 - 787