On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

Cited: 0
Authors
Su, Lili [1 ]
Yang, Pengkun [2 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider training over-parameterized two-layer neural networks with the Rectified Linear Unit (ReLU) activation using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution ρ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by f* and ρ only, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Ω(n log n), the empirical risk decreases to Θ(1/√n). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We apply our general results to the setting where ρ is the uniform distribution on the sphere and f* is a polynomial. Throughout this paper, we consider the scenario in which the input dimension d is fixed.
Pages: 10
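As a concrete illustration of the training setup described in the abstract, the following minimal sketch (not the authors' code; the NumPy implementation, all hyperparameters, and the chosen polynomial target are assumptions made for illustration) trains an over-parameterized two-layer ReLU network with gradient descent on inputs from the unit sphere and prints the decay of the empirical risk.

# Illustrative sketch only: gradient descent on an over-parameterized
# two-layer ReLU network, tracking the empirical risk across iterations.
# n, d, m, eta, T, and the target f* are assumptions chosen for the demo.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 5, 4000                          # samples, input dimension, hidden width (m >> n)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # feature vectors on the unit sphere

def f_star(x):
    return (x @ np.ones(d)) ** 2                # assumed low-degree polynomial target

y = f_star(X)

# Two-layer ReLU network: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).
W = rng.standard_normal((m, d))                 # inner weights (trained)
a = rng.choice([-1.0, 1.0], size=m)             # output weights (held fixed)

def predict(X):
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

eta, T = 0.5, 500
for t in range(T):
    err = predict(X) - y                        # network prediction errors
    # Gradient of the empirical risk (1/2n) * sum_i err_i^2 with respect to W.
    act = (X @ W.T > 0.0).astype(float)         # ReLU activation pattern
    grad_W = ((act * err[:, None]).T @ X) * (a[:, None] / (np.sqrt(m) * n))
    W -= eta * grad_W
    if t % 100 == 0:
        print(f"iter {t:4d}  empirical risk {0.5 * np.mean(err ** 2):.4f}")

The 1/√m output scaling and the fixed random signs a_r are a parameterization commonly used in this line of work; only the inner weights W are trained here, so the sketch stays close to the regime in which the prediction-error dynamics admit the matrix description mentioned in the abstract.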
Related papers
50 records in total
  • [21] How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
    Wu, Lei
    Ma, Chao
E, Weinan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [22] Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets
    da Cunha, Arthur
    d'Amore, Francesco
    Natale, Emanuele
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [24] Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks
    Hacohen, Guy
    Weinshall, Daphna
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [26] Gradient descent optimizes over-parameterized deep ReLU networks
    Zou, Difan
    Cao, Yuan
    Zhou, Dongruo
    Gu, Quanquan
    MACHINE LEARNING, 2020, 109 (03) : 467 - 492
  • [27] Rethinking Gauss-Newton for learning over-parameterized models
    Arbel, Michael
    Menegaux, Romain
    Wolinski, Pierre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [28] Orthogonal Over-Parameterized Training
    Liu, Weiyang
    Lin, Rongmei
    Liu, Zhen
    Rehg, James M.
    Paull, Liam
    Xiong, Li
    Song, Le
    Weller, Adrian
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 7247 - 7256
  • [29] Exploiting Sparsity in Over-parameterized Federated Learning over Multiple Access Channels
    Kaur, Gagandeep
    Prasad, Ranjitha
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 605 - 606
  • [30] Over-parameterized variational optical flow
    Nir, Tal
    Bruckstein, Alfred M.
    Kimmel, Ron
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2008, 76 (02) : 205 - 216