On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

Cited by: 0
Authors
Su, Lili [1 ]
Yang, Pengkun [2 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activations using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined only by the feature vector distribution ρ. Consequently, the GD method can be viewed as approximately applying powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate determined only by f* and ρ, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Ω(n log n), the empirical risk decreases to Θ(1/√n). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where ρ is the uniform distribution on the sphere and f* is a polynomial. Throughout this paper, we consider the scenario where the input dimension d is fixed.
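As an illustrative sketch of the matrix-form dynamics and the operator-power viewpoint described in the abstract (the symbols u_t, H_t, η, K, and L_ρ below are shorthand introduced here for exposition, not the paper's own notation):

\[
u_{t+1} \;\approx\; (I - \eta H_t)\, u_t,
\qquad (u_t)_i = \hat f_t(x_i) - f^*(x_i), \quad i = 1, \dots, n,
\]
\[
H_t \;\approx\; L_\rho \ \text{(as a discretization)},
\qquad (L_\rho g)(x) = \int K(x, x')\, g(x')\, \mathrm{d}\rho(x'),
\]

so that, heuristically, after t GD iterations the error vector u_t behaves like samples of (I − η L_ρ)^t f*: the component of f* lying in the top eigenspaces of L_ρ is driven down at a geometric (linear-convergence) rate, while the residual outside those eigenspaces sets the floor to which the empirical risk decreases.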
Pages: 10