Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Cited by: 0
|
Authors
Rotskoff, Grant M. [1 ]
Vanden-Eijnden, Eric [1 ]
Institution
[1] NYU, Courant Inst Math Sci, New York, NY 10003 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31
Funding
U.S. National Science Foundation (NSF);
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as O(n^{-1}). In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale O(n^{-1}). These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.
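The particle picture in the abstract can be illustrated with a short numerical experiment: treat each hidden unit of a single-hidden-layer network as a particle carrying its own parameters, train all particles with SGD under the mean-field 1/n output scaling, and compare the final mean-squared error across particle counts n. The sketch below is not the authors' code; the tanh units, the toy target sin(sum_j x_j), the learning-rate choice, and the iteration budget are illustrative assumptions made here, and the measured errors are only expected to shrink roughly in line with the paper's O(n^{-1}) prediction.

    # Minimal sketch (assumptions noted above): mean-field network
    # f(x) = (1/n) * sum_i c_i * tanh(a_i . x + b_i), trained with SGD.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5  # input dimension

    def target(x):
        # toy target function on the d-dimensional cube (an assumption, not from the paper)
        return np.sin(x.sum(axis=-1))

    def train(n, steps=20000, lr=0.1, batch=64):
        # "particles": each hidden unit i carries parameters (c_i, a_i, b_i)
        c = rng.normal(size=n)
        a = rng.normal(size=(n, d))
        b = rng.normal(size=n)
        for _ in range(steps):
            x = rng.uniform(-1.0, 1.0, size=(batch, d))
            y = target(x)
            pre = x @ a.T + b              # (batch, n) pre-activations
            phi = np.tanh(pre)             # unit activations
            f = phi @ c / n                # mean-field scaling 1/n
            err = f - y                    # (batch,)
            # gradients of 0.5 * mean(err^2) w.r.t. the particle parameters
            gc = (err @ phi) / (n * batch)
            gpre = (err[:, None] * c[None, :]) * (1.0 - phi ** 2) / (n * batch)
            ga = gpre.T @ x
            gb = gpre.sum(axis=0)
            # stepping with lr * n: under the 1/n output scaling this keeps the
            # per-particle update size independent of n
            c -= lr * n * gc
            a -= lr * n * ga
            b -= lr * n * gb
        # estimate the population MSE on fresh samples
        xt = rng.uniform(-1.0, 1.0, size=(4096, d))
        return np.mean((np.tanh(xt @ a.T + b) @ c / n - target(xt)) ** 2)

    for n in (50, 100, 200, 400):
        print(f"n = {n:4d}   MSE ~ {train(n):.3e}")

Under these assumptions, printing the test error for successive doublings of n gives a crude check of the asymptotic scaling: if the representation error behaves as O(n^{-1}), each doubling should roughly halve the MSE once the optimization has converged.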
Pages: 10
Related Papers
50 items in total
  • [21] CONDITIONS OF ASYMPTOTIC STABILITY FOR CELLULAR NEURAL NETWORKS WITH TIME DELAY
    Wu Zhongfu, Liao Xiaofeng, Yu Juebang
    Journal of Electronics (China), 2000, (04) : 345 - 351
  • [22] On global asymptotic stability of a class of neural networks with time delays
    Shao, Jin-Liang
    Huang, Ting-Zhu
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 4120 - +
  • [24] Use of neural networks for fitting of FE probabilistic scaling model parameters
    Fairbairn, EMR
    Paz, CNM
    Ebecken, NFF
    Ulm, FJ
    INTERNATIONAL JOURNAL OF FRACTURE, 1999, 95 (1-4) : 315 - 324
  • [25] Sufficient conditions for error back flow convergence in dynamical recurrent neural networks
    Aussem, A
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL IV, 2000, : 577 - 582
  • [26] Convergence in discrete-time neural networks with specific performance
    Chu, TG
    PHYSICAL REVIEW E, 2001, 63 (05):
  • [27] Study on the convergence of discrete-time cellular neural networks
    Ma, Run-Nian
    Zhang, Qiang
    Xu, Jin
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2002, 24 (01):
  • [28] Convergence time analysis of Asynchronous Distributed Artificial Neural Networks
    Tosi, Mauro D. L.
    Venugopal, Vinu Ellampallil
    Theobald, Martin
    PROCEEDINGS OF THE 5TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA, CODS COMAD 2022, 2022, : 314 - 315
  • [29] Robustness of convergence in finite time for linear programming neural networks
    Di Marco, M
    Forti, M
    Grazzini, M
    INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2006, 34 (03) : 307 - 316
  • [30] ON THE CONVERGENCE OF RECIPROCAL DISCRETE-TIME CELLULAR NEURAL NETWORKS
    PERFETTI, R
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-FUNDAMENTAL THEORY AND APPLICATIONS, 1993, 40 (04): : 286 - 287