Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Times cited: 0

Authors:

Rotskoff, Grant M. [1]

Vanden-Eijnden, Eric [1]

Affiliation:

[1] NYU, Courant Inst Math Sci, New York, NY 10003 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31

Funding:

U.S. National Science Foundation (NSF)

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as O(n^{-1}). In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale O(n^{-1}). These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.
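The particle picture described in the abstract can be illustrated with a small numerical sketch. It assumes (this is not taken from the paper or its code) a mean-field network f_n(x) = (1/n) sum_i c_i phi(x, z_i) built from Gaussian radial-basis units, a hypothetical 1-D target sin(2 pi x) sampled on a grid in place of the data distribution, and full-batch gradient descent standing in for stochastic gradient descent. The sketch checks that the mean-squared error can be rewritten as the energy of n interacting particles theta_i = (c_i, z_i): a constant, minus a one-body term (1/n) sum_i F(theta_i), plus a pairwise interaction (1/(2 n^2)) sum_{i,j} K(theta_i, theta_j). It then relaxes the particles by gradient descent on that energy.

```python
import numpy as np

# Hypothetical setup (an assumption, not the paper's code): a 1-D target sampled
# on a grid stands in for the data measure mu, and each "particle" is one
# Gaussian RBF unit theta_i = (c_i, z_i) with a fixed width SIGMA.
SIGMA = 0.15
X = np.linspace(0.0, 1.0, 200)
TARGET = np.sin(2 * np.pi * X)


def features(z):
    """phi[k, i] = exp(-(x_k - z_i)^2 / (2 SIGMA^2)), shape (len(X), n)."""
    return np.exp(-((X[:, None] - z[None, :]) ** 2) / (2 * SIGMA**2))


def loss(c, z):
    """Mean-squared error 0.5 * E_mu |f(x) - f_n(x)|^2, f_n = (1/n) sum_i c_i phi_i."""
    f_n = features(z) @ c / len(c)
    return 0.5 * np.mean((TARGET - f_n) ** 2)


def loss_as_particles(c, z):
    """The same loss written as an energy of n interacting particles:
    const - (1/n) sum_i F(theta_i) + (1/(2 n^2)) sum_{i,j} K(theta_i, theta_j)."""
    n = len(c)
    phi = features(z) * c[None, :]                # unit outputs c_i * phi(x_k, z_i)
    const = 0.5 * np.mean(TARGET**2)
    F = np.mean(TARGET[:, None] * phi, axis=0)    # one-body term F(theta_i)
    K = phi.T @ phi / len(X)                      # pair interaction K(theta_i, theta_j)
    return const - F.sum() / n + K.sum() / (2 * n**2)


def gradient_step(c, z, lr):
    """One full-batch gradient-descent step, with the gradient rescaled by n
    (mean-field time scaling) so each particle moves at an O(1) rate."""
    n = len(c)
    phi = features(z)
    resid = phi @ c / n - TARGET                  # f_n(x_k) - f(x_k)
    grad_c = phi.T @ resid / len(X)               # n * dL/dc_i
    dphi_dz = phi * (X[:, None] - z[None, :]) / SIGMA**2
    grad_z = c * (dphi_dz.T @ resid) / len(X)     # n * dL/dz_i
    return c - lr * grad_c, z - lr * grad_z


rng = np.random.default_rng(0)
n = 100
c = rng.normal(size=n)
z = rng.uniform(0.0, 1.0, size=n)

initial = loss(c, z)
for _ in range(10000):
    c, z = gradient_step(c, z, lr=0.02)
final = loss(c, z)
print(f"MSE loss: {initial:.4f} -> {final:.4f}")
```

Multiplying the gradient by n is the mean-field time scaling assumed in this sketch: without it each particle's gradient is O(1/n), so relaxation would slow down as units are added. The width SIGMA, the grid X, the learning rate, the step count and the helper names (features, loss_as_particles, gradient_step) are illustrative choices only, and the sketch does not attempt to reproduce the O(n^{-1}) error scaling stated above.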

Pages: 10