Best k-Layer Neural Network Approximations

Cited: 1
Authors
Lim, Lek-Heng [1 ]
Michalek, Mateusz [2 ,3 ]
Qi, Yang [4 ]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Max Planck Inst Math Sci, D-04103 Leipzig, Germany
[3] Univ Konstanz, D-78457 Constance, Germany
[4] Ecole Polytech, INRIA Saclay Ile-de-France, CMAP, IP Paris, CNRS, F-91128 Palaiseau, France
Keywords
Neural network; Best approximation; Join loci; Secant loci
DOI
10.1007/s00365-021-09545-2
CLC Classification
O1 [Mathematics]
Subject Classification Code
0701; 070101
Abstract
We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1, \dots, s_n \in \mathbb{R}^p$ with corresponding responses $t_1, \dots, t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $\nu_\theta : \mathbb{R}^p \to \mathbb{R}^q$ involves estimation of the weights $\theta \in \mathbb{R}^m$ via the ERM $\inf_{\theta \in \mathbb{R}^m} \sum_{i=1}^{n} \lVert t_i - \nu_\theta(s_i) \rVert_2^2$. We show that even for $k = 2$, this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. In addition, we deduce that if one attempts to minimize such a loss function when its infimum is not attainable, the values of $\theta$ necessarily diverge to $\pm\infty$. We show that for the smooth activations $\sigma(x) = 1/(1 + \exp(-x))$ and $\sigma(x) = \tanh(x)$, such failure to attain an infimum can happen on a positive-measure subset of responses. For the ReLU activation $\sigma(x) = \max(0, x)$, we completely classify the cases where the ERM for a best two-layer neural network approximation attains its infimum. In recent applications of neural networks, where overfitting is commonplace, the failure to attain an infimum is avoided by ensuring that the system of equations $t_i = \nu_\theta(s_i)$, $i = 1, \dots, n$, has a solution. For a two-layer ReLU-activated network, we show when such a system of equations has a solution generically, i.e., when such a neural network can be fitted perfectly with probability one.
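The ERM objective in the abstract can be made concrete with a short numerical sketch. The Python snippet below (not from the paper; the hidden width r, the random data, and all variable names are illustrative assumptions) evaluates the two-layer ReLU-activated loss $\sum_{i=1}^{n} \lVert t_i - \nu_\theta(s_i) \rVert_2^2$; the paper's point is that the infimum of this quantity over the weights need not be attained, in which case any minimizing sequence of weights diverges to $\pm\infty$.

```python
import numpy as np

def relu(x):
    # ReLU activation sigma(x) = max(0, x), applied entrywise
    return np.maximum(0.0, x)

def two_layer_net(S, W1, b1, W2, b2):
    # nu_theta : R^p -> R^q applied row-wise to S (shape n x p);
    # theta collects the weights (W1, b1, W2, b2)
    return relu(S @ W1.T + b1) @ W2.T + b2

def erm_loss(S, T, W1, b1, W2, b2):
    # Empirical risk: sum_i || t_i - nu_theta(s_i) ||_2^2
    residual = T - two_layer_net(S, W1, b1, W2, b2)
    return float(np.sum(residual ** 2))

# Illustrative (hypothetical) dimensions and random data
rng = np.random.default_rng(0)
n, p, q, r = 20, 3, 2, 4  # n samples, input dim p, output dim q, hidden width r
S = rng.standard_normal((n, p))
T = rng.standard_normal((n, q))
W1, b1 = rng.standard_normal((r, p)), rng.standard_normal(r)
W2, b2 = rng.standard_normal((q, r)), rng.standard_normal(q)
print(erm_loss(S, T, W1, b1, W2, b2))
```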
Pages: 583-604
Page count: 22
Related Papers
50 records in total
  • [21] NeuroBE: Escalating Neural Network Approximations of Bucket Elimination
    Agarwal, Sakshi
    Kask, Kalev
    Ihler, Alexander
    Dechter, Rina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 11 - 21
  • [22] Neural network approximations for Calabi-Yau metrics
    Vishnu Jejjala
    Damián Kaloni Mayorga Peña
    Challenger Mishra
    Journal of High Energy Physics, 2022
  • [23] Latitudinal and longitudinal neural network structures for function approximations
    Chen, DG
    Mohler, RR
    ICECS 96 - PROCEEDINGS OF THE THIRD IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS, AND SYSTEMS, VOLS 1 AND 2, 1996, : 283 - 286
  • [24] EXISTENCE AND UNIQUENESS RESULTS FOR NEURAL-NETWORK APPROXIMATIONS
    WILLIAMSON, RC
    HELMKE, U
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 1995, 6 (01): 2 - 13
  • [25] Video compression with wavelets and random neural network approximations
    Hai, F
    Hussain, K
    Gelenbe, E
    Guha, R
    APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN IMAGE PROCESSING VI, 2001, 4305 : 57 - 64
  • [26] Investigation on practical possibility of function approximations by neural network
    Yang, Xiaohui
    Chen, Dingguo
    System and Control: Theory and Applications, 2000: 293 - 298
  • [27] Symmetric table addition methods for neural network approximations
    Koc-Sahan, N
    Schlessman, JA
    Schulte, MJ
    ADVANCED SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, AND IMPLEMENTATIONS XI, 2001, 4474 : 126 - 133
  • [28] Comparing neural network approximations for different functional forms
    Morgan, P
    Curry, B
    Beynon, M
    EXPERT SYSTEMS, 1999, 16 (02) : 60 - 71
  • [29] A NOTE ON BEST AND BEST SIMULTANEOUS APPROXIMATIONS
    MUTHUKUMAR, S
    INDIAN JOURNAL OF PURE & APPLIED MATHEMATICS, 1980, 11 (06): 715 - 719
  • [30] Best polynomial approximations
    Lossers, OP
    AMERICAN MATHEMATICAL MONTHLY, 2003, 110 (06): 544 - 544