Best k-Layer Neural Network Approximations

Cited by: 1
Authors
Lim, Lek-Heng [1 ]
Michalek, Mateusz [2 ,3 ]
Qi, Yang [4 ]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Max Planck Inst Math Sci, D-04103 Leipzig, Germany
[3] Univ Konstanz, D-78457 Constance, Germany
[4] Ecole Polytech, INRIA Saclay Ile France, CMAP, IP Paris,CNRS, F-91128 Palaiseau, France
Keywords
Neural network; Best approximation; Join loci; Secant loci
DOI
10.1007/s00365-021-09545-2
Chinese Library Classification
O1 [Mathematics]
Discipline Code
0701; 070101
Abstract
We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set s₁, ..., sₙ ∈ ℝᵖ with corresponding responses t₁, ..., tₙ ∈ ℝ^q, fitting a k-layer neural network ν_θ : ℝᵖ → ℝ^q involves estimating the weights θ ∈ ℝᵐ via an ERM:

  inf_{θ ∈ ℝᵐ} Σᵢ₌₁ⁿ ‖tᵢ − ν_θ(sᵢ)‖₂².

We show that even for k = 2, this infimum is not attainable in general for common activations such as ReLU, hyperbolic tangent, and sigmoid. Moreover, if one attempts to minimize such a loss function when its infimum is not attainable, the values of θ necessarily diverge to ±∞. For the smooth activations σ(x) = 1/(1 + exp(−x)) and σ(x) = tanh(x), we show that this failure to attain an infimum can occur on a positive-measure subset of responses. For the ReLU activation σ(x) = max(0, x), we completely classify the cases in which the ERM for a best two-layer neural network approximation attains its infimum. In recent applications of neural networks, where overfitting is commonplace, the failure to attain an infimum is avoided by ensuring that the system of equations tᵢ = ν_θ(sᵢ), i = 1, ..., n, has a solution. For a two-layer ReLU-activated network, we show when such a system of equations has a solution generically, i.e., when such a neural network can be fitted perfectly with probability one.
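The ERM objective in the abstract can be sketched concretely. Below is a minimal NumPy illustration of the loss Σᵢ ‖tᵢ − ν_θ(sᵢ)‖₂² for a two-layer ReLU network of the form ν_θ(s) = W₂ max(0, W₁s + b₁) + b₂; all names, dimensions, and data here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, h, n = 3, 2, 4, 10           # input dim, output dim, hidden width, sample count

S = rng.standard_normal((n, p))    # training inputs s_1, ..., s_n
T = rng.standard_normal((n, q))    # responses t_1, ..., t_n

def erm_loss(W1, b1, W2, b2):
    """Empirical risk: sum_i ||t_i - v_theta(s_i)||_2^2 for a two-layer ReLU network."""
    hidden = np.maximum(0.0, S @ W1.T + b1)   # ReLU activation of the first layer
    pred = hidden @ W2.T + b2                 # network outputs v_theta(s_i)
    return np.sum((T - pred) ** 2)

# The paper asks whether the infimum of this loss over all theta is attained;
# any particular theta only gives an upper bound on that infimum.
W1 = rng.standard_normal((h, p)); b1 = rng.standard_normal(h)
W2 = rng.standard_normal((q, h)); b2 = rng.standard_normal(q)
loss = erm_loss(W1, b1, W2, b2)
print(loss >= 0.0)                 # the risk is a sum of squares, hence nonnegative
```

Any fixed θ yields only an upper bound on the infimum; the paper's point is that a minimizing sequence of weights may have no limit, with entries of θ diverging to ±∞.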
Pages: 583–604
Page count: 22
Related Papers (50 total)
  • [11] Approximations by multivariate perturbed neural network operators
    Anastassiou, George A.
    ANALYSIS AND APPLICATIONS, 2017, 15 (03) : 413 - 432
  • [12] On the K best integer network flows
    Sedeno-Noda, Antonio
    Jose Espino-Martin, Juan
    COMPUTERS & OPERATIONS RESEARCH, 2013, 40 (02) : 616 - 626
  • [13] ON FINDING THE K BEST CUTS IN A NETWORK
    HAMACHER, HW
    PICARD, JC
    QUEYRANNE, M
    OPERATIONS RESEARCH LETTERS, 1984, 2 (06) : 303 - 305
  • [14] THE K BEST SPANNING ARBORESCENCES OF A NETWORK
    CAMERINI, PM
    FRATTA, L
    MAFFIOLI, F
    NETWORKS, 1980, 10 (02) : 91 - 109
  • [15] TESTING THE K-LAYER ROUTABILITY IN A CIRCULAR CHANNEL - CASE IN WHICH NO NETS HAVE 2 TERMINALS ON THE SAME CIRCLE
    KOBAYASHI, N
    KASHIWABARA, T
    MASUDA, S
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1992, E75A (02) : 233 - 239
  • [16] Best rank-k approximations for tensors: generalizing Eckart–Young
    Jan Draisma
    Giorgio Ottaviani
    Alicia Tocino
    Research in the Mathematical Sciences, 2018, 5
  • [17] Adaptive friction compensation using neural network approximations
    Huang, SN
    Tan, KK
    Lee, TH
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2000, 30 (04): : 551 - 557
  • [18] Neural network approximations for Calabi-Yau metrics
    Jejjala, Vishnu
    Pena, Damian Kaloni Mayorga
    Mishra, Challenger
    JOURNAL OF HIGH ENERGY PHYSICS, 2022, 2022 (08)
  • [19] Variational neural and tensor network approximations of thermal states
    Lu, Sirui
    Giudice, Giacomo
    Cirac, J. Ignacio
    PHYSICAL REVIEW B, 2025, 111 (07)
  • [20] Adaptive motion control using neural network approximations
    Huang, SN
    Tan, KK
    Lee, TH
    AUTOMATICA, 2002, 38 (02) : 227 - 233