Large deviations of one-hidden-layer neural networks

Cited by: 0
Authors
Hirsch, Christian [1 ]
Willhalm, Daniel [2 ,3 ]
Affiliations
[1] Aarhus Univ, Dept Math, Ny Munkegade 118, DK-8000 Aarhus C, Denmark
[2] Univ Groningen, Bernoulli Inst, Nijenborgh 9, NL-9747 AG Groningen, Netherlands
[3] Toronto Metropolitan Univ, Dept Math, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Artificial neural networks; large deviations; stochastic gradient descent; interacting particle systems; weak convergence
DOI
10.1142/S0219493725500029
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training, as the number of neurons and the number of training iterations tend to infinity simultaneously. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect compared to prior work on interacting particle systems lies in the discrete particle updates, which occur simultaneously with a growing number of particles.
Pages: 53
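The setting described in the abstract can be made concrete with a small simulation. The following sketch (not the paper's code; all dimensions, rates, and the teacher target are illustrative) trains a one-hidden-layer network f(x) = (1/n) Σ_i a_i tanh(w_i · x) with quadratic loss by gradient descent, viewing the n neuron weights (a_i, w_i) as interacting particles coupled through the shared residual; the paper studies large deviations of the empirical measure of these particles as n and the number of training iterations grow together.

```python
import numpy as np

# Hedged sketch of the mean-field training dynamics discussed in the paper:
# n neuron weights (a_i, w_i) evolve as interacting particles under gradient
# descent on a quadratic loss. Everything below (sizes, step size, teacher
# direction) is an illustrative assumption, not taken from the paper.

rng = np.random.default_rng(0)
n, d, m = 200, 3, 256          # neurons (particles), input dim, samples
lr, steps = 0.2, 500           # step size and number of iterations

X = rng.normal(size=(m, d))
y = np.tanh(X @ np.array([1.0, -0.5, 0.25]))   # synthetic smooth target

# Initial weight measure: i.i.d. Gaussian particles (a_i, w_i).
a = rng.normal(size=n)
W = rng.normal(size=(n, d))

def mse():
    """Quadratic loss of the mean-field network (1/n) * sum_i a_i tanh(w_i.x)."""
    return float(np.mean((np.tanh(X @ W.T) @ a / n - y) ** 2))

init_mse = mse()
for _ in range(steps):
    H = np.tanh(X @ W.T)                  # (m, n) hidden activations
    err = H @ a / n - y                   # shared residual couples the particles
    # Gradients of the quadratic loss; the 1/n mean-field factor is absorbed
    # into the learning rate, a common convention in this scaling.
    a -= lr * H.T @ err / m
    W -= lr * a[:, None] * (((1 - H**2) * err[:, None]).T @ X) / m
final_mse = mse()

print(f"mse: {init_mse:.3f} -> {final_mse:.3f}")
```

Tracking the empirical measure (1/n) Σ_i δ_{(a_i, w_i)} along such a trajectory, and letting n and the iteration count diverge together, is the object whose quenched and annealed large deviations the paper analyzes.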