Large deviations of one-hidden-layer neural networks

Cited by: 0
Authors
Hirsch, Christian [1 ]
Willhalm, Daniel [2 ,3 ]
Affiliations
[1] Aarhus Univ, Dept Math, Ny Munkegade 118, DK-8000 Aarhus C, Denmark
[2] Univ Groningen, Bernoulli Inst, Nijenborgh 9, NL-9747 AG Groningen, Netherlands
[3] Toronto Metropolitan Univ, Dept Math, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Artificial neural networks; large deviations; stochastic gradient descent; interacting particle systems; weak convergence
DOI
10.1142/S0219493725500029
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training, as the number of neurons and the number of training iterations tend to infinity simultaneously. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect compared to prior work on interacting particle systems lies in the discrete particle updates, which occur simultaneously with a growing number of particles.
Pages: 53
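The setting described in the abstract can be made concrete with a small simulation. The following sketch (not the paper's code; all dimensions, rates, and the teacher target are illustrative) trains a one-hidden-layer network f(x) = (1/n) Σ_i a_i tanh(w_i · x) with quadratic loss by gradient descent, viewing the n neuron weights (a_i, w_i) as interacting particles coupled through the shared residual; the paper studies large deviations of the empirical measure of these particles as n and the number of training iterations grow together.

```python
import numpy as np

# Hedged sketch of the mean-field training dynamics discussed in the paper:
# n neuron weights (a_i, w_i) evolve as interacting particles under gradient
# descent on a quadratic loss. Everything below (sizes, step size, teacher
# direction) is an illustrative assumption, not taken from the paper.

rng = np.random.default_rng(0)
n, d, m = 200, 3, 256          # neurons (particles), input dim, samples
lr, steps = 0.2, 500           # step size and number of iterations

X = rng.normal(size=(m, d))
y = np.tanh(X @ np.array([1.0, -0.5, 0.25]))   # synthetic smooth target

# Initial weight measure: i.i.d. Gaussian particles (a_i, w_i).
a = rng.normal(size=n)
W = rng.normal(size=(n, d))

def mse():
    """Quadratic loss of the mean-field network (1/n) * sum_i a_i tanh(w_i.x)."""
    return float(np.mean((np.tanh(X @ W.T) @ a / n - y) ** 2))

init_mse = mse()
for _ in range(steps):
    H = np.tanh(X @ W.T)                  # (m, n) hidden activations
    err = H @ a / n - y                   # shared residual couples the particles
    # Gradients of the quadratic loss; the 1/n mean-field factor is absorbed
    # into the learning rate, a common convention in this scaling.
    a -= lr * H.T @ err / m
    W -= lr * a[:, None] * (((1 - H**2) * err[:, None]).T @ X) / m
final_mse = mse()

print(f"mse: {init_mse:.3f} -> {final_mse:.3f}")
```

Tracking the empirical measure (1/n) Σ_i δ_{(a_i, w_i)} along such a trajectory, and letting n and the iteration count diverge together, is the object whose quenched and annealed large deviations the paper analyzes.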