Asymptotic Convergence Rate of Dropout on Shallow Linear Neural Networks

Cited by: 1
Authors
Senen-Cerda, Albert [1]
Sanders, Jaron [1]
Affiliations
[1] Eindhoven University of Technology, Eindhoven, Netherlands
Keywords
Dropout; neural networks; convergence rate; gradient flow;
DOI
10.1145/3530898
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect when applying them to shallow linear Neural Networks (NNs), which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use {0, 1}-valued random variables to filter weights during training in order to avoid co-adaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
Pages: 53
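
To make the "matrix factorization using a particular regularizer" viewpoint in the abstract concrete, the following is a standard marginalization sketch rather than the paper's exact objective or scaling: assume a shallow linear network x -> W_2 diag(b) W_1 x with hidden width d, i.i.d. Bernoulli(p) dropout variables b_1, ..., b_d on the hidden layer, and inverted-dropout scaling 1/p. Averaging the squared loss over b gives

\[
\mathbb{E}_{b}\Big\| y - \tfrac{1}{p}\, W_2\,\mathrm{diag}(b)\, W_1 x \Big\|^2
= \big\| y - W_2 W_1 x \big\|^2
+ \frac{1-p}{p} \sum_{i=1}^{d} \big\| (W_2)_{:,i} \big\|^2 \, \big( (W_1)_{i,:}\, x \big)^2 ,
\]

using \(\mathbb{E}[b_i] = p\) and \(\mathbb{E}[b_i b_j] = p^2\) for \(i \neq j\) (and \(= p\) for \(i = j\)). Taking a further expectation over the data (x, y) yields an ordinary squared factorization loss in the product W_2 W_1 plus a data-dependent regularizer whose strength (1 - p)/p is governed by the dropout probability; this is the kind of induced objective whose gradient flow the paper analyzes.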