A STUDY OF GENERALIZATION OF STOCHASTIC MIRROR DESCENT ALGORITHMS ON OVERPARAMETERIZED NONLINEAR MODELS

Cited by: 0
Authors
Azizan, Navid [1 ]
Lale, Sahin [1 ]
Hassibi, Babak [1 ]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
Keywords
Stochastic mirror descent; nonlinear models; convergence; implicit regularization; generalization
DOI
10.1109/icassp40776.2020.9053864
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We study the convergence, implicit regularization, and generalization of stochastic mirror descent (SMD) algorithms on overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which define a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which global minimum they converge to. In this work, we first show theoretically that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (which is usually the case under high overparameterization), SMD with a sufficiently small step size converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results hold in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. Our experiments on the CIFAR-10 dataset show that SMD with an ℓ10 norm potential (as a surrogate for ℓ∞) consistently generalizes better than SGD (which corresponds to an ℓ2 norm potential), which in turn consistently outperforms SMD with an ℓ1 norm potential.
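For intuition, SMD with a differentiable potential ψ takes its gradient step in the mirror (dual) coordinates, ∇ψ(w_{t+1}) = ∇ψ(w_t) − η ∇L_i(w_t), and the Bregman divergence mentioned above is D_ψ(w, w') = ψ(w) − ψ(w') − ⟨∇ψ(w'), w − w'⟩. Below is a minimal NumPy sketch of one SMD step under the componentwise ℓp potential ψ(w) = (1/p)‖w‖_p^p with p > 1; setting p = 2 recovers plain SGD, and p = 10 gives the ℓ∞ surrogate used in the experiments. The function and argument names (smd_step, lr, p) are illustrative, not taken from the paper's code.

    import numpy as np

    def smd_step(w, grad, lr, p=10.0):
        """One SMD step for the potential psi(w) = (1/p) * ||w||_p^p, p > 1."""
        # Mirror map, applied componentwise: z_j = sign(w_j) * |w_j|**(p - 1).
        z = np.sign(w) * np.abs(w) ** (p - 1)
        # Stochastic gradient step taken in the dual (mirror) coordinates.
        z = z - lr * grad
        # Inverse mirror map back to the primal variables:
        # w_j = sign(z_j) * |z_j|**(1 / (p - 1)).
        return np.sign(z) * np.abs(z) ** (1.0 / (p - 1))

Iterating this step over random mini-batch gradients and inspecting the limit point is one way to probe, empirically, which interpolating solution a given potential implicitly selects.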
Pages: 3132-3136 (5 pages)