A STUDY OF GENERALIZATION OF STOCHASTIC MIRROR DESCENT ALGORITHMS ON OVERPARAMETERIZED NONLINEAR MODELS

Times Cited: 0
Authors
Azizan, Navid [1 ]
Lale, Sahin [1 ]
Hassibi, Babak [1 ]
Affiliation
[1] California Institute of Technology (Caltech), Pasadena, CA 91125 USA
Keywords
Stochastic mirror descent; nonlinear models; convergence; implicit regularization; generalization;
DOI
10.1109/icassp40776.2020.9053864
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
We study the convergence, implicit regularization, and generalization of stochastic mirror descent (SMD) algorithms on overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which define a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which global minima they converge to. In this work, we first show theoretically that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (which is usually the case under high overparameterization), then with a sufficiently small step size SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results hold in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments on the CIFAR-10 dataset, SMD with an ℓ_10 norm potential (as a surrogate for ℓ_∞) consistently generalizes better than SGD (which corresponds to an ℓ_2 norm potential), which in turn consistently outperforms SMD with an ℓ_1 norm potential.
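As a concrete illustration of the update discussed in the abstract, below is a minimal sketch (not the authors' code) of SMD with an ℓ_p norm potential ψ(w) = (1/p)·Σ_j |w_j|^p on a hypothetical toy overparameterized linear model: each step moves in the dual space, ∇ψ(w_{t+1}) = ∇ψ(w_t) − η ∇L_i(w_t), and maps back through the inverse mirror map. The problem sizes, initialization scale, step size, and the p values shown are illustrative assumptions; p = 2 recovers plain SGD, and the ℓ_10 potential used in the paper's CIFAR-10 experiments (as a surrogate for ℓ_∞) would follow the same template but typically needs a more carefully tuned step size.

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                   # n training points, d parameters (d > n: overparameterized)
X = rng.standard_normal((n, d))
y = X @ (rng.standard_normal(d) / np.sqrt(d))    # realizable targets, so interpolating solutions exist
w0 = 0.5 * rng.standard_normal(d)                # shared initialization for all runs

def grad_potential(w, p):
    # Mirror map: gradient of psi(w) = (1/p) * sum_j |w_j|^p
    return np.sign(w) * np.abs(w) ** (p - 1)

def grad_potential_inv(z, p):
    # Inverse mirror map: gradient of the convex conjugate psi*
    return np.sign(z) * np.abs(z) ** (1.0 / (p - 1))

def smd(p, eta=1e-3, epochs=2000):
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):                 # one stochastic (per-sample) pass
            grad = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x_i^T w - y_i)^2
            z = grad_potential(w, p) - eta * grad    # step in the dual (mirror) space
            w = grad_potential_inv(z, p)             # map back to the primal space
    return w

for p in (1.5, 2.0, 3.0):                            # p = 2 recovers plain SGD
    w = smd(p)
    print(f"p={p:.1f}  train MSE={np.mean((X @ w - y) ** 2):.1e}  "
          f"||w||_1={np.abs(w).sum():.2f}  ||w||_inf={np.abs(w).max():.3f}")

On such a toy problem, each run would be expected to drive the training loss to (near) zero, while the norm profile of the interpolating solution it selects reflects the chosen potential, which is the implicit-regularization effect the abstract refers to.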
Pages: 3132-3136
Number of Pages: 5
Related Papers
50 records in total
  • [31] DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural Networks
    Xu, Shiyun
    Bu, Zhiqi
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [32] Generalization Bounds for Label Noise Stochastic Gradient Descent
    Huh, Jung Eun
    Rebeschini, Patrick
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [33] Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms
    Xu, Pan
    Wang, Tianhao
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [34] A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
    Azizan, Navid
    Hassibi, Babak
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 3960 - 3965
  • [35] ON THE CONVERGENCE OF MIRROR DESCENT BEYOND STOCHASTIC CONVEX PROGRAMMING
    Zhou, Zhengyuan
    Mertikopoulos, Panayotis
    Bambos, Nicholas
    Boyd, Stephen P.
    Glynn, Peter W.
    SIAM JOURNAL ON OPTIMIZATION, 2020, 30 (01) : 687 - 716
  • [36] Mirror Descent Algorithms for Minimizing Interacting Free Energy
    Ying, Lexing
    JOURNAL OF SCIENTIFIC COMPUTING, 2020, 84 (03)
  • [37] Primal-Dual Stochastic Mirror Descent for MDPs
    Tiapkin, Daniil
    Gasnikov, Alexander
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [38] Validation analysis of mirror descent stochastic approximation method
    Lan, Guanghui
    Nemirovski, Arkadi
    Shapiro, Alexander
    MATHEMATICAL PROGRAMMING, 2012, 134 : 425 - 458
  • [39] Stochastic Mirror Descent in Variationally Coherent Optimization Problems
    Zhou, Zhengyuan
    Mertikopoulos, Panayotis
    Bambos, Nicholas
    Boyd, Stephen
    Glynn, Peter
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [40] Variance reduction on general adaptive stochastic mirror descent
    Li, Wenjie
    Wang, Zhanyu
    Zhang, Yichen
    Cheng, Guang
    MACHINE LEARNING, 2022, 111 (12) : 4639 - 4677