A STUDY OF GENERALIZATION OF STOCHASTIC MIRROR DESCENT ALGORITHMS ON OVERPARAMETERIZED NONLINEAR MODELS

Times Cited: 0
Authors
Azizan, Navid [1]
Lale, Sahin [1]
Hassibi, Babak [1]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
Keywords
Stochastic mirror descent; nonlinear models; convergence; implicit regularization; generalization;
DOI
10.1109/icassp40776.2020.9053864
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403
Abstract
We study the convergence, implicit regularization, and generalization of stochastic mirror descent (SMD) algorithms on overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which form a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which of these global minima SMD converges to. In this work, we first show theoretically that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima, which is usually the case under high overparameterization, and the step size is sufficiently small, then SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results hold in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments on the CIFAR-10 dataset, SMD with the ℓ10-norm potential (as a surrogate for ℓ∞) consistently generalizes better than SGD (which corresponds to the ℓ2-norm potential), which in turn consistently outperforms SMD with the ℓ1-norm potential.
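For context, the SMD update referred to above replaces the Euclidean geometry of SGD with a strictly convex potential ψ. For the q-norm potential ψ(w) = (1/q)‖w‖_q^q with q > 1 (q = 2 recovers plain SGD; q = 10 serves as the surrogate for ℓ∞ mentioned in the abstract), the update has a simple coordinate-wise form, and the implicit-regularization result states that the iterates converge, approximately, to the interpolating solution closest to the initialization in the Bregman divergence of ψ. The following is a minimal NumPy sketch under these assumptions; the function name smd_step and the toy least-squares usage are illustrative choices, not code from the paper.

```python
import numpy as np

def smd_step(w, grad, lr, q):
    """One stochastic mirror descent step with the q-norm potential
    psi(w) = (1/q) * ||w||_q^q, for q > 1 (q = 2 reduces to plain SGD).

    Mirror map:          grad_psi(w)_i     = sign(w_i) * |w_i|**(q - 1)
    Inverse mirror map:  grad_psi_inv(z)_i = sign(z_i) * |z_i|**(1 / (q - 1))
    Update rule:         grad_psi(w_next)  = grad_psi(w) - lr * grad
    """
    z = np.sign(w) * np.abs(w) ** (q - 1)              # map the iterate to the dual space
    z = z - lr * grad                                   # stochastic gradient step in dual space
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))    # map back to the primal space

# Toy overparameterized least-squares problem: 20 parameters, 5 data points,
# so the squared loss has a manifold of interpolating global minima.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)
w = 0.01 * rng.normal(size=20)           # initialize close to the origin
for _ in range(50_000):
    i = rng.integers(5)                  # sample one training point
    g = (X[i] @ w - y[i]) * X[i]         # gradient of the per-sample squared loss
    w = smd_step(w, g, lr=0.01, q=10.0)  # step size may need tuning for other q
```

Which interpolating solution the iterates are drawn to depends on the potential: a large q (ℓ∞-like) tends to spread weight across coordinates, while q close to 1 favors sparser solutions; the abstract's generalization comparison across ℓ1, ℓ2, and ℓ10 potentials probes exactly this difference.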
Pages: 3132 - 3136
Number of Pages: 5
Related Papers
50 records in total
  • [41] Validation analysis of mirror descent stochastic approximation method
    Lan, Guanghui
    Nemirovski, Arkadi
    Shapiro, Alexander
    MATHEMATICAL PROGRAMMING, 2012, 134 (02) : 425 - 458
  • [42] Mirror Descent Algorithms for Minimizing Interacting Free Energy
    Ying, Lexing
    JOURNAL OF SCIENTIFIC COMPUTING, 2020, 84
  • [43] Variance reduction on general adaptive stochastic mirror descent
    Li, Wenjie
    Wang, Zhanyu
    Zhang, Yichen
    Cheng, Guang
    MACHINE LEARNING, 2022, 111 : 4639 - 4677
  • [44] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454
  • [45] Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
    Li, Zhiyuan
    Wang, Tianhao
    Lee, Jason D.
    Arora, Sanjeev
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [46] Estimating the Principal Eigenvector of a Stochastic Matrix: Mirror Descent Algorithms via Game Approach with Application to PageRank Problem
    Nazin, Alexander
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 792 - 797
  • [47] Is Stochastic Mirror Descent Vulnerable to Adversarial Delay Attacks? A Traffic Assignment Resilience Study
    Pan, Yunian
    Li, Tao
    Zhu, Quanyan
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 8328 - 8333
  • [50] Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
    Zhu, Miaoxi
    Shen, Li
    Du, Bo
    Tao, Dacheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,