A STUDY OF GENERALIZATION OF STOCHASTIC MIRROR DESCENT ALGORITHMS ON OVERPARAMETERIZED NONLINEAR MODELS

Citations: 0
Authors
Azizan, Navid [1 ]
Lale, Sahin [1 ]
Hassibi, Babak [1 ]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
Keywords
Stochastic mirror descent; nonlinear models; convergence; implicit regularization; generalization
DOI
10.1109/icassp40776.2020.9053864
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We study the convergence, implicit regularization, and generalization of stochastic mirror descent (SMD) algorithms on overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which define a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which global minimum they converge to. In this work, we first show theoretically that, in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (which is typically the case under high overparameterization), then with a sufficiently small step size SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results hold in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments on the CIFAR-10 dataset, SMD with an ℓ10-norm potential (as a surrogate for ℓ∞) consistently generalizes better than SGD (which corresponds to an ℓ2-norm potential), which in turn consistently outperforms SMD with an ℓ1-norm potential.
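For readers unfamiliar with the update rule discussed in the abstract, the following is a minimal sketch of stochastic mirror descent with an ℓq-norm potential psi(w) = (1/q) * sum_j |w_j|^q. This is not the authors' code: grad_fn, w0, and data are hypothetical placeholders, q = 2 recovers plain SGD, and a large q (e.g., q = 10) serves as the surrogate for the ℓ∞-norm potential mentioned above.

```python
import numpy as np

def smd_lq(grad_fn, w0, data, lr=1e-3, q=10.0, epochs=1):
    """Stochastic mirror descent (SMD) with the l_q-norm potential
    psi(w) = (1/q) * sum_j |w_j|**q  (a sketch; q = 2 reduces to SGD).

    grad_fn(w, sample) is a hypothetical callback returning the gradient
    of the loss on a single training sample, evaluated at w.
    """
    w = np.asarray(w0, dtype=float)
    # Mirror map: z = grad(psi)(w) = sign(w) * |w|^(q-1)
    z = np.sign(w) * np.abs(w) ** (q - 1.0)
    for _ in range(epochs):
        for sample in data:
            # SMD update in the dual (mirror) domain:
            #   grad(psi)(w_{t+1}) = grad(psi)(w_t) - lr * grad(loss_i)(w_t)
            z = z - lr * grad_fn(w, sample)
            # Map back to the primal domain: w = (grad psi)^{-1}(z)
            w = np.sign(z) * np.abs(z) ** (1.0 / (q - 1.0))
    return w
```

The abstract's Bregman-divergence characterization says that, among the interpolating solutions, this iteration ends up approximately at the one closest to w0 in the Bregman divergence of psi, which is why the choice of q acts as an implicit regularizer.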
Pages: 3132-3136
Number of Pages: 5