Does Momentum Change the Implicit Regularization on Separable Data?

Cited: 0
Authors
Wang, Bohan [1 ]
Meng, Qi [2 ]
Zhang, Huishuai [2 ]
Sun, Ruoyu [3 ]
Chen, Wei [4 ]
Ma, Zhi-Ming [4 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[4] Chinese Acad Sci, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer to how momentum affects the generalization performance of these algorithms. This paper studies the question by analyzing the implicit regularization of momentum-based optimization. We prove that, on linear classification problems with separable data and exponential-tailed losses, gradient descent with momentum (GDM) converges to the L2 max-margin solution, the same limit as vanilla gradient descent. This means that gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees its generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show that they also converge to the L2 max-margin solution. Technically, the implicit regularization of SGDM is established via a novel convergence analysis of SGDM under a general noise condition called the affine noise variance condition. To the best of our knowledge, we are the first to derive SGDM's convergence under such an assumption. Numerical experiments are conducted to support our theoretical results.
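The claim in the abstract can be illustrated numerically. Below is a minimal sketch in Python/NumPy (not the paper's code): heavy-ball gradient descent with momentum on a hand-built separable dataset with logistic (exponential-tailed) loss. The dataset, step size, and momentum value are illustrative assumptions; by the symmetry of the two closest point pairs, the L2 max-margin direction here is (1, 1)/sqrt(2), and the GDM iterate should align with it in direction.

    import numpy as np

    # Separable toy data. The two closest pairs, +/-(1,2) and +/-(2,1), are the
    # support vectors, so the L2 max-margin direction is (1,1)/sqrt(2) by
    # symmetry. The far-away positive point (5,1) breaks the symmetry of the
    # early iterates without changing the max-margin solution.
    X = np.array([[1., 2.], [2., 1.], [5., 1.], [-1., -2.], [-2., -1.]])
    y = np.array([1., 1., 1., -1., -1.])

    def grad(w):
        # Gradient of the logistic loss sum_i log(1 + exp(-y_i * w.x_i)):
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)).
        margins = y * (X @ w)
        return X.T @ (-y / (1.0 + np.exp(margins)))

    w = np.zeros(2)
    v = np.zeros(2)
    lr, beta = 0.05, 0.9          # illustrative step size and momentum
    for t in range(200_000):
        v = beta * v + grad(w)    # heavy-ball momentum buffer
        w -= lr * v

    w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
    print("GDM direction       :", w / np.linalg.norm(w))
    print("max-margin direction:", w_star)

Directional convergence on separable data is slow (the margin gap shrinks only logarithmically in t, as for vanilla gradient descent), so the printed direction matches (1, 1)/sqrt(2) only approximately; increasing the iteration count tightens the match, and replacing the momentum update with plain gradient descent yields the same limit direction, consistent with the paper's claim.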
Pages: 13
Related Papers
50 records in total
  • [21] Noncommutative Momentum and Torsional Regularization
    Popławski, Nikodem
    FOUNDATIONS OF PHYSICS, 2020, 50 : 900 - 923
  • [22] The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective
    Lin, Chi-Heng
    Kaushik, Chiraag
    Dyer, Eva L.
    Muthukumar, Vidya
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [23] Enhanced Specific Emitter Identification With Limited Data Through Dual Implicit Regularization
    Peng, Yang
    Zhang, Xile
    Guo, Lantu
    Ben, Cui
    Liu, Yuchao
    Wang, Yu
    Lin, Yun
    Gui, Guan
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (15) : 26395 - 26405
  • [24] Enhanced Sparsity by Non-Separable Regularization
    Selesnick, Ivan W.
    Bayram, Ilker
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2016, 64 (09) : 2298 - 2313
  • [25] Regularization with non-convex separable constraints
    Bredies, Kristian
    Lorenz, Dirk A.
    INVERSE PROBLEMS, 2009, 25 (08)
  • [26] Topology optimization with implicit functions and regularization
    Belytschko, T
    Xiao, SP
    Parimi, C
    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2003, 57 (08) : 1177 - 1196
  • [27] AN IMPLICIT REGULARIZATION APPROACH TO CHIRAL MODELS
    Rosado, Ricardo J.C.
    Cherchiglia, Adriano
    Sampaio, Marcos
    Hiller, Brigitte
    ACTA PHYSICA POLONICA B, PROCEEDINGS SUPPLEMENT, 2024, 17 (06)
  • [28] Implicit Regularization of Random Feature Models
    Jacot, Arthur
    Simsek, Berfin
    Spadaro, Francesco
    Hongler, Clement
    Gabriel, Franck
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [29] Representational drift as a result of implicit regularization
    Ratzon, Aviv
    Derdikman, Dori
    Barak, Omri
    ELIFE, 2024, 12
  • [30] Implicit Geometric Regularization for Learning Shapes
    Gropp, Amos
    Yariv, Lior
    Haim, Niv
    Atzmon, Matan
    Lipman, Yaron
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119