Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

Cited by: 0
Authors
Li, Zhiyuan [1 ]
Wang, Tianhao [2 ]
Lee, Jason D. [1 ]
Arora, Sanjeev [1 ]
Affiliations
[1] Princeton Univ, Princeton, NJ 08540 USA
[2] Yale Univ, New Haven, CT 06511 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As part of the effort to understand implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies upon Nash's embedding theorem.
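To make the correspondence described in the abstract concrete, the sketch below works through the standard elementwise-squared parametrization used in earlier work on this setting; the symbols L (loss), G (parametrization), R (Legendre function), and the initialization scale alpha are introduced here for illustration and are not necessarily the paper's notation.
\begin{align*}
  &\text{Parametrization:} && w = G(u) = u \odot u, \qquad u(0) = \alpha \mathbf{1},\ \alpha > 0,\\
  &\text{Gradient flow on } u\text{:} && \dot{u}(t) = -\nabla_u L\big(G(u(t))\big) = -2\, u(t) \odot \nabla_w L\big(w(t)\big),\\
  &\text{Induced dynamics on } w\text{:} && \dot{w}(t) = 2\, u(t) \odot \dot{u}(t) = -4\, w(t) \odot \nabla_w L\big(w(t)\big),\\
  &\text{Mirror flow with } R\text{:} && \frac{d}{dt}\,\nabla R\big(w(t)\big) = -\nabla_w L\big(w(t)\big), \qquad R(w) = \tfrac{1}{4}\sum_i \big(w_i \ln w_i - w_i\big).
\end{align*}
Since the Hessian of this R is diag(1/(4 w_i)), the mirror-flow equation rearranges to exactly the induced dynamics on w, so the gradient-flow trajectory on u and the mirror-descent trajectory on w = u \odot u coincide when started from w(0) = alpha^2 * 1; this is an instance of the gradient-flow-to-mirror-descent direction of the equivalence stated above.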
Pages: 15