Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

Cited: 0
Authors
Li, Zhiyuan [1 ]
Wang, Tianhao [2 ]
Lee, Jason D. [1 ]
Arora, Sanjeev [1 ]
Affiliations
[1] Princeton Univ, Princeton, NJ 08540 USA
[2] Yale Univ, New Haven, CT 06511 USA
Keywords: (none listed)
DOI: (none available)
CLC Number: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Pages: 15
Abstract
As part of the effort to understand the implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies on Nash's embedding theorem.
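To make the stated equivalence concrete, here is a minimal worked sketch using the quadratic parametrization x = u ⊙ u, a standard example from the implicit-bias literature; this specific instance is not spelled out in the record above, and the notation (G, R, L) is chosen here for illustration only.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch: gradient flow on the quadratic parametrization x = u \odot u
% induces mirror flow with a (scaled) negative-entropy Legendre function.
Let $x = G(u) = u \odot u$ with $u \in \mathbb{R}^d_{>0}$, and run gradient
flow on the reparametrized loss $u \mapsto L(G(u))$:
\[
  \dot{u} \;=\; -\nabla_u L(u \odot u) \;=\; -2\, u \odot \nabla L(x).
\]
By the chain rule, the induced dynamics on $x = u \odot u$ are
\[
  \dot{x} \;=\; 2\, u \odot \dot{u} \;=\; -4\, x \odot \nabla L(x).
\]
This is exactly continuous mirror descent,
$\tfrac{d}{dt}\,\nabla R(x) = -\nabla L(x)$, with the Legendre function
\[
  R(x) \;=\; \tfrac{1}{4}\sum_{i=1}^{d} \bigl( x_i \log x_i - x_i \bigr),
  \qquad
  \nabla R(x)_i \;=\; \tfrac{1}{4}\log x_i,
\]
since $\tfrac{d}{dt}\,\nabla R(x)_i = \dot{x}_i/(4 x_i) = -\nabla L(x)_i$.
\end{document}

The converse direction claimed in the abstract, realizing an arbitrary Legendre function as a commuting parametrization via Nash's embedding theorem, is substantially more involved and is not sketched here.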