Gradient Starvation: A Learning Proclivity in Neural Networks

Cited by: 0
Authors
Pezeshki, Mohammad [1 ,2 ]
Kaba, Sekou-Oumar [1 ,3 ]
Bengio, Yoshua [1 ,2 ]
Courville, Aaron [1 ,2 ]
Precup, Doina [1 ,3 ,4 ]
Lajoie, Guillaume [1 ,2 ]
Affiliations
[1] Mila, Montreal, PQ, Canada
[2] Univ Montreal, Montreal, PQ, Canada
[3] McGill Univ, Montreal, PQ, Canada
[4] Google DeepMind, London, England
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We identify and formalize a fundamental gradient descent phenomenon leading to a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant to the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalances in neural networks. Using tools from dynamical systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in the training data. Based on our proposed formalism, we develop guarantees for a novel but simple regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.
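The abstract does not spell out the regularizer; in the published paper it is called Spectral Decoupling, and it amounts to replacing L2 weight decay with an L2 penalty on the network's logits, which the authors argue decouples the learning dynamics of individual features. Below is a minimal PyTorch-style sketch; the function name, the default lambda value, and the toy tensors are illustrative assumptions, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def spectral_decoupling_loss(logits, targets, lam=0.1):
        # Cross-entropy plus an L2 penalty on the logits (not the weights).
        # lam is a hypothetical default; in practice it is tuned per task.
        ce = F.cross_entropy(logits, targets)
        sd = 0.5 * (logits ** 2).sum(dim=1).mean()  # (1/2)*||y_hat||^2, batch-averaged
        return ce + lam * sd

    # Toy usage: random logits for a batch of 8 two-class examples.
    logits = torch.randn(8, 2, requires_grad=True)
    targets = torch.randint(0, 2, (8,))
    loss = spectral_decoupling_loss(logits, targets)
    loss.backward()  # gradients now include the decoupling term

In a training loop, this loss would simply replace plain cross-entropy (with weight decay disabled), leaving the rest of the optimization unchanged.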
Pages: 17
Related Papers
50 records in total
  • [1] Enhanced gradient learning for deep neural networks
    Yan, Ming
    Yang, Jianxi
    Chen, Cen
    Zhou, Joey Tianyi
    Pan, Yi
    Zeng, Zeng
IET IMAGE PROCESSING, 2022, 16(02): 365-377
  • [2] The natural gradient learning algorithm for neural networks
    Amari, S
THEORETICAL ASPECTS OF NEURAL COMPUTATION: A MULTIDISCIPLINARY PERSPECTIVE, 1998: 1-15
  • [3] Learning Graph Neural Networks with Approximate Gradient Descent
    Li, Qunwei
    Zou, Shaofeng
    Zhong, Wenliang
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 8438-8446
  • [4] A conjugate gradient learning algorithm for recurrent neural networks
    Chang, WF
    Mak, MW
NEUROCOMPUTING, 1999, 24(1-3): 173-189
  • [5] Gradient descent learning for quaternionic Hopfield neural networks
    Kobayashi, Masaki
NEUROCOMPUTING, 2017, 260: 174-179
  • [6] Gradient and Hamiltonian dynamics applied to learning in neural networks
    Howse, JW
    Abdallah, CT
    Heileman, GL
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8: 274-280
  • [7] A gradient descent learning algorithm for fuzzy neural networks
    Feuring, T
    Buckley, JJ
    Hayashi, Y
1998 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AT THE IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE - PROCEEDINGS, VOL 1-2, 1998: 1136-1141
  • [8] Convergence of gradient descent for learning linear neural networks
    Nguegnang, Gabin Maxime
    Rauhut, Holger
    Terstiege, Ulrich
ADVANCES IN CONTINUOUS AND DISCRETE MODELS, 2024, 2024(01)
  • [9] Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-based optimization to spiking neural networks
    Neftci, Emre O.
    Mostafa, Hesham
    Zenke, Friedemann
IEEE SIGNAL PROCESSING MAGAZINE, 2019, 36(06): 51-63
  • [10] Smooth Exact Gradient Descent Learning in Spiking Neural Networks
    Klos, Christian
    Memmesheimer, Raoul-Martin
PHYSICAL REVIEW LETTERS, 2025, 134(02)