Gradient Starvation: A Learning Proclivity in Neural Networks

Cited by: 0
Authors
Pezeshki, Mohammad [1 ,2 ]
Kaba, Sekou-Oumar [1 ,3 ]
Bengio, Yoshua [1 ,2 ]
Courville, Aaron [1 ,2 ]
Precup, Doina [1 ,3 ,4 ]
Lajoie, Guillaume [1 ,2 ]
Affiliations
[1] Mila, Montreal, PQ, Canada
[2] Univ Montreal, Montreal, PQ, Canada
[3] McGill Univ, Montreal, PQ, Canada
[4] Google DeepMind, London, England
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We identify and formalize a fundamental gradient descent phenomenon leading to a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant to the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalances in neural networks. Using tools from dynamical systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in the training data. Based on our proposed formalism, we develop guarantees for a novel but simple regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.
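The abstract does not spell out the regularizer; in the published paper it is called Spectral Decoupling, and it amounts to replacing L2 weight decay with an L2 penalty on the network's logits, which the authors argue decouples the learning dynamics of individual features. Below is a minimal PyTorch-style sketch; the function name, the default lambda value, and the toy tensors are illustrative assumptions, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def spectral_decoupling_loss(logits, targets, lam=0.1):
        # Cross-entropy plus an L2 penalty on the logits (not the weights).
        # lam is a hypothetical default; in practice it is tuned per task.
        ce = F.cross_entropy(logits, targets)
        sd = 0.5 * (logits ** 2).sum(dim=1).mean()  # (1/2)*||y_hat||^2, batch-averaged
        return ce + lam * sd

    # Toy usage: random logits for a batch of 8 two-class examples.
    logits = torch.randn(8, 2, requires_grad=True)
    targets = torch.randint(0, 2, (8,))
    loss = spectral_decoupling_loss(logits, targets)
    loss.backward()  # gradients now include the decoupling term

In a training loop, this loss would simply replace plain cross-entropy (with weight decay disabled), leaving the rest of the optimization unchanged.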
Pages: 17
Related Papers
50 records in total
  • [1] Enhanced gradient learning for deep neural networks
    Yan, Ming
    Yang, Jianxi
    Chen, Cen
    Zhou, Joey Tianyi
    Pan, Yi
    Zeng, Zeng
IET IMAGE PROCESSING, 2022, 16(02): 365-377
  • [2] The natural gradient learning algorithm for neural networks
    Amari, S
THEORETICAL ASPECTS OF NEURAL COMPUTATION: A MULTIDISCIPLINARY PERSPECTIVE, 1998: 1-15
  • [3] Learning Graph Neural Networks with Approximate Gradient Descent
    Li, Qunwei
    Zou, Shaofeng
    Zhong, Wenliang
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 8438-8446
  • [4] A conjugate gradient learning algorithm for recurrent neural networks
    Chang, WF
    Mak, MW
NEUROCOMPUTING, 1999, 24(1-3): 173-189
  • [5] Gradient descent learning for quaternionic Hopfield neural networks
    Kobayashi, Masaki
NEUROCOMPUTING, 2017, 260: 174-179
  • [6] Gradient and Hamiltonian dynamics applied to learning in neural networks
    Howse, JW
    Abdallah, CT
    Heileman, GL
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8: 274-280
  • [7] A gradient descent learning algorithm for fuzzy neural networks
    Feuring, T
    Buckley, JJ
    Hayashi, Y
1998 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AT THE IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE - PROCEEDINGS, VOL 1-2, 1998: 1136-1141
  • [8] Convergence of gradient descent for learning linear neural networks
    Nguegnang, Gabin Maxime
    Rauhut, Holger
    Terstiege, Ulrich
ADVANCES IN CONTINUOUS AND DISCRETE MODELS, 2024, 2024(01)
  • [9] Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-based optimization to spiking neural networks
    Neftci, Emre O.
    Mostafa, Hesham
    Zenke, Friedemann
IEEE SIGNAL PROCESSING MAGAZINE, 2019, 36(06): 51-63
  • [10] Smooth Exact Gradient Descent Learning in Spiking Neural Networks
    Klos, Christian
    Memmesheimer, Raoul-Martin
PHYSICAL REVIEW LETTERS, 2025, 134(02)