Enhanced gradient learning for deep neural networks

Cited by: 0
Authors
Yan, Ming [2 ]
Yang, Jianxi [1 ]
Chen, Cen [3 ]
Zhou, Joey Tianyi [2 ]
Pan, Yi [4 ]
Zeng, Zeng [3 ]
Affiliations
[1] Chongqing Jiaotong Univ, AI Res Ctr, Sch Informat Sci & Engn, Chongqing, Peoples R China
[2] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
Keywords
Circuit connections - Deep layers - Gradient flow - Gradient learning - Image processing - Large margins - Neural networks - Shallow layers - Training parameters - Transport systems;
DOI
10.1049/ipr2.12353
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks have achieved great success in both computer vision and natural language processing tasks. Improving gradient flow is crucial for training very deep neural networks. To address this challenge, a gradient enhancement approach is proposed that constructs short-circuit neural connections. The proposed short circuit is a unidirectional neural connection that back-propagates sensitivities, rather than gradients, from the deep layers to the shallow layers of a network. Moreover, the short circuit is formulated as a gradient truncation operation on its connecting layers, so it can be plugged into backbone models without introducing extra training parameters. Extensive experiments demonstrate that deep neural networks equipped with short-circuit connections improve over the baselines by a large margin on both computer vision and natural language processing tasks. The work offers a promising solution for low-resource scenarios, such as intelligent transport systems in computer vision and question answering in natural language processing.
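The record's abstract describes the short circuit only at a high level. As a purely illustrative sketch, one plausible reading of "gradient truncation without extra training parameters" can be written in a few lines of PyTorch; the function name short_circuit, the matching-shape requirement, and the detach-based gradient routing below are assumptions of this sketch, not the paper's published implementation.

import torch

def short_circuit(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    # Illustrative gradient-truncation short circuit (an assumption, not
    # the paper's code). The forward value equals `deep`, so predictions
    # are unchanged; in the backward pass the incoming gradient is routed
    # directly to `shallow`, while the intermediate path is detached.
    assert shallow.shape == deep.shape, "this sketch assumes matching shapes"
    return shallow + (deep - shallow).detach()

# Minimal check: the gradient reaches `x` through the short circuit,
# bypassing the (truncated) stand-in for the intermediate layers.
x = torch.randn(2, 8, requires_grad=True)
deep = x * 3.0
out = short_circuit(x, deep)
out.sum().backward()
print(x.grad)  # all ones, since the `x * 3.0` path contributes no gradient

Note that this sketch introduces no learnable parameters, which is consistent with the abstract's claim that the operation can be plugged into backbone models as-is; how the authors combine the short-circuit path with the ordinary gradient path is not recoverable from this record.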
Pages: 365 - 377
Number of pages: 13
Related Papers
50 records in total
  • [11] Online Deep Learning: Learning Deep Neural Networks on the Fly
    Sahoo, Doyen
    Pham, Quang
    Lu, Jing
    Hoi, Steven C. H.
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2660 - 2666
  • [12] Gradient Starvation: A Learning Proclivity in Neural Networks
    Pezeshki, Mohammad
    Kaba, Sekou-Oumar
    Bengio, Yoshua
    Courville, Aaron
    Precup, Doina
    Lajoie, Guillaume
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [13] The natural gradient learning algorithm for neural networks
    Amari, S
    THEORETICAL ASPECTS OF NEURAL COMPUTATION: A MULTIDISCIPLINARY PERSPECTIVE, 1998, : 1 - 15
  • [14] Learning with Deep Photonic Neural Networks
    Leelar, Bhawani Shankar
    Shivaleela, E. S.
    Srinivas, T.
    2017 IEEE WORKSHOP ON RECENT ADVANCES IN PHOTONICS (WRAP), 2017,
  • [15] Deep Learning with Random Neural Networks
    Gelenbe, Erol
    Yin, Yongha
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1633 - 1638
  • [16] Deep Learning with Random Neural Networks
    Gelenbe, Erol
    Yin, Yongha
    PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 2, 2018, 16 : 450 - 462
  • [17] Deep learning in spiking neural networks
    Tavanaei, Amirhossein
    Ghodrati, Masoud
    Kheradpisheh, Saeed Reza
    Masquelier, Timothee
    Maida, Anthony
    NEURAL NETWORKS, 2019, 111 : 47 - 63
  • [18] Deep learning in neural networks: An overview
    Schmidhuber, Juergen
    NEURAL NETWORKS, 2015, 61 : 85 - 117
  • [19] Artificial neural networks and deep learning
    Geubbelmans, Melvin
    Rousseau, Axel-Jan
    Burzykowski, Tomasz
    Valkenborg, Dirk
    AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2024, 165 (02) : 248 - 251
  • [20] Shortcut learning in deep neural networks
    Geirhos, Robert
    Jacobsen, Jörn-Henrik
    Michaelis, Claudio
    Zemel, Richard
    Brendel, Wieland
    Bethge, Matthias
    Wichmann, Felix A.
    NATURE MACHINE INTELLIGENCE, 2020, 2 : 665 - 673