Enhanced gradient learning for deep neural networks

Cited by: 0
Authors
Yan, Ming [2 ]
Yang, Jianxi [1 ]
Chen, Cen [3 ]
Zhou, Joey Tianyi [2 ]
Pan, Yi [4 ]
Zeng, Zeng [3 ]
Affiliations
[1] Chongqing Jiaotong Univ, AI Res Ctr, Sch Informat Sci & Engn, Chongqing, Peoples R China
[2] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
Keywords
Circuit connections - Deep layers - Gradient flow - Gradient learning - Image processing - Large margins - Neural networks - Shallow layers - Training parameters - Transport systems;
DOI
10.1049/ipr2.12353
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks have achieved great success in both computer vision and natural language processing tasks. Improving gradient flow is crucial for training very deep neural networks. To address this challenge, a gradient enhancement approach is proposed that constructs short-circuit neural connections. The proposed short circuit is a unidirectional neural connection that back-propagates sensitivities, rather than gradients, from the deep layers to the shallow layers of a network. Moreover, the short circuit is formulated as a gradient truncation operation on its connecting layers, so it can be plugged into backbone models without introducing extra training parameters. Extensive experiments demonstrate that deep neural networks equipped with short-circuit connections improve over the baselines by a large margin on both computer vision and natural language processing tasks. The work offers a promising solution for low-resource scenarios, such as intelligent transport systems in computer vision and question answering in natural language processing.
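The record's abstract describes the short circuit only at a high level. As a purely illustrative sketch, one plausible reading of "gradient truncation without extra training parameters" can be written in a few lines of PyTorch; the function name short_circuit, the matching-shape requirement, and the detach-based gradient routing below are assumptions of this sketch, not the paper's published implementation.

import torch

def short_circuit(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    # Illustrative gradient-truncation short circuit (an assumption, not
    # the paper's code). The forward value equals `deep`, so predictions
    # are unchanged; in the backward pass the incoming gradient is routed
    # directly to `shallow`, while the intermediate path is detached.
    assert shallow.shape == deep.shape, "this sketch assumes matching shapes"
    return shallow + (deep - shallow).detach()

# Minimal check: the gradient reaches `x` through the short circuit,
# bypassing the (truncated) stand-in for the intermediate layers.
x = torch.randn(2, 8, requires_grad=True)
deep = x * 3.0
out = short_circuit(x, deep)
out.sum().backward()
print(x.grad)  # all ones, since the `x * 3.0` path contributes no gradient

Note that this sketch introduces no learnable parameters, which is consistent with the abstract's claim that the operation can be plugged into backbone models as-is; how the authors combine the short-circuit path with the ordinary gradient path is not recoverable from this record.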
Pages: 365 - 377
Number of pages: 13
Related Papers
50 records in total
  • [11] Online Deep Learning: Learning Deep Neural Networks on the Fly
    Sahoo, Doyen
    Pham, Quang
    Lu, Jing
    Hoi, Steven C. H.
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2660 - 2666
  • [12] Gradient Starvation: A Learning Proclivity in Neural Networks
    Pezeshki, Mohammad
    Kaba, Sekou-Oumar
    Bengio, Yoshua
    Courville, Aaron
    Precup, Doina
    Lajoie, Guillaume
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [13] The natural gradient learning algorithm for neural networks
    Amari, S
    THEORETICAL ASPECTS OF NEURAL COMPUTATION: A MULTIDISCIPLINARY PERSPECTIVE, 1998, : 1 - 15
  • [14] Learning with Deep Photonic Neural Networks
    Leelar, Bhawani Shankar
    Shivaleela, E. S.
    Srinivas, T.
    2017 IEEE WORKSHOP ON RECENT ADVANCES IN PHOTONICS (WRAP), 2017,
  • [15] Deep Learning with Random Neural Networks
    Gelenbe, Erol
    Yin, Yongha
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1633 - 1638
  • [16] Deep Learning with Random Neural Networks
    Gelenbe, Erol
    Yin, Yongha
    PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 2, 2018, 16 : 450 - 462
  • [17] Deep learning in spiking neural networks
    Tavanaei, Amirhossein
    Ghodrati, Masoud
    Kheradpisheh, Saeed Reza
    Masquelier, Timothee
    Maida, Anthony
    NEURAL NETWORKS, 2019, 111 : 47 - 63
  • [18] Deep learning in neural networks: An overview
    Schmidhuber, Juergen
    NEURAL NETWORKS, 2015, 61 : 85 - 117
  • [19] Artificial neural networks and deep learning
    Geubbelmans, Melvin
    Rousseau, Axel-Jan
    Burzykowski, Tomasz
    Valkenborg, Dirk
    AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2024, 165 (02) : 248 - 251
  • [20] Shortcut learning in deep neural networks
    Geirhos, Robert
    Jacobsen, Jörn-Henrik
    Michaelis, Claudio
    Zemel, Richard
    Brendel, Wieland
    Bethge, Matthias
    Wichmann, Felix A.
    NATURE MACHINE INTELLIGENCE, 2020, 2 : 665 - 673