Exploring Strategies for Training Deep Neural Networks

Cited by: 0
Authors
Larochelle, Hugo [1 ]
Bengio, Yoshua [1 ]
Louradour, Jerome [1 ]
Lamblin, Pascal [1 ]
Affiliations
[1] Univ Montreal, Dept Informat & Rech Operat, Montreal, PQ H3T 1J8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
artificial neural networks; deep belief networks; restricted Boltzmann machines; autoassociators; unsupervised learning; COMPONENT ANALYSIS; BLIND SEPARATION; DIMENSIONALITY; ALGORITHM;
DOI
Not available
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by the proposal of another greedy layer-wise procedure, relying on the usage of autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input. We also present a series of experiments aimed at evaluating the link between the performance of deep neural networks and practical aspects of their topology, for example, demonstrating cases where the addition of more depth helps. Finally, we empirically explore simple variants of these training algorithms, such as the use of different RBM input unit distributions, a simple way of combining gradient estimators to improve performance, as well as on-line versions of those algorithms.
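The abstract describes greedy layer-wise unsupervised pre-training, in which each restricted Boltzmann machine (RBM) is trained on the representation produced by the layer below it, and the resulting weights initialize a deep network. The following is a minimal NumPy sketch of that idea, assuming binary units and one-step contrastive divergence (CD-1); the layer sizes, learning rate, and toy data are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch of greedy layer-wise pre-training with RBMs.
# Illustrative reconstruction only: hyperparameters and the CD-1
# update schedule are assumptions, not the paper's configuration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine with binary units, trained by CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase: hidden activations given the data.
        h0_prob = self.hidden_probs(v0)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one step of Gibbs sampling.
        v1_prob = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1_prob)
        # Approximate log-likelihood gradient (CD-1 statistics).
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
        self.b += self.lr * (v0 - v1_prob).mean(axis=0)
        self.c += self.lr * (h0_prob - h1_prob).mean(axis=0)

def pretrain_stack(X, layer_sizes, epochs=5, batch=32):
    """Greedily train each RBM on the representation of the layer below."""
    rbms, rep = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(rep.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(rep), batch):
                rbm.cd1_step(rep[i:i + batch])
        rep = rbm.hidden_probs(rep)  # feed the representation upward
        rbms.append(rbm)
    return rbms

# Toy usage: pre-train a two-layer stack on random binary "data".
X = (rng.random((256, 64)) < 0.3).astype(float)
stack = pretrain_stack(X, layer_sizes=[32, 16])
```

In the paper's framing, the weights learned this way would then initialize a feed-forward network that is fine-tuned with supervised gradient descent; the sketch stops at the unsupervised stage.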
Pages: 1-40
Page count: 40
Related Papers
50 records in total
  • [1] Exploring strategies for training deep neural networks
    Larochelle, Hugo
    Bengio, Yoshua
    Louradour, Jérôme
    Lamblin, Pascal
    Journal of Machine Learning Research, 2009, 10: 1-40
  • [2] Exploring Learning Strategies for Training Deep Neural Networks Using Multiple Graphics Processing Units
    Hu, Nien-Tsu
    Huang, Ching-Chien
    Mo, Chih-Chieh
    Huang, Chien-Lin
    SENSORS AND MATERIALS, 2024, 36 (09): 3743-3755
  • [3] SEMI-SUPERVISED TRAINING STRATEGIES FOR DEEP NEURAL NETWORKS
    Gibson, Matthew
    Cook, Gary
    Zhan, Puming
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017: 77-83
  • [4] Strategies for training optical neural networks
    Qipeng Yang
    Bowen Bai
    Weiwei Hu
    Xingjun Wang
    National Science Open, 2022, 1 (03): 7-11
  • [5] MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
    Ghoshal, Arnab
    Swietojanski, Pawel
    Renals, Steve
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7319-7323
  • [6] Training deep quantum neural networks
    Beer, Kerstin
    Bondarenko, Dmytro
    Farrelly, Terry
    Osborne, Tobias J.
    Salzmann, Robert
    Scheiermann, Daniel
    Wolf, Ramona
    NATURE COMMUNICATIONS, 2020, 11 (01)
  • [7] NOISY TRAINING FOR DEEP NEURAL NETWORKS
    Meng, Xiangtao
    Liu, Chao
    Zhang, Zhiyong
    Wang, Dong
    2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP), 2014: 16-20
  • [8] Exploring the Fundamentals of Mutations in Deep Neural Networks
    Ahmed, Zaheed
    Makedonski, Philip
    ACM/IEEE 27TH INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS: COMPANION PROCEEDINGS, MODELS 2024, 2024: 227-233
  • [9] Exploring deep neural networks for rumor detection
    Muhammad Zubair Asghar
    Ammara Habib
    Anam Habib
    Adil Khan
    Rehman Ali
    Asad Khattak
    Journal of Ambient Intelligence and Humanized Computing, 2021, 12: 4315-4333