Exploring Strategies for Training Deep Neural Networks

Cited: 0
Authors
Larochelle, Hugo [1 ]
Bengio, Yoshua [1 ]
Louradour, Jerome [1 ]
Lamblin, Pascal [1 ]
Affiliation
[1] Univ Montreal, Dept Informat & Rech Operat, Montreal, PQ H3T 1J8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
artificial neural networks; deep belief networks; restricted Boltzmann machines; autoassociators; unsupervised learning; COMPONENT ANALYSIS; BLIND SEPARATION; DIMENSIONALITY; ALGORITHM;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by the proposal of another greedy layer-wise procedure, relying on the usage of autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input. We also present a series of experiments aimed at evaluating the link between the performance of deep neural networks and practical aspects of their topology, for example, demonstrating cases where the addition of more depth helps. Finally, we empirically explore simple variants of these training algorithms, such as the use of different RBM input unit distributions, a simple way of combining gradient estimators to improve performance, as well as on-line versions of those algorithms.
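The greedy layer-wise strategy summarized above can be illustrated with a short sketch: each layer is pretrained as an unsupervised autoassociator on the representation produced by the layers below it, and the resulting weights initialize a deep network for supervised fine-tuning. The sketch below is an assumption-laden illustration in NumPy (tied weights, sigmoid units); the layer widths, learning rate, epoch count, and synthetic data are placeholders, not values from the paper.

# Illustrative sketch of greedy layer-wise autoassociator pretraining.
# Hyperparameters and data below are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(data, n_hidden, lr=0.1, epochs=25):
    """Train one tied-weight autoassociator to reconstruct `data`; return encoder parameters."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # decoder reuses W.T (tied weights)
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(data @ W + b_h)            # encode
        recon = sigmoid(h @ W.T + b_v)         # decode
        err = recon - data                     # output delta (sigmoid + cross-entropy)
        grad_h = (err @ W) * h * (1.0 - h)     # delta back-propagated to the hidden layer
        W -= lr * (data.T @ grad_h + err.T @ h) / len(data)
        b_h -= lr * grad_h.mean(axis=0)
        b_v -= lr * err.mean(axis=0)
    return W, b_h

# Greedy stacking: train each layer unsupervised on the output of the layers below,
# freeze it, and use the collected weights to initialize a deep network that would
# then be fine-tuned with supervised gradient descent.
X = rng.random((256, 64))                      # placeholder unlabeled data
stack, reps = [], X
for n_hidden in (32, 16):                      # illustrative hidden-layer widths
    W, b_h = pretrain_layer(reps, n_hidden)
    stack.append((W, b_h))
    reps = sigmoid(reps @ W + b_h)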
Pages: 1-40
Number of pages: 40
Related Papers
50 records in total
  • [31] Training Strategies for Convolutional Neural Networks with Transformed Input
    Khandani, Masoumeh Kalantari
    Mikhael, Wasfy B.
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 1058 - 1061
  • [32] CONVOLUTIONAL NEURAL NETWORKS AND TRAINING STRATEGIES FOR SKIN DETECTION
    Kim, Yoonsik
    Hwang, Insung
    Cho, Nam Ik
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3919 - 3923
  • [33] A fast adaptive algorithm for training deep neural networks
    Gui, Yangting
    Li, Dequan
    Fang, Runyue
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4099 - 4108
  • [34] Efficient Incremental Training for Deep Convolutional Neural Networks
    Tao, Yudong
    Tu, Yuexuan
    Shyu, Mei-Ling
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 286 - 291
  • [35] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [36] An Efficient Optimization Technique for Training Deep Neural Networks
    Mehmood, Faisal
    Ahmad, Shabir
    Whangbo, Taeg Keun
    MATHEMATICS, 2023, 11 (06)
  • [37] A survey on parallel training algorithms for deep neural networks
    Yook, Dongsuk
    Lee, Hyowon
    Yoo, In-Chul
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (06): : 505 - 514
  • [38] A simple theory for training response of deep neural networks
    Nakazato, Kenichi
    PHYSICA SCRIPTA, 2024, 99 (06)
  • [39] Fluorescence microscopy datasets for training deep neural networks
    Hagen, Guy M.
    Bendesky, Justin
    Machado, Rosa
    Tram-Anh Nguyen
    Kumar, Tanmay
    Ventura, Jonathan
    GIGASCIENCE, 2021, 10 (05):
  • [40] SEQUENCE TRAINING AND ADAPTATION OF HIGHWAY DEEP NEURAL NETWORKS
    Lu, Liang
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 461 - 466