Exploring Strategies for Training Deep Neural Networks

Cited by: 0
Authors
Larochelle, Hugo [1 ]
Bengio, Yoshua [1 ]
Louradour, Jerome [1 ]
Lamblin, Pascal [1 ]
Affiliations
[1] Univ Montreal, Dept Informat & Rech Operat, Montreal, PQ H3T 1J8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
artificial neural networks; deep belief networks; restricted Boltzmann machines; autoassociators; unsupervised learning; COMPONENT ANALYSIS; BLIND SEPARATION; DIMENSIONALITY; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline classification code
0812;
Abstract
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by the proposal of another greedy layer-wise procedure, relying on the usage of autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input. We also present a series of experiments aimed at evaluating the link between the performance of deep neural networks and practical aspects of their topology, for example, demonstrating cases where the addition of more depth helps. Finally, we empirically explore simple variants of these training algorithms, such as the use of different RBM input unit distributions, a simple way of combining gradient estimators to improve performance, as well as on-line versions of those algorithms.
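As a rough illustration of the greedy layer-wise unsupervised pre-training described in the abstract, the sketch below stacks restricted Boltzmann machines trained with one step of contrastive divergence (CD-1), using each layer's hidden activations as the input to the next layer. This is a minimal sketch, not the authors' implementation; the layer sizes, learning rate, and toy binary data are illustrative assumptions.

```python
# Minimal sketch of greedy layer-wise RBM pre-training (CD-1).
# All hyperparameters and the toy data below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.05):
        # Positive phase: sample hidden units given the data.
        p_h0 = self.hidden_probs(v0)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one step of Gibbs sampling (reconstruction).
        p_v1 = self.visible_probs(h0)
        p_h1 = self.hidden_probs(p_v1)
        # CD-1 approximation to the log-likelihood gradient.
        batch = v0.shape[0]
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
        self.b_v += lr * (v0 - p_v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)

def greedy_pretrain(data, layer_sizes, epochs=5):
    """Train RBMs one layer at a time; each layer's hidden activations
    become the training data for the next layer, as in DBN pre-training."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_update(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)  # propagate upward
    return rbms  # these weights would then initialize a deep net for supervised fine-tuning

if __name__ == "__main__":
    x = (rng.random((100, 784)) < 0.1).astype(float)  # toy binary "images"
    stack = greedy_pretrain(x, layer_sizes=[256, 64])
    print([r.W.shape for r in stack])
```

In practice the pre-trained weights initialize a feed-forward network that is then fine-tuned with a supervised criterion; the autoassociator-based variant studied in the paper replaces the RBM at each layer with an autoencoder trained to reconstruct its input.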
Pages: 1-40
Number of pages: 40
Related papers
50 entries in total
  • [41] ON TRAINING DEEP NEURAL NETWORKS USING A STREAMING APPROACH
    Duda, Piotr
    Jaworski, Maciej
    Cader, Andrzej
    Wang, Lipo
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2020, 10 (01) : 15 - 26
  • [42] Training Deep Neural Networks in Situ with Neuromorphic Photonics
    Filipovich, Matthew J.
    Guo, Zhimu
    Marquez, Bicky A.
    Morison, Hugh D.
    Shastri, Bhavin J.
    2020 IEEE PHOTONICS CONFERENCE (IPC), 2020,
  • [43] Stability for the training of deep neural networks and other classifiers
    Berlyand, Leonid
    Jabin, Pierre-Emmanuel
    Safsten, C. Alex
    MATHEMATICAL MODELS & METHODS IN APPLIED SCIENCES, 2021, 31 (11): : 2345 - 2390
  • [44] An Exploration on Temperature Term in Training Deep Neural Networks
    Si, Zhaofeng
    Qi, Honggang
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
  • [45] Partial data permutation for training deep neural networks
    Cong, Guojing
    Zhang, Li
    Yang, Chih-Chieh
    2020 20TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2020), 2020, : 728 - 735
  • [46] Disentangling feature and lazy training in deep neural networks
    Geiger, Mario
    Spigler, Stefano
    Jacot, Arthur
    Wyart, Matthieu
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2020, 2020 (11):
  • [47] Training Deep Spiking Neural Networks Using Backpropagation
    Lee, Jun Haeng
    Delbruck, Tobi
    Pfeiffer, Michael
    FRONTIERS IN NEUROSCIENCE, 2016, 10
  • [48] Partial Differential Equations for Training Deep Neural Networks
    Chaudhari, Pratik
    Oberman, Adam
    Osher, Stanley
    Soatto, Stefano
    Carlier, Guillaume
    2017 FIFTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2017, : 1627 - 1631
  • [49] Accelerating Training for Distributed Deep Neural Networks in MapReduce
    Xu, Jie
    Wang, Jingyu
    Qi, Qi
    Sun, Haifeng
    Liao, Jianxin
    WEB SERVICES - ICWS 2018, 2018, 10966 : 181 - 195
  • [50] Decentralized trustless gossip training of deep neural networks
    Sajina, Robert
    Tankovic, Nikola
    Etinger, Darko
    2020 43RD INTERNATIONAL CONVENTION ON INFORMATION, COMMUNICATION AND ELECTRONIC TECHNOLOGY (MIPRO 2020), 2020, : 1080 - 1084