MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1]
Richard, Alexander [1]
Schlueter, Ralf [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Keywords
deep learning; optimization; speech recognition; LVCSR
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective safeguard against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
Pages: 5
Related papers
50 in total
  • [41] Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
    Ahn, Sungjin
    Korattikara, Anoop
    Liu, Nathan
    Rajan, Suju
    Welling, Max
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 9 - 18
  • [42] Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization
    Shen, Zebang
    Qian, Hui
    Mu, Tongzhou
    Zhang, Chao
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2715 - 2721
  • [43] Triply stochastic gradient method for large-scale nonlinear similar unlabeled classification
    Wanli Shi
    Bin Gu
    Xiang Li
    Cheng Deng
    Heng Huang
    Machine Learning, 2021, 110 : 2005 - 2033
  • [44] Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization
    Mu, Yadong
    Liu, Wei
    Liu, Xiaobai
    Fan, Wei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (02) : 458 - 471
  • [45] Triply stochastic gradient method for large-scale nonlinear similar unlabeled classification
    Shi, Wanli
    Gu, Bin
    Li, Xiang
    Deng, Cheng
    Huang, Heng
    MACHINE LEARNING, 2021, 110 (08) : 2005 - 2033
  • [46] Large-Scale Deep Learning for Building Intelligent Computer Systems
    Dean, Jeff
    PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16), 2016, : 1 - 1
  • [47] Rich Punctuations Prediction Using Large-scale Deep Learning
    Wu, Xueyang
    Zhu, Su
    Wu, Yue
    Yu, Kai
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [48] Large-Scale Mobile App Identification Using Deep Learning
    Rezaei, Shahbaz
    Kroencke, Bryce
    Liu, Xin
    IEEE ACCESS, 2020, 8 : 348 - 362
  • [49] Hybrid Beamforming With Deep Learning for Large-Scale Antenna Arrays
    Hu, Rentao
    Jiang, Lijun
    Li, Ping
    IEEE ACCESS, 2021, 9 : 54690 - 54699
  • [50] Automatic Graph Partitioning for Very Large-scale Deep Learning
    Tanaka, Masahiro
    Taura, Kenjiro
    Hanawa, Toshihiro
    Torisawa, Kentaro
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 1004 - 1013