MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1]
Richard, Alexander [1]
Schlueter, Ralf [1]
Ney, Hermann [1]
Affiliation
[1] RWTH Aachen University, Department of Computer Science, Aachen, Germany
Keywords
deep learning; optimization; speech recognition; LVCSR
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments, we show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows models with a factorized structure to be trained from scratch. We found this structure very useful, not only because it accelerates training and decoding but also because it is a very effective means of countering overfitting. Combining the proposed optimization algorithm with this model structure reduces model size by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
Pages: 5
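The abstract's premise that a non-zero feature mean is harmful for optimization can be illustrated in the simplest convex case. The sketch below is not the authors' algorithm. It only compares the condition number of the least-squares Hessian of a linear model with bias, once for features with a non-zero mean and once for the same features after centering. The function name `hessian_condition_number`, the synthetic Gaussian data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def hessian_condition_number(features):
    """Condition number of the least-squares Hessian for a linear model with bias.

    The bias is modeled by appending a constant 1 to every feature vector,
    so the Hessian is the second-moment matrix of the augmented features.
    """
    n = features.shape[0]
    augmented = np.hstack([features, np.ones((n, 1))])
    hessian = augmented.T @ augmented / n
    return np.linalg.cond(hessian)


# Synthetic data (an assumption): 8 features with unit variance and mean 5.
rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=1.0, size=(10_000, 8))

# Same features with the empirical mean removed.
centered = raw - raw.mean(axis=0)

cond_raw = hessian_condition_number(raw)
cond_centered = hessian_condition_number(centered)

print(f"condition number, raw features:      {cond_raw:.1f}")
print(f"condition number, centered features: {cond_centered:.1f}")
```

A larger condition number generally means slower convergence for plain gradient methods, which is one way to read the motivation for normalizing the mean. How the paper itself performs this normalization during training is not stated in the abstract.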