MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Institutions
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Keywords
deep learning; optimization; speech recognition; LVCSR
DOI
Not available
CLC number
O42 [Acoustics]
Subject classification
070206; 082403
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to optimization. We prove convergence of the algorithm in a convex setting. In our experiments, the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows training models with a factorized structure from scratch. We found this structure very useful, not only because it accelerates training and decoding, but also because it is an effective means of preventing overfitting. Combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while recognition error rate still improves. Additional gains are obtained by improving the Newbob learning rate strategy.
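The abstract's key analytic claim is that a non-zero feature mean is harmful to optimization. The sketch below is illustrative only, not the paper's actual mean-normalized SGD update (which reparameterizes network layers): it shows, on a toy least-squares problem, how a large feature mean worsens conditioning and caps the usable learning rate, while mean-centered features converge at the same rate. All variable names and hyperparameters here are hypothetical.

```python
import numpy as np

# Toy demonstration of the effect motivating mean-normalized SGD:
# features with a large non-zero mean make the curvature along the
# mean direction dominate, so a learning rate that is fine for
# centered features makes plain SGD diverge on the raw ones.
np.seterr(over="ignore", invalid="ignore")  # let divergence show up as inf/nan
rng = np.random.default_rng(0)

n, d = 1000, 5
X = rng.normal(loc=5.0, scale=1.0, size=(n, d))   # features with mean ~5
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd(features, targets, lr, epochs=5):
    """Plain per-sample SGD for least squares."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(targets)):
            grad = (features[i] @ w - targets[i]) * features[i]
            w = w - lr * grad
    return w

lr = 0.05  # stable for centered features, too large for the raw ones

# Raw, non-zero-mean features: SGD diverges at this learning rate.
w_raw = sgd(X, y, lr)
mse_raw = np.mean((X @ w_raw - y) ** 2)

# Mean-normalized features (with correspondingly centered targets):
# the same learning rate converges to the noise floor.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w_cen = sgd(Xc, yc, lr)
mse_centered = np.mean((Xc @ w_cen - yc) ** 2)
```

In the paper's setting the same idea is applied inside the network rather than to the input data alone, which is why the authors cast it as a second-order method; this toy example only reproduces the conditioning argument.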
Pages: 5
Related papers (50 total; first 10 shown)
  • [1] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [2] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [3] Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14645 - 14658
  • [4] Scalable and Practical Natural Gradient for Large-Scale Deep Learning
    Osawa, Kazuki
    Tsuji, Yohei
    Ueno, Yuichiro
    Naruse, Akira
    Foo, Chuan-Sheng
    Yokota, Rio
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (01) : 404 - 415
  • [5] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [6] Optimal large-scale stochastic optimization of NDCG surrogates for deep learning
    Qiu, Zi-Hao
    Hu, Quanqi
    Zhong, Yongjian
    Tu, Wei-Wei
    Zhang, Lijun
    Yang, Tianbao
    MACHINE LEARNING, 2025, 114 (02)
  • [7] The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems
    Yang, Zhuang
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024
  • [8] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [9] Distributing the Stochastic Gradient Sampler for Large-Scale LDA
    Yang, Yuan
    Chen, Jianfei
    Zhu, Jun
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1975 - 1984
  • [10] Stochastic variance reduced gradient with hyper-gradient for non-convex large-scale learning
    Yang, Zhuang
    APPLIED INTELLIGENCE, 2023, 53 (23) : 28627 - 28641