MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Institutions
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Keywords
deep learning; optimization; speech recognition; LVCSR
DOI
Not available
CLC number
O42 [Acoustics]
Subject classification
070206; 082403
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to optimization. We prove convergence of the algorithm in a convex setting. In our experiments, the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows training models with a factorized structure from scratch. We found this structure very useful, not only because it accelerates training and decoding, but also because it is an effective means of preventing overfitting. Combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while recognition error rate still improves. Additional gains are obtained by improving the Newbob learning rate strategy.
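The abstract's key analytic claim is that a non-zero feature mean is harmful to optimization. The sketch below is illustrative only, not the paper's actual mean-normalized SGD update (which reparameterizes network layers): it shows, on a toy least-squares problem, how a large feature mean worsens conditioning and caps the usable learning rate, while mean-centered features converge at the same rate. All variable names and hyperparameters here are hypothetical.

```python
import numpy as np

# Toy demonstration of the effect motivating mean-normalized SGD:
# features with a large non-zero mean make the curvature along the
# mean direction dominate, so a learning rate that is fine for
# centered features makes plain SGD diverge on the raw ones.
np.seterr(over="ignore", invalid="ignore")  # let divergence show up as inf/nan
rng = np.random.default_rng(0)

n, d = 1000, 5
X = rng.normal(loc=5.0, scale=1.0, size=(n, d))   # features with mean ~5
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd(features, targets, lr, epochs=5):
    """Plain per-sample SGD for least squares."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(targets)):
            grad = (features[i] @ w - targets[i]) * features[i]
            w = w - lr * grad
    return w

lr = 0.05  # stable for centered features, too large for the raw ones

# Raw, non-zero-mean features: SGD diverges at this learning rate.
w_raw = sgd(X, y, lr)
mse_raw = np.mean((X @ w_raw - y) ** 2)

# Mean-normalized features (with correspondingly centered targets):
# the same learning rate converges to the noise floor.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w_cen = sgd(Xc, yc, lr)
mse_centered = np.mean((Xc @ w_cen - yc) ** 2)
```

In the paper's setting the same idea is applied inside the network rather than to the input data alone, which is why the authors cast it as a second-order method; this toy example only reproduces the conditioning argument.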
Pages: 5
Related papers (50 total; first 10 shown)
  • [1] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [2] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [3] Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning
    Yang, Zhuang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14645 - 14658
  • [4] Scalable and Practical Natural Gradient for Large-Scale Deep Learning
    Osawa, Kazuki
    Tsuji, Yohei
    Ueno, Yuichiro
    Naruse, Akira
    Foo, Chuan-Sheng
    Yokota, Rio
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (01) : 404 - 415
  • [5] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [6] Optimal large-scale stochastic optimization of NDCG surrogates for deep learning
    Qiu, Zi-Hao
    Hu, Quanqi
    Zhong, Yongjian
    Tu, Wei-Wei
    Zhang, Lijun
    Yang, Tianbao
    MACHINE LEARNING, 2025, 114 (02)
  • [7] The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems
    Yang, Zhuang
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024
  • [8] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [9] Distributing the Stochastic Gradient Sampler for Large-Scale LDA
    Yang, Yuan
    Chen, Jianfei
    Zhu, Jun
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1975 - 1984
  • [10] Stochastic variance reduced gradient with hyper-gradient for non-convex large-scale learning
    Yang, Zhuang
    APPLIED INTELLIGENCE, 2023, 53 (23) : 28627 - 28641