MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0

Authors:

Wiesler, Simon [1]

Richard, Alexander [1]

Schlueter, Ralf [1]

Ney, Hermann [1]

Affiliations:

[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

deep learning; optimization; speech recognition; LVCSR

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight and still improvements in recognition error rate are obtained. Additional gains are obtained by improving the Newbob learning rate strategy.
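
As a rough illustration of the abstract's claim that a non-zero feature mean is harmful for optimization, the sketch below runs plain gradient descent on a one-dimensional linear regression twice: once on raw features and once on mean-centered features. This is not the algorithm proposed in the paper, whose update rule is not given in this record. The toy data, the helper name `gd_final_loss`, and the step-size rule (one over the trace of the Hessian) are all assumptions made for the example.

```python
def gd_final_loss(xs, ys, steps=200):
    """Run plain gradient descent on y ~ w*x + b and return the final loss.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n = len(xs)
    # The Hessian of the loss below is [[E[x^2], E[x]], [E[x], 1]].
    # Its trace upper-bounds the largest eigenvalue, so 1/trace is a stable step size.
    lr = 1.0 / (sum(x * x for x in xs) / n + 1.0)
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Loss: mean over samples of 0.5 * squared residual.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Toy data with a feature mean of 10 (assumed values, chosen for illustration).
xs = [8.0, 9.0, 10.0, 11.0, 12.0]
ys = [2.0 * x + 1.0 for x in xs]

mean_x = sum(xs) / len(xs)
centered = [x - mean_x for x in xs]

loss_raw = gd_final_loss(xs, ys)
loss_centered = gd_final_loss(centered, ys)
print("final loss, raw features:     ", loss_raw)
print("final loss, centered features:", loss_centered)
```

With raw features, the off-diagonal Hessian entry E[x] couples the weight and the bias, which makes the problem poorly conditioned and leaves one direction converging very slowly. Centering sets that entry to zero, so the same number of steps reaches a much lower loss in this toy setting.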

Pages: 5

50 records in total

[41] Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
Ahn, Sungjin
Korattikara, Anoop
Liu, Nathan
Rajan, Suju
Welling, Max
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 9 - 18
[42] Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization
Shen, Zebang
Qian, Hui
Mu, Tongzhou
Zhang, Chao
PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2715 - 2721
[43] Triply stochastic gradient method for large-scale nonlinear similar unlabeled classification
Wanli Shi
Bin Gu
Xiang Li
Cheng Deng
Heng Huang
Machine Learning, 2021, 110 : 2005 - 2033
[44] Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization
Mu, Yadong
Liu, Wei
Liu, Xiaobai
Fan, Wei
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (02) : 458 - 471
[45] Triply stochastic gradient method for large-scale nonlinear similar unlabeled classification
Shi, Wanli
Gu, Bin
Li, Xiang
Deng, Cheng
Huang, Heng
MACHINE LEARNING, 2021, 110 (08) : 2005 - 2033
[46] Large-Scale Deep Learning for Building Intelligent Computer Systems
Dean, Jeff
PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16), 2016, : 1 - 1
[47] Rich Punctuations Prediction Using Large-scale Deep Learning
Wu, Xueyang
Zhu, Su
Wu, Yue
Yu, Kai
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[48] Large-Scale Mobile App Identification Using Deep Learning
Rezaei, Shahbaz
Kroencke, Bryce
Liu, Xin
IEEE ACCESS, 2020, 8 : 348 - 362
[49] Hybrid Beamforming With Deep Learning for Large-Scale Antenna Arrays
Hu, Rentao
Jiang, Lijun
Li, Ping
IEEE ACCESS, 2021, 9 : 54690 - 54699
[50] Automatic Graph Partitioning for Very Large-scale Deep Learning
Tanaka, Masahiro
Taura, Kenjiro
Hanawa, Toshihiro
Torisawa, Kentaro
2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 1004 - 1013
