MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Times cited: 0

Authors:

Wiesler, Simon [1]

Richard, Alexander [1]

Schlueter, Ralf [1]

Ney, Hermann [1]

Affiliations:

[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

deep learning; optimization; speech recognition; LVCSR

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments, we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful, not only because it accelerates training and decoding but also because it is a very effective means of preventing overfitting. Combining our proposed optimization algorithm with this model structure, we can reduce model size by a factor of eight and still obtain improvements in recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
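
The abstract's claim that a non-zero feature mean is harmful for optimization can be illustrated on a toy problem. The sketch below is not the algorithm proposed in the paper; it assumes a one-feature least-squares model y = w*x + b, for which the Hessian is the 2x2 matrix [[E[x^2], E[x]], [E[x], 1]], and compares its condition number before and after the feature mean is removed. The function name `hessian_condition_number`, the feature mean of 5, and the sample size are all illustrative assumptions.

```python
import math
import random


def hessian_condition_number(xs):
    """Condition number of the least-squares Hessian for the model y = w*x + b.

    With one feature plus a bias, the Hessian is the 2x2 matrix
    [[mean(x^2), mean(x)], [mean(x), 1]].
    """
    n = len(xs)
    m1 = sum(xs) / n                   # first moment, E[x]
    m2 = sum(x * x for x in xs) / n    # second moment, E[x^2]
    # Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]].
    a, b, d = m2, m1, 1.0
    center = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (center + radius) / (center - radius)


rng = random.Random(0)
raw = [rng.gauss(5.0, 1.0) for _ in range(10_000)]   # feature with mean near 5
mean = sum(raw) / len(raw)
centered = [x - mean for x in raw]                    # same feature, mean removed

cond_raw = hessian_condition_number(raw)
cond_centered = hessian_condition_number(centered)
print(f"raw: {cond_raw:.1f}, centered: {cond_centered:.1f}")
```

Under these assumptions the uncentered feature yields a condition number that is orders of magnitude larger than the centered one. A larger condition number generally means slower convergence for plain gradient descent, which is the intuition behind normalizing the mean.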

Pages: 5
