INCOHERENT TRAINING OF DEEP NEURAL NETWORKS TO DE-CORRELATE BOTTLENECK FEATURES FOR SPEECH RECOGNITION

Cited by: 0

Authors:

Bao, Yebo [1]

Jiang, Hui [2]

Dai, Lirong [1]

Liu, Cong [3]

Affiliations:

[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230026, Anhui, Peoples R China

[2] Univ York, Dept Comp Sci & Engn, York YO10 5DD, N Yorkshire, England

[3] Anhui USTC iFlytek Co Ltd, iFlytek Res, Hefei, Peoples R China

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

Deep neural networks (DNN); nonlinear dimensionality reduction; bottleneck features; incoherent training; large vocabulary continuous speech recognition (LVCSR);

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Recently, the hybrid model combining a deep neural network (DNN) with context-dependent HMMs has achieved dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method under the traditional GMM/HMM framework. Instead of using the DNN as an acoustic model, we use it as a front-end bottleneck (BN) feature extractor to de-correlate long feature vectors concatenated from several consecutive speech frames. More importantly, we propose two novel incoherent training methods that explicitly de-correlate BN features during DNN learning. The first method relies on minimizing the coherence of the weight matrices in the DNN, while the second attempts to minimize the correlation coefficients of BN features computed on each mini-batch during DNN training. Experimental results on a 70-hr Mandarin transcription task and the 309-hr Switchboard task show that traditional GMM/HMMs using BN features can yield performance comparable to DNN/HMM. The proposed incoherent training produces a 2-3% additional gain over the baseline BN features. Finally, discriminatively trained GMM/HMMs using incoherently trained BN features consistently surpass the state-of-the-art DNN/HMMs in all evaluated tasks.
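
The two quantities named in the abstract can be illustrated with a minimal NumPy sketch. This record does not give the paper's exact formulas, so both definitions below are assumptions: coherence is taken as the standard mutual coherence (largest absolute cosine similarity between distinct columns of a weight matrix), and the feature term is taken as the sum of squared pairwise Pearson correlations over one mini-batch. The function names `mutual_coherence` and `correlation_penalty` are hypothetical, and which matrix axis corresponds to bottleneck units depends on the layout convention used.

```python
import numpy as np


def mutual_coherence(W):
    """Largest absolute cosine similarity between two distinct columns of W.

    Assumed definition (standard mutual coherence); not taken from the paper.
    """
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    Wn = W / np.maximum(norms, 1e-12)   # unit-norm columns, guarding against zeros
    G = Wn.T @ Wn                       # Gram matrix of cosine similarities
    np.fill_diagonal(G, 0.0)            # ignore each column's similarity to itself
    return float(np.max(np.abs(G)))


def correlation_penalty(B):
    """Sum of squared Pearson correlations over distinct feature pairs.

    B has shape (batch_size, num_features): one row per frame in the mini-batch.
    Assumed definition; a zero-variance feature would yield NaN here.
    """
    C = np.corrcoef(B, rowvar=False)    # (num_features, num_features)
    off_diag = C - np.diag(np.diag(C))  # drop the unit diagonal
    return float(np.sum(off_diag ** 2) / 2.0)  # each pair appears twice in C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 8))    # hypothetical weight matrix
    B = rng.standard_normal((256, 8))   # hypothetical mini-batch of BN features
    print("coherence:", mutual_coherence(W))
    print("correlation penalty:", correlation_penalty(B))
```

In a training setup along these lines, either value would presumably be added to the usual training loss with a small weight, so that lower coherence or lower correlation is rewarded alongside classification accuracy.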

Pages: 6980-6984

Number of pages: 5
