Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 15
Authors
Xue, Shaofei [1 ]
Jiang, Hui [2 ]
Dai, Lirong [1 ]
Liu, Qingfeng [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 2R7, Canada
Keywords
Deep neural network (DNN); Hybrid DNN/HMM; Speaker adaptation; Singular value decomposition (SVD); Transformations
DOI
10.1007/s11265-015-1012-6
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recently, several speaker adaptation methods have been proposed for deep neural networks (DNNs) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods directly tune the connection weights of trained DNNs to optimize system performance, since doing so is highly prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of trained DNNs and then tune the resulting rectangular diagonal matrices with the adaptation data. This alleviates the over-fitting problem because the weight matrices are updated only slightly, through modification of their singular values alone. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on the Switchboard task. Experimental results show that the method is effective for adapting large DNN models with only a small amount of adaptation data. For example, recognition results on the Switchboard task show that the proposed SVD-based adaptation method can achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
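To illustrate the adaptation scheme described in the abstract, the sketch below re-parameterizes one trained fully connected layer as W = U diag(s) V^T and fine-tunes only the singular values s, keeping U and V frozen. This is not the authors' implementation: the layer sizes, optimizer settings, and the placeholder adaptation batch are assumptions made for the example.

# Minimal sketch (assumed setup, not the paper's code) of SVD-based speaker adaptation
# of a single fully connected layer: factor the trained weight matrix, freeze the
# singular vectors, and update only the singular values on adaptation data.
import torch
import torch.nn as nn


class SVDAdaptedLinear(nn.Module):
    """Linear layer whose weight is re-parameterized as U @ diag(s) @ V^T."""

    def __init__(self, trained_linear: nn.Linear):
        super().__init__()
        W = trained_linear.weight.data                      # shape: (out_dim, in_dim)
        U, s, Vh = torch.linalg.svd(W, full_matrices=False)
        # Frozen factors taken from the speaker-independent model.
        self.register_buffer("U", U)
        self.register_buffer("Vh", Vh)
        # Only the singular values are adapted per speaker.
        self.s = nn.Parameter(s.clone())
        self.bias = nn.Parameter(trained_linear.bias.data.clone(), requires_grad=False)

    def forward(self, x):
        W_adapted = self.U @ torch.diag(self.s) @ self.Vh
        return nn.functional.linear(x, W_adapted, self.bias)


# Usage: wrap one trained hidden layer and update only `s` on a few adaptation utterances.
trained_layer = nn.Linear(2048, 2048)        # stands in for one hidden layer of the trained DNN
adapted_layer = SVDAdaptedLinear(trained_layer)
optimizer = torch.optim.SGD([adapted_layer.s], lr=1e-3)

# Placeholder adaptation batch; in the real task these would be acoustic features,
# with the loss computed from senone targets of the full hybrid DNN/HMM system.
features = torch.randn(32, 2048)
targets = torch.randn(32, 2048)
loss = nn.functional.mse_loss(adapted_layer(features), targets)
loss.backward()
optimizer.step()

Because U and V stay fixed, the number of per-speaker parameters equals the number of singular values, which is what limits over-fitting when only a small amount of adaptation data is available.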
Pages: 175-185
Number of pages: 11