Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 15

Authors:

Xue, Shaofei [1]

Jiang, Hui [2]

Dai, Lirong [1]

Liu, Qingfeng [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China

[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 2R7, Canada

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2016 / Vol. 82 / No. 02

Keywords:

Deep neural network (DNN); Hybrid DNN/HMM; Speaker adaptation; Singular value decomposition (SVD); TRANSFORMATIONS;

DOI:

10.1007/s11265-015-1012-6

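Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract: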

Recently, several speaker adaptation methods have been proposed for deep neural networks (DNNs) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods rely on directly tuning the connection weights in trained DNNs to optimize system performance, since this is very prone to over-fitting, especially when some class labels are missing in the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices in trained DNNs and then tune the rectangular diagonal matrices with the adaptation data. This alleviates the over-fitting problem by updating the weight matrices only slightly, modifying just the singular values. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition in the Switchboard task. Experimental results show that it is effective to adapt large DNN models using only a small amount of adaptation data. For example, recognition results in the Switchboard task show that the proposed SVD-based adaptation method may achieve up to 3-6 % relative error reduction using only a few dozen adaptation utterances per speaker.
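
The core idea in the abstract (factor a trained weight matrix with SVD, then tune only the singular values on the adaptation data) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single linear layer, a squared-error loss instead of the DNN training criterion, and plain gradient descent. The names `svd_adapt` and `mse`, the learning rate, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def svd_adapt(W, X, T, lr=0.05, steps=500):
    """Adapt W = U diag(s) Vt by gradient descent on the singular values s only.

    Assumption (not from the paper): squared-error loss on a single linear layer.
    W: (m, d) trained weights, X: (d, n) adaptation inputs, T: (m, n) targets.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = s.copy()
    Z = Vt @ X                      # inputs projected onto right singular vectors
    n = X.shape[1]
    for _ in range(steps):
        E = U @ (s[:, None] * Z) - T              # residuals of the adapted layer
        grad = np.sum((U.T @ E) * Z, axis=1) / n  # dLoss/ds, one entry per singular value
        s -= lr * grad                            # U and Vt stay fixed
    return U, s, Vt

def mse(W, X, T):
    """Mean squared error of the linear layer W on (X, T)."""
    return float(np.mean((W @ X - T) ** 2))

# Synthetic example: the "speaker-specific" targets come from rescaled singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))
X = rng.standard_normal((5, 20))
U0, s0, Vt0 = np.linalg.svd(W, full_matrices=False)
T = (U0 * (1.2 * s0)) @ Vt0 @ X

U, s, Vt = svd_adapt(W, X, T)
W_adapted = (U * s) @ Vt            # rebuild the adapted weight matrix
```

In this sketch only `min(m, d)` parameters per layer are updated (5 here, versus 40 weights in `W`), which is the property the abstract credits with reducing over-fitting on small adaptation sets.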

Pages: 175-185

Number of pages: 11

50 items in total

[31] HMM-separation-based speech recognition for a distant moving speaker
Takiguchi, T
Nakamura, S
Shikano, K
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (02): : 127 - 140
[32] Speech recognition for a distant moving speaker based on HMM composition and separation
Takiguchi, T
Nakamura, S
Shikano, K
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1403 - 1406
[33] Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation
Natl Tsing Hua Univ, Hsinchu, Taiwan
IEE Proc Vision Image Signal Proc, 3 (129-135):
[34] Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation
Chien, JT
Wang, HC
IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1997, 144 (03): : 129 - 135
[35] Speaker adaptation method for fenonic Markov model-based speech recognition
Nishimura, Masafumi, 1600, (22):
[36] Speech emotion recognition based on a hybrid of HMM/ANN
Mao, Xia
Zhang, Bing
Luo, Yi
PROCEEDINGS OF THE 7TH WSEAS INTERNATIONAL CONFERENCE ON APPLIED INFORMATICS AND COMMUNICATIONS, 2007, : 369 - 372
[37] Face Recognition Using Singular Value Decomposition along with seven state HMM
Shinde, Anagha A.
Ruikar, Sachin D.
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2013, 13 (12): : 117 - 122
[38] Very low bit rate speech coding based on HMM with speaker adaptation
Masuko, Takashi
Kobayashi, Takao
Tokuda, Keiichi
Systems and Computers in Japan, 2006, 37 (02): : 67 - 78
[39] An On-line Speaker Adaptation Method for HMM-based Speech Recognizers
Banhalmi, Andras
Kocsor, Andras
ACTA CYBERNETICA, 2008, 18 (03): : 379 - 390
[40] Nearest Neighbor Approach in Speaker Adaptation for HMM-based Speech Synthesis
Mohammadi, Amir
Demiroglu, Cenk
2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,