Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 15
Authors
Xue, Shaofei [1]
Jiang, Hui [2]
Dai, Lirong [1]
Liu, Qingfeng [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 2R7, Canada
Keywords
Deep neural network (DNN); Hybrid DNN/HMM; Speaker adaptation; Singular value decomposition (SVD); TRANSFORMATIONS;
DOI
10.1007/s11265-015-1012-6
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Several speaker adaptation methods have recently been proposed for deep neural networks (DNNs) in large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few of them directly tune the connection weights of a trained DNN to optimize system performance, because doing so is very prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of a trained DNN and then tune the resulting rectangular diagonal matrices with the adaptation data. Because only the singular values are modified, the weight matrices change only slightly, which alleviates the over-fitting problem. We evaluate the proposed adaptation method on two standard speech recognition tasks: TIMIT phone recognition and large vocabulary speech recognition on Switchboard. Experimental results show that the method can effectively adapt large DNN models using only a small amount of adaptation data. For example, on the Switchboard task the proposed SVD-based adaptation method can achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
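The core idea of the abstract (factor a trained weight matrix with SVD, then update only the singular values on the adaptation data) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single linear layer, the squared-error loss, the plain gradient-descent update, and the names `adapt_singular_values` and `loss` are all assumptions made for illustration. The sketch uses the reduced SVD, so the diagonal factor is stored as a vector `s` instead of a rectangular diagonal matrix.

```python
import numpy as np


def loss(U, s, Vt, X, T):
    """Mean squared error of the linear layer W = U @ diag(s) @ Vt."""
    Y = ((X @ Vt.T) * s) @ U.T
    return 0.5 * np.mean(np.sum((Y - T) ** 2, axis=1))


def adapt_singular_values(W, X, T, lr=0.1, steps=50):
    """Adapt a weight matrix by updating only its singular values.

    W: (out, in) speaker-independent weight matrix.
    X: (n, in) adaptation inputs, T: (n, out) adaptation targets.
    The squared-error loss is an illustrative assumption.
    """
    # Factor the trained weights once; U and Vt stay frozen afterwards.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = s.copy()
    P = X @ Vt.T  # inputs projected onto the right singular vectors
    for _ in range(steps):
        E = (P * s) @ U.T - T                    # prediction error, (n, out)
        grad = np.sum((E @ U) * P, axis=0) / len(X)  # dLoss/ds
        s -= lr * grad                           # only singular values move
    return U, s, Vt
```

The adapted matrix can be rebuilt as `U @ np.diag(s) @ Vt`. Only `min(out, in)` parameters per layer are speaker-dependent in this sketch, far fewer than the `out * in` entries of the full weight matrix, which is consistent with the abstract's argument about limiting over-fitting.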
Pages: 175-185
Page count: 11