Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 15
Authors
Xue, Shaofei [1 ]
Jiang, Hui [2 ]
Dai, Lirong [1 ]
Liu, Qingfeng [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 2R7, Canada
Keywords
Deep neural network (DNN); Hybrid DNN/HMM; Speaker adaptation; Singular value decomposition (SVD); Transformations
DOI
10.1007/s11265-015-1012-6
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recently, several speaker adaptation methods have been proposed for deep neural networks (DNNs) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods directly tune the connection weights of trained DNNs to optimize system performance, since doing so is highly prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of trained DNNs and then tune the resulting rectangular diagonal matrices with the adaptation data. This alleviates the over-fitting problem because the weight matrices are updated only slightly, through modification of their singular values alone. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on the Switchboard task. Experimental results show that the method is effective for adapting large DNN models with only a small amount of adaptation data. For example, recognition results on the Switchboard task show that the proposed SVD-based adaptation method can achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
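To illustrate the adaptation scheme described in the abstract, the sketch below re-parameterizes one trained fully connected layer as W = U diag(s) V^T and fine-tunes only the singular values s, keeping U and V frozen. This is not the authors' implementation: the layer sizes, optimizer settings, and the placeholder adaptation batch are assumptions made for the example.

# Minimal sketch (assumed setup, not the paper's code) of SVD-based speaker adaptation
# of a single fully connected layer: factor the trained weight matrix, freeze the
# singular vectors, and update only the singular values on adaptation data.
import torch
import torch.nn as nn


class SVDAdaptedLinear(nn.Module):
    """Linear layer whose weight is re-parameterized as U @ diag(s) @ V^T."""

    def __init__(self, trained_linear: nn.Linear):
        super().__init__()
        W = trained_linear.weight.data                      # shape: (out_dim, in_dim)
        U, s, Vh = torch.linalg.svd(W, full_matrices=False)
        # Frozen factors taken from the speaker-independent model.
        self.register_buffer("U", U)
        self.register_buffer("Vh", Vh)
        # Only the singular values are adapted per speaker.
        self.s = nn.Parameter(s.clone())
        self.bias = nn.Parameter(trained_linear.bias.data.clone(), requires_grad=False)

    def forward(self, x):
        W_adapted = self.U @ torch.diag(self.s) @ self.Vh
        return nn.functional.linear(x, W_adapted, self.bias)


# Usage: wrap one trained hidden layer and update only `s` on a few adaptation utterances.
trained_layer = nn.Linear(2048, 2048)        # stands in for one hidden layer of the trained DNN
adapted_layer = SVDAdaptedLinear(trained_layer)
optimizer = torch.optim.SGD([adapted_layer.s], lr=1e-3)

# Placeholder adaptation batch; in the real task these would be acoustic features,
# with the loss computed from senone targets of the full hybrid DNN/HMM system.
features = torch.randn(32, 2048)
targets = torch.randn(32, 2048)
loss = nn.functional.mse_loss(adapted_layer(features), targets)
loss.backward()
optimizer.step()

Because U and V stay fixed, the number of per-speaker parameters equals the number of singular values, which is what limits over-fitting when only a small amount of adaptation data is available.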
Pages: 175-185
Number of pages: 11