GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models

Cited by: 0
Authors
Tomashenko, Natalia [1 ,2 ]
Khokhlov, Yuri [3 ]
Affiliations
[1] Speech Technol Ctr, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Keywords
speaker adaptation; deep neural networks (DNN); MAP; fMLLR; CD-DNN-HMM; GMM-derived (GMMD) features; speaker adaptive training (SAT)
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In this paper, we investigate GMM-derived features recently introduced for the adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features and by using fMLLR-adapted features for training an auxiliary GMM model. Traditional adaptation algorithms, such as maximum a posteriori (MAP) adaptation and feature-space maximum likelihood linear regression (fMLLR), are performed on the auxiliary GMM model used in the SAT procedure for the DNN. Experimental results on the Wall Street Journal (WSJ0) corpus show that the proposed adaptation technique provides, on average, a 17-28% relative word error rate (WER) reduction on different adaptation sets in an unsupervised adaptation setup, compared to speaker-independent (SI) DNN-HMM systems built on conventional features. We also found that fMLLR adaptation of the SAT DNN trained on GMM-derived features outperforms fMLLR adaptation of the SAT DNN trained on conventional features by up to a 14% relative WER reduction.
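The GMMD idea can be illustrated with a minimal sketch: each acoustic frame is mapped to a vector of log-likelihoods under an auxiliary GMM, and that vector (rather than the raw frame) feeds the DNN, so adapting the GMM with MAP or fMLLR changes the DNN's input without retraining the network. The sketch below is a simplification under stated assumptions, not the authors' exact pipeline (the paper trains the auxiliary model on fMLLR-adapted features, and likelihoods are typically taken per HMM state); all names, component counts, and dimensions here are hypothetical.

```python
# Minimal sketch of GMM-derived (GMMD) feature extraction, assuming a
# diagonal-covariance auxiliary GMM. Per-frame log-likelihoods of its
# components serve as DNN input features; speaker adaptation (e.g. MAP
# shifts of the means, or an fMLLR transform x -> A @ x + b applied to
# the frames) changes these features while the DNN stays fixed.
import numpy as np

def log_gaussian(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at frame x.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def gmmd_features(frames, means, variances):
    # Map each frame to the vector of per-component log-likelihoods.
    feats = np.empty((frames.shape[0], means.shape[0]))
    for t, x in enumerate(frames):
        for k in range(means.shape[0]):
            feats[t, k] = log_gaussian(x, means[k], variances[k])
    return feats

# Hypothetical usage: 100 frames of 39-dim features, 32-component GMM.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 39))
means = rng.normal(size=(32, 39))
variances = np.ones((32, 39))
print(gmmd_features(frames, means, variances).shape)  # (100, 32)
```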
Pages: 2882 - 2886
Page count: 5
Related papers
50 records in total
  • [31] STRUCTURED DISCRIMINATIVE MODELS USING DEEP NEURAL-NETWORK FEATURES
    van Dalen, R. C.
    Yang, J.
    Wang, H.
    Ragni, A.
    Zhang, C.
    Gales, M. J. F.
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 160 - 166
  • [32] FACTORIZED ADAPTATION FOR DEEP NEURAL NETWORK
    Li, Jinyu
    Huang, Jui-Ting
    Gong, Yifan
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [33] Rapid Feature Space MLLR Speaker Adaptation for Deep Neural Network Acoustic Modeling
    Zhang, Shilei
    Qin, Yong
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2889 - 2894
  • [34] HOW TRANSFERABLE ARE FEATURES IN CONVOLUTIONAL NEURAL NETWORK ACOUSTIC MODELS ACROSS LANGUAGES?
    Thompson, Jessica A. F.
    Schoenwiesner, Marc
    Bengio, Yoshua
    Willett, Daniel
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2827 - 2831
  • [36] MULTI-TASK DEEP NEURAL NETWORK ACOUSTIC MODELS WITH MODEL ADAPTATION USING DISCRIMINATIVE SPEAKER IDENTITY FOR WHISPER RECOGNITION
    Li, Jingjie
    McLoughlin, Ian
    Liu, Cong
    Xue, Shaofei
    Wei, Si
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2015, : 4969 - 4973
  • [37] Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation
    Luo, Dean
    Xia, Linzhong
    Guan, Mingxiang
    MOBILE NETWORKS & APPLICATIONS, 2022, 27 (04): 1604 - 1611
  • [38] Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models
    Vu, Thuy-Trang
    Phung, Dinh
    Haffari, Gholamreza
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6163 - 6173
  • [39] STANDALONE TRAINING OF CONTEXT-DEPENDENT DEEP NEURAL NETWORK ACOUSTIC MODELS
    Zhang, C.
    Woodland, P. C.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [40] Complementary tasks for context-dependent deep neural network acoustic models
    Bell, Peter
    Renals, Steve
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3610 - 3614