GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models

Authors
Tomashenko, Natalia [1 ,2 ]
Khokhlov, Yuri [3 ]
Affiliations
[1] Speech Technol Ctr, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Keywords
speaker adaptation; deep neural networks (DNN); MAP; fMLLR; CD-DNN-HMM; GMM-derived (GMMD) features; speaker adaptive training (SAT)
DOI
Not available
CLC classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper we investigate GMM-derived features, recently introduced for the adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm in two ways: by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features, and by using fMLLR-adapted features to train the auxiliary GMM. Traditional adaptation algorithms, such as maximum a posteriori (MAP) adaptation and feature-space maximum likelihood linear regression (fMLLR), are applied to the auxiliary GMM used in the SAT procedure for the DNN. Experimental results on the Wall Street Journal (WSJ0) corpus show that, under an unsupervised adaptation setup, the proposed technique provides on average a 17-28% relative word error rate (WER) reduction on different adaptation sets compared to speaker-independent (SI) DNN-HMM systems built on conventional features. We also found that fMLLR adaptation of the SAT DNN trained on GMM-derived features outperforms fMLLR adaptation of the SAT DNN trained on conventional features by up to 14% relative WER reduction.
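The key idea in the abstract is that the DNN is adapted indirectly: its inputs are computed from an auxiliary GMM, so adapting that GMM (for example with MAP) changes what the DNN sees. The Python sketch below illustrates this under loudly stated assumptions that do not come from the paper: each auxiliary-model state is a single diagonal Gaussian with equal prior, the GMM-derived feature for a frame is taken to be the vector of per-state log-likelihoods (one plausible form, without context splicing), state occupancies come from the speaker-independent model's own posteriors as a stand-in for alignments from a first-pass decode, and the function names and the prior weight `tau=10.0` are hypothetical. Only the means are MAP-adapted, using the standard update (tau * prior mean + sum of gamma_t x_t) / (tau + sum of gamma_t).

```python
import math
from typing import List, Sequence

Vector = List[float]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmmd_features(frames, means, variances) -> List[Vector]:
    """Hypothetical GMMD feature: one log-likelihood per auxiliary-model state, per frame."""
    return [
        [diag_gauss_loglik(x, m, v) for m, v in zip(means, variances)]
        for x in frames
    ]


def state_posteriors(x, means, variances) -> Vector:
    """Posterior over states, assuming equal priors (max-shift for numerical stability)."""
    ll = [diag_gauss_loglik(x, m, v) for m, v in zip(means, variances)]
    top = max(ll)
    weights = [math.exp(l - top) for l in ll]
    total = sum(weights)
    return [w / total for w in weights]


def map_adapt_means(frames, means, variances, tau: float = 10.0) -> List[Vector]:
    """MAP mean update: (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)."""
    dim = len(means[0])
    occupancy = [0.0] * len(means)
    first_order = [[0.0] * dim for _ in means]
    for x in frames:
        # Assumption: SI-model posteriors stand in for first-pass-decode alignments.
        for k, gamma in enumerate(state_posteriors(x, means, variances)):
            occupancy[k] += gamma
            for d in range(dim):
                first_order[k][d] += gamma * x[d]
    return [
        [(tau * means[k][d] + first_order[k][d]) / (tau + occupancy[k]) for d in range(dim)]
        for k in range(len(means))
    ]


if __name__ == "__main__":
    si_means = [[0.0], [5.0]]   # toy speaker-independent auxiliary model
    si_vars = [[1.0], [1.0]]
    speaker_frames = [[1.0], [1.2], [0.8]]
    sa_means = map_adapt_means(speaker_frames, si_means, si_vars)
    # The adapted GMM yields speaker-adapted GMMD features for the DNN input.
    print(sa_means)
    print(gmmd_features(speaker_frames, sa_means, si_vars))
```

With little speaker data the prior weight keeps each mean close to its SI value, and a state that receives no occupancy keeps its SI mean unchanged; this is the usual reason MAP is preferred over plain maximum-likelihood re-estimation for small adaptation sets.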
Pages: 2882-2886
Number of pages: 5