GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models

Cited by: 0

Authors:

Tomashenko, Natalia ^{[1,2]}

Khokhlov, Yuri ^{[3]}

Affiliations:

[1] Speech Technol Ctr, St Petersburg, Russia

[2] ITMO Univ, St Petersburg, Russia

[3] STC Innovat Ltd, St Petersburg, Russia

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

speaker adaptation; deep neural networks (DNN); MAP; fMLLR; CD-DNN-HMM; GMM-derived (GMMD) features; speaker adaptive training (SAT);

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper we investigate GMM-derived features recently introduced for adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features and by using fMLLR-adapted features for training an auxiliary GMM model. Traditional adaptation algorithms, such as maximum a posteriori (MAP) adaptation and feature-space maximum likelihood linear regression (fMLLR), are performed for the auxiliary GMM model used in a SAT procedure for a DNN. Experimental results on the Wall Street Journal (WSJ0) corpus show that the proposed adaptation technique can provide, on average, a 17-28% relative word error rate (WER) reduction on different adaptation sets under an unsupervised adaptation setup, compared to speaker-independent (SI) DNN-HMM systems built on conventional features. We found that fMLLR adaptation for the SAT DNN trained on GMM-derived features outperforms fMLLR adaptation for the SAT DNN trained on conventional features by up to 14% relative WER reduction.


Pages: 2882-2886

Number of pages: 5

50 items in total

[1] Exploring GMM-derived Features for Unsupervised Adaptation of Deep Neural Network Acoustic Models
Tomashenko, Natalia
Khokhlov, Yuri
Larcher, Anthony
Esteve, Yannick
SPEECH AND COMPUTER, 2016, 9811 : 304 - 311
[2] Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models
Samarakoon, Lahiru
Sim, Khe Chai
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1593 - 1597
[3] LEARNING HIDDEN UNIT CONTRIBUTIONS FOR UNSUPERVISED SPEAKER ADAPTATION OF NEURAL NETWORK ACOUSTIC MODELS
Swietojanski, Pawel
Renals, Steve
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 171 - 176
[4] Unsupervised Adaptation of Recurrent Neural Network Language Models
Gangireddy, Siva Reddy
Swietojanski, Pawel
Bell, Peter
Renals, Steve
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2333 - 2337
[5] DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR
Nathwani, Karan
Vincent, Emmanuel
Illina, Irina
IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (03) : 338 - 342
[6] Context adaptive neural network for rapid adaptation of deep CNN based acoustic models
Delcroix, Marc
Kinoshita, Keisuke
Ogawa, Atsunori
Yoshioka, Takuya
Tran, Dung
Nakatani, Tomohiro
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1573 - 1577
[7] Adaptation and Contextualization of Deep Neural Network Models
Kollias, Dimitrios
Yu, Miao
Tagaris, Athanasios
Leontidis, Georgios
Kollias, Stefanos
Stafylopatis, Andreas
2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1204 - 1211
[8] DISCRIMINATIVELY TRAINED JOINT SPEAKER AND ENVIRONMENT REPRESENTATIONS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS
Yin, Maofan
Sivadas, Sunil
Yu, Kai
Ma, Bin
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5065 - 5069
[9] Deep Neural Network Bottleneck Features for Acoustic Event Recognition
Mun, Seongkyu
Shon, Suwon
Kim, Wooil
Ko, Hanseok
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2954 - 2957
[10] Unsupervised Adaptation for Deep Neural Network using Linear Least Square Method
Hsiao, Roger
Ng, Tim
Tsakalidis, Stavros
Nguyen, Long
Schwartz, Richard
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2887 - 2891
