DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION

Cited by: 0

Authors:

Zhao, Yong [1]

Li, Jinyu [1]

Zhang, Shixiong [1]

Chen, Liping [1]

Gong, Yifan [1]

Affiliations:

[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

deep neural network; domain adaptation; speaker adaptation; anchor embedding

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

Voice assistants represent one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a well-trained multi-style acoustic model towards its subsidiary domain, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information, as i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'. The anchor embeddings are mapped to layer-wise parameters that control the transformations of both the weight matrices and the biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that the prior distribution should be updated along with the network adaptation to compensate for the label bias from the development data. Updating the priors may have a significant impact when the target domain features a high occurrence of anchor words. Experiments on the Hey Cortana desktop test set show that both approaches improve recognition accuracy significantly. The anchor-based adaptation using the anchor d-vector and the prior interpolation achieves a 32% relative reduction in WER over the generic model.
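The prior update mentioned in the abstract can be illustrated with a small sketch. This record does not give the paper's formula, so everything here is an assumption made for illustration: the linear interpolation form, the weight value, the additive smoothing, the toy label counts, and the function names are all hypothetical. The sketch only shows the general hybrid DNN-HMM idea that scaled log-likelihoods are posteriors divided by state priors, so a prior that reflects frequent anchor-word states changes the scores.

```python
import math


def estimate_priors(label_counts, smoothing=1.0):
    """Estimate class priors as smoothed relative frequencies of label counts."""
    total = sum(label_counts) + smoothing * len(label_counts)
    return [(count + smoothing) / total for count in label_counts]


def interpolate_priors(generic, target, weight):
    """Linearly interpolate two prior distributions (an assumed form).

    weight = 0.0 keeps the generic prior; weight = 1.0 uses the target prior.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mixed = [(1.0 - weight) * g + weight * t for g, t in zip(generic, target)]
    norm = sum(mixed)  # renormalize to guard against rounding drift
    return [m / norm for m in mixed]


def scaled_log_likelihoods(posteriors, priors):
    """Hybrid DNN-HMM scoring: log p(x|s) is proportional to log p(s|x) - log p(s)."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


# Toy label counts for three output classes (hypothetical numbers).
# Class 2 stands in for states of the anchor words, which are far more
# frequent in the target-domain adaptation data than in the generic data.
generic_prior = estimate_priors([500, 300, 200])
target_prior = estimate_priors([100, 50, 850])
mixed_prior = interpolate_priors(generic_prior, target_prior, weight=0.5)

# The same network posteriors scored against each prior.
posteriors = [0.2, 0.1, 0.7]
scores_generic = scaled_log_likelihoods(posteriors, generic_prior)
scores_mixed = scaled_log_likelihoods(posteriors, mixed_prior)
```

In this toy setting the interpolated prior assigns more mass to the anchor class than the generic prior does, so the scaled score of that class drops, which is one way the label bias of the adaptation data could be offset.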

Pages: 5984-5988

Page count: 5
