DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION

Citations: 0
Authors
Zhao, Yong [1]
Li, Jinyu [1]
Zhang, Shixiong [1]
Chen, Liping [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA
Keywords
deep neural network; domain adaptation; speaker adaptation; anchor embedding
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice assistants represent one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a well-trained multi-style acoustic model for its subsidiary domain, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information, as i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'. The anchor embeddings are mapped to layer-wise parameters that control the transformations of both the weight matrices and the biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that the prior distribution should be updated along with the network adaptation to compensate for the label bias from the development data. Updating the priors may have a significant impact when the target domain features a high occurrence of anchor words. Experiments on the Hey Cortana desktop test set show that both approaches improve recognition accuracy significantly. Anchor-based adaptation using the anchor d-vector together with prior interpolation achieves a 32% relative reduction in WER over the generic model.
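The prior update mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the interpolation formula, so the linear form below, the weight `lam`, and every function and variable name are assumptions made for illustration only. The sketch relies on the standard hybrid DNN-HMM convention that a state's scaled log-likelihood is its log posterior minus its log prior, and shows how mixing a generic prior with a target-domain prior changes those scores.

```python
import math


def interpolate_priors(generic, target, lam):
    """Linearly interpolate two state-prior distributions.

    Assumed form: (1 - lam) * generic + lam * target, renormalized.
    The source abstract does not specify the actual formula.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(generic) != len(target):
        raise ValueError("prior vectors must have the same length")
    mixed = [(1.0 - lam) * g + lam * t for g, t in zip(generic, target)]
    total = sum(mixed)
    return [m / total for m in mixed]


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log P(s|x) - log P(s) for each state s.

    Values are floored before the logarithm to avoid log(0).
    """
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]


# Toy three-state example; state 0 stands in for a frequent anchor-word state.
generic_prior = [0.2, 0.5, 0.3]
target_prior = [0.6, 0.3, 0.1]
adapted_prior = interpolate_priors(generic_prior, target_prior, lam=0.5)

# Hypothetical network posteriors for a single frame.
posteriors = [0.5, 0.3, 0.2]
generic_scores = scaled_log_likelihoods(posteriors, generic_prior)
adapted_scores = scaled_log_likelihoods(posteriors, adapted_prior)
```

In this toy setting, raising the prior of state 0 lowers its scaled likelihood relative to the generic prior, which is one way a prior update could offset label bias toward anchor-word states.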
Pages: 5984-5988
Page count: 5