CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

Cited by: 24
Authors
Li, Qiujia [1 ,3 ]
Qiu, David [2 ]
Zhang, Yu [2 ]
Li, Bo [2 ]
He, Yanzhang [2 ]
Woodland, Philip C. [1 ]
Cao, Liangliang [2 ]
Strohman, Trevor [2 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA 94043 USA
[3] Google, Mountain View, CA 94043 USA
Keywords
confidence scores; end-to-end ASR
DOI
10.1109/ICASSP39728.2021.9414920
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
Pages: 6388-6392
Page count: 5
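To make the idea described in the abstract concrete, the following is a minimal, illustrative sketch of a confidence estimation module (CEM): a small binary classifier placed on top of an existing end-to-end ASR decoder that maps per-token decoder-side features to a confidence score in [0, 1], trained against token correctness labels. The layer sizes, the choice of input features, and all names below are assumptions for illustration only; they are not the paper's exact architecture or training setup.

```python
# Illustrative CEM sketch (assumed architecture, not the paper's exact design).
import torch
import torch.nn as nn


class ConfidenceEstimationModule(nn.Module):
    """Predicts a per-token confidence in [0, 1] from decoder-side features."""

    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one confidence logit per decoded token
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_tokens, feature_dim), e.g. a concatenation of
        # the decoder hidden state, the attention context, and the softmax
        # probability of the emitted token (an assumed feature set).
        return torch.sigmoid(self.net(features)).squeeze(-1)


if __name__ == "__main__":
    # Training sketch: the target is 1 if the hypothesised token is correct
    # (e.g. from an edit-distance alignment against the reference), else 0.
    batch, num_tokens, feature_dim = 4, 20, 512
    cem = ConfidenceEstimationModule(feature_dim)
    features = torch.randn(batch, num_tokens, feature_dim)      # stand-in decoder features
    targets = torch.randint(0, 2, (batch, num_tokens)).float()  # stand-in correctness labels

    confidence = cem(features)  # (batch, num_tokens), values in [0, 1]
    loss = nn.functional.binary_cross_entropy(confidence, targets)
    loss.backward()
    print(f"confidence shape: {tuple(confidence.shape)}, BCE loss: {loss.item():.3f}")
```

Because the underlying ASR model is left unchanged and only a small head is trained, such a module is lightweight at both training and inference time, which matches the paper's description of the CEM as an add-on to an existing end-to-end model.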