CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

Cited by: 24
Authors
Li, Qiujia [1, 3]
Qiu, David [2 ]
Zhang, Yu [2 ]
Li, Bo [2 ]
He, Yanzhang [2 ]
Woodland, Philip C. [1 ]
Cao, Liangliang [2 ]
Strohman, Trevor [2 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA 94043 USA
[3] Google, Mountain View, CA 94043 USA
Keywords
confidence scores; end-to-end ASR
DOI
10.1109/ICASSP39728.2021.9414920
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
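The softmax-based baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the decoder exposes one vector of logits per output step; the function names `softmax` and `softmax_confidences` and the toy values are hypothetical and are not taken from the paper. The sketch does not implement the proposed CEM.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_confidences(step_logits, token_ids):
    """Return the softmax probability of the emitted token at each decoding step.

    step_logits: one list of logits per decoding step (assumed decoder output).
    token_ids:   index of the token actually emitted at each step.
    """
    if len(step_logits) != len(token_ids):
        raise ValueError("step_logits and token_ids must have the same length")
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


# Toy example with a 3-token vocabulary (illustrative values only).
step_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
token_ids = [0, 2]
confidences = softmax_confidences(step_logits, token_ids)
print(confidences)
```

In this toy case the second step has uniform logits, so the emitted token receives a confidence of one third. Scores of this kind are the ones the abstract describes as prone to overconfidence.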
Pages: 6388-6392
Page count: 5
Related papers
50 in total
  • [21] On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
    Irie, Kazuki
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Bruguier, Antoine
    Rybach, David
    Nguyen, Patrick
    INTERSPEECH 2019, 2019, : 3800 - 3804
  • [22] Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
    Karafiat, Martin
    Baskar, Murali Karthick
    Watanabe, Shinji
    Hori, Takaaki
    Wiesner, Matthew
    Cernocky, Jan Honza
    INTERSPEECH 2019, 2019, : 2220 - 2224
  • [23] Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System
    Shahamiri, Seyed Reza
    Lal, Vanshika
    Shah, Dhvani
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 3407 - 3416
  • [24] Double-attention mechanism of sequence-to-sequence deep neural networks for automatic speech recognition
    Yook, Dongsuk
    Lim, Dan
    Yoo, In-Chul
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): : 476 - 482
  • [25] FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4789 - 4793
  • [26] Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
    Zhou, Shiyu
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 791 - 795
  • [27] High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
    Thai-Son Nguyen
    Ngoc-Quan Pham
    Stueker, Sebastian
    Waibel, Alex
    INTERSPEECH 2020, 2020, : 2147 - 2151
  • [28] Attention-Based Sequence-to-Sequence Learning for Online Structural Response Forecasting Under Seismic Excitation
    Li, Teng
    Pan, Yuxin
    Tong, Kaitai
    Ventura, Carlos E.
    de Silva, Clarence W.
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (04): : 2184 - 2200
  • [29] Efficient prediction uncertainty quantification in dam behavior monitoring with attention-based sequence-to-sequence learning
    Li, Minghao
    Ren, Qiubing
    Li, Mingchao
    Chen, Yun
    Ji, Xiaocui
    Liu, Hao
    APPLIED SOFT COMPUTING, 2024, 167
  • [30] Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism
    Baro, Arnau
    Badal, Carles
    Fornes, Alicia
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 205 - 210