CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

Cited by: 24
Authors
Li, Qiujia [1 ,3 ]
Qiu, David [2 ]
Zhang, Yu [2 ]
Li, Bo [2 ]
He, Yanzhang [2 ]
Woodland, Philip C. [1 ]
Cao, Liangliang [2 ]
Strohman, Trevor [2 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA 94043 USA
[3] Google, Mountain View, CA 94043 USA
Keywords
confidence scores; end-to-end ASR;
DOI
10.1109/ICASSP39728.2021.9414920
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
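The softmax-based baseline mentioned in the abstract can be sketched in a few lines: the confidence of each greedily decoded token is taken as the maximum softmax probability at that decoder step, and token scores are aggregated into an utterance-level score. This is a minimal illustration of the baseline only, not the paper's CEM; the function names and the arithmetic-mean aggregation are this sketch's own choices.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one step's output logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_confidence(step_logits):
    """Per-token confidence of the greedy hypothesis: the max softmax
    probability at each decoder step (a common, often overconfident,
    baseline for auto-regressive decoders)."""
    return [max(softmax(step)) for step in step_logits]

def utterance_confidence(token_confidences):
    """Aggregate token-level scores into one utterance-level score;
    the arithmetic mean is one simple choice."""
    return sum(token_confidences) / len(token_confidences)

# Two decoder steps: the second distribution is more peaked,
# so its max-probability confidence is higher.
scores = softmax_confidence([[2.0, 0.5, 0.1], [3.0, 0.2, 0.2]])
overall = utterance_confidence(scores)
```

The paper's CEM replaces this raw probability with the output of a small trained module that predicts per-token correctness, which mitigates the overconfidence that this baseline exhibits.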
Pages: 6388-6392
Page count: 5
Related papers (50 in total)
  • [31] CORRECTION OF AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMER SEQUENCE-TO-SEQUENCE MODEL
    Hrinchuk, Oleksii
    Popova, Mariya
    Ginsburg, Boris
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7074 - 7078
  • [32] ACOUSTIC-TO-WORD RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS
    Palaskar, Shruti
    Metze, Florian
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 397 - 404
  • [33] Sequence-to-Sequence Models Can Directly Translate Foreign Speech
    Weiss, Ron J.
    Chorowski, Jan
    Jaitly, Navdeep
    Wu, Yonghui
    Chen, Zhifeng
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2625 - 2629
  • [34] SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5884 - 5888
  • [35] Detection and analysis of attention errors in sequence-to-sequence text-to-speech
    Valentini-Botinhao, Cassia
    King, Simon
    INTERSPEECH 2021, 2021, : 2746 - 2750
  • [36] Guiding Attention in Sequence-to-Sequence Models for Dialogue Act prediction
    Colombo, Pierre
    Chapuis, Emile
    Manica, Matteo
    Vignon, Emmanuel
    Varni, Giovanna
    Clavel, Chloe
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7594 - 7601
  • [37] Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model
    Hoesen, Devin
    Putri, Fanda Yuliana
    Lestari, Dessi Puji
    2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 7 - 12
  • [38] Dual Attention-Based Encoder-Decoder: A Customized Sequence-to-Sequence Learning for Soft Sensor Development
    Feng, Liangjun
    Zhao, Chunhui
    Sun, Youxian
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (08) : 3306 - 3317
  • [39] Enhanced Sequence-to-Sequence Attention-Based PM2.5 Concentration Forecasting Using Spatiotemporal Data
    Kim, Baekcheon
    Kim, Eunkyeong
    Jung, Seunghwan
    Kim, Minseok
    Kim, Jinyong
    Kim, Sungshin
    ATMOSPHERE, 2024, 15 (12)
  • [40] Towards Understanding Attention-Based Speech Recognition Models
    Qin, Chu-Xiong
    Qu, Dan
    IEEE ACCESS, 2020, 8 : 24358 - 24369