CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

Cited by: 24

Authors:

Li, Qiujia [1,3]

Qiu, David [2]

Zhang, Yu [2]

Li, Bo [2]

He, Yanzhang [2]

Woodland, Philip C. [1]

Cao, Liangliang [2]

Strohman, Trevor [2]

Affiliations:

[1] Univ Cambridge, Cambridge, England

[2] Google LLC, Mountain View, CA 94043 USA

[3] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

confidence scores; end-to-end ASR;

DOI:

10.1109/ICASSP39728.2021.9414920

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
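The softmax-based baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's confidence estimation module (CEM). It only shows the "obvious alternative": reading the decoder softmax probability of each emitted token as its confidence. The names `token_confidences`, `step_logits`, and `token_ids`, and the toy logit values, are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, token_ids):
    """Return, for each decoding step, the softmax probability of the emitted token."""
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


# Hypothetical decoder outputs: 3 decoding steps over a 4-token vocabulary.
step_logits = [
    [4.0, 0.5, 0.1, 0.2],  # sharply peaked distribution: high confidence
    [0.3, 0.2, 0.4, 0.1],  # nearly flat distribution: low confidence
    [0.0, 6.0, 0.0, 0.0],  # very sharply peaked distribution
]
token_ids = [0, 2, 1]  # assumed emitted token at each step

confidences = token_confidences(step_logits, token_ids)
```

Scores produced this way depend directly on how peaked the softmax is, which is why the abstract examines how regularisation methods affect them and why overconfident models yield unreliable values.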

Citation

Pages: 6388-6392

Number of pages: 5

