IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Times cited: 6

Authors:

Li, Qiujia [1]

Zhang, Yu [2]

Qiu, David [2]

He, Yanzhang [2]

Cao, Liangliang [2]

Woodland, Philip C. [1]

Affiliations:

[1] Univ Cambridge, Cambridge, England

[2] Google LLC, Mountain View, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

confidence scores; end-to-end; automatic speech recognition; out-of-domain

DOI:

10.1109/ICASSP43922.2022.9746979

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over the output softmax probabilities. If the input data to the speech recogniser comes from mismatched acoustic and linguistic conditions, both the ASR performance and the corresponding confidence estimators may degrade severely. Since confidence models are often trained on the same in-domain data as the ASR model, generalising to out-of-domain (OOD) scenarios is challenging. This paper proposes two approaches that improve model-based confidence estimators on OOD data while leaving the ASR model unchanged: using pseudo transcriptions and using an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on the TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and provide a much more reliable criterion for data selection.
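The abstract's closing claims concern calibration and confidence-based data selection. As a rough illustration only, assuming word-level confidence scores in [0, 1] paired with binary correctness labels (1 if the hypothesised word is correct), the Python sketch below computes equal-width-bin expected calibration error (ECE) and keeps utterances whose confidence clears a threshold. ECE, the bin count, the helper `select_utterances` and the threshold value are hypothetical illustrative choices; this record does not state which calibration metric or selection rule the paper actually uses.

```python
from typing import Dict, List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               num_bins: int = 10) -> float:
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - mean confidence|.

    Assumption: one confidence score per hypothesis word, with a 0/1 label
    saying whether that word was recognised correctly.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence scores must lie in [0, 1]")
        # Map c to an equal-width bin; c == 1.0 falls into the last bin.
        idx = min(int(c * num_bins), num_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


def select_utterances(utt_confidences: Dict[str, float],
                      threshold: float) -> List[str]:
    """Hypothetical data-selection rule: keep utterance IDs scoring >= threshold."""
    return sorted(u for u, c in utt_confidences.items() if c >= threshold)


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.2, 0.2], [1, 1, 0, 0]))
    print(select_utterances({"utt1": 0.95, "utt2": 0.4, "utt3": 0.8}, 0.8))
```

A lower ECE means the scores track actual word accuracy more closely, which is what makes a fixed confidence threshold a more trustworthy selection criterion on OOD data.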

Pages: 6537–6541

Number of pages: 5
