IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Cited by: 6
Authors
Li, Qiujia [1 ]
Zhang, Yu [2 ]
Qiu, David [2 ]
He, Yanzhang [2 ]
Cao, Liangliang [2 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA USA
Keywords
confidence scores; end-to-end; automatic speech recognition; out-of-domain;
DOI
10.1109/ICASSP43922.2022.9746979
CLC number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser comes from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may degrade severely. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. While keeping the ASR model untouched, this paper proposes two approaches to improve model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on the TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.
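To illustrate the distinction the abstract draws between softmax-probability confidence and a model-based estimator, here is a minimal, hypothetical sketch. The feature set, weights, and function names are illustrative assumptions, not the paper's method: a real model-based estimator (as in the paper) is a trained neural module, whereas this toy version is an untrained logistic model over two simple features of the output distribution.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_confidence(logits):
    """Baseline: confidence = maximum softmax probability of the token."""
    return max(softmax(logits))

def entropy(probs):
    """Shannon entropy of a probability distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def model_based_confidence(logits, w_top=4.0, w_ent=-2.0, bias=-1.0):
    """Toy 'model-based' estimator: a logistic model over simple features
    (top probability and distribution entropy). The weights here are
    illustrative placeholders; in practice they would be trained on
    (token, correct/incorrect) pairs from decoding hypotheses."""
    probs = softmax(logits)
    z = w_top * max(probs) + w_ent * entropy(probs) + bias
    return 1.0 / (1.0 + math.exp(-z))

# A peaked output distribution should score higher than a flat one
# under both estimators.
peaked = [5.0, 0.0, 0.0]
flat = [1.0, 1.0, 1.0]
print(softmax_confidence(peaked), softmax_confidence(flat))
print(model_based_confidence(peaked), model_based_confidence(flat))
```

The point of the learned estimator is that it can be trained (or, as in the paper, adapted with pseudo transcriptions and an OOD language model) to produce calibrated scores even when the raw softmax probabilities are overconfident on mismatched data.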
Pages: 6537-6541
Page count: 5