IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION

Cited by: 6
Authors
Li, Qiujia [1 ]
Zhang, Yu [2 ]
Qiu, David [2 ]
He, Yanzhang [2 ]
Cao, Liangliang [2 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Google LLC, Mountain View, CA USA
Keywords
confidence scores; end-to-end; automatic speech recognition; out-of-domain
DOI
10.1109/ICASSP43922.2022.9746979
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.
Pages: 6537-6541
Page count: 5
Related papers
50 in total
  • [1] BLSTM-BASED CONFIDENCE ESTIMATION FOR END-TO-END SPEECH RECOGNITION
    Ogawa, Atsunori
    Tawara, Naohiro
    Kano, Takatomo
    Delcroix, Marc
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6383 - 6387
  • [2] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [3] AN EVALUATION OF WORD-LEVEL CONFIDENCE ESTIMATION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Oneata, Dan
    Caranica, Alexandru
    Stan, Adriana
    Cucu, Horia
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 258 - 265
  • [4] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823
  • [5] Improving Children's Speech Recognition through Out-of-Domain Data Augmentation
    Fainberg, Joachim
    Bell, Peter
    Lincoln, Mike
    Renals, Steve
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1598 - 1602
  • [6] INTERNAL LANGUAGE MODEL ESTIMATION FOR DOMAIN-ADAPTIVE END-TO-END SPEECH RECOGNITION
    Meng, Zhong
    Parthasarathy, Sarangarajan
    Sun, Eric
    Gaur, Yashesh
    Kanda, Naoyuki
    Lu, Liang
    Chen, Xie
    Zhao, Rui
    Li, Jinyu
    Gong, Yifan
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 243 - 250
  • [7] Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios
    Kumar, Ankur
    Singh, Sachin
    Gowda, Dhananjaya
    Garg, Abhinav
    Singh, Shatrughan
    Kim, Chanwoo
    INTERSPEECH 2020, 2020, : 4357 - 4361
  • [8] Confidence-based Ensembles of End-to-End Speech Recognition Models
    Gitman, Igor
    Lavrukhin, Vitaly
    Laptev, Aleksandr
    Ginsburg, Boris
    INTERSPEECH 2023, 2023, : 1414 - 1418
  • [9] Improving End-to-End Models for Children's Speech Recognition
    Patel, Tanvina
    Scharenborg, Odette
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [10] IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
    Takahashi, Naoya
    Singh, Mayank Kumar
    Basak, Sakya
    Sudarsanam, Parthasaarathy
    Ganapathy, Sriram
    Mitsufuji, Yuki
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 41 - 45