Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Cited by: 0
Authors
Pineiro-Martin, Andres [1, 2]
Garcia-Mateo, Carmen [1 ]
Docio-Fernandez, Laura [1 ]
Del Carmen Lopez-Perez, Maria [2 ]
Rehm, Georg [3 ]
Affiliations
[1] Univ Vigo, AtlanTTic Res Ctr, GTM Res Grp, Vigo, Spain
[2] Balidea Consulting & Programming SL, Santiago De Compostela, Spain
[3] DFKI GmbH, Speech & Language Technol Lab, Berlin, Germany
Source
Interspeech 2024
Keywords
Continual multilingual learning; automatic speech recognition; weighted cross-entropy; low-resource language; data augmentation
DOI
10.21437/Interspeech.2024-734
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
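The abstract does not spell out the loss itself, so the following is only a minimal sketch of what a language-weighted cross-entropy could look like, not the authors' implementation. It assumes each utterance carries a language label, that every token loss is scaled by a per-language weight, and that the batch loss is normalized by the sum of the weights. The function names, the language labels `lang_hi` and `lang_lo`, and the linear ramp in `ramped_weight` (one possible reading of "dynamic") are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def language_weighted_cross_entropy(batch, lang_weights):
    """Weighted mean token-level cross-entropy.

    batch: list of (language, token_logits, target_ids) tuples, where
           token_logits is a list of per-token logit lists.
    lang_weights: dict mapping language label -> weight (default 1.0).
    """
    total, norm = 0.0, 0.0
    for lang, token_logits, targets in batch:
        w = lang_weights.get(lang, 1.0)
        for logits, target in zip(token_logits, targets):
            total += -w * log_softmax(logits)[target]
            norm += w
    return total / norm


def ramped_weight(step, total_steps, w_max):
    """Hypothetical schedule: ramp the weight linearly from 1.0 to w_max."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 + frac * (w_max - 1.0)


# Toy batch: one easy high-resource token, one poorly predicted low-resource token.
batch = [
    ("lang_hi", [[0.0, 0.0]], [0]),
    ("lang_lo", [[2.0, 0.0]], [1]),
]
unweighted = language_weighted_cross_entropy(batch, {})
weighted = language_weighted_cross_entropy(batch, {"lang_lo": 3.0})
print(unweighted, weighted)
```

With a weight above 1.0 on `lang_lo`, errors on the low-resource language contribute more to the batch loss than they would under a plain mean. Dividing by the sum of the weights keeps the loss scale comparable as the weight changes during training, which is a design choice made for this sketch rather than something stated in the abstract.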
Pages: 1235-1239
Number of pages: 5