Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Authors
Pineiro-Martin, Andres [1, 2]
Garcia-Mateo, Carmen [1]
Docio-Fernandez, Laura [1]
Del Carmen Lopez-Perez, Maria [2]
Rehm, Georg [3]
Affiliations
[1] Univ Vigo, AtlanTTic Res Ctr, GTM Res Grp, Vigo, Spain
[2] Balidea Consulting & Programming SL, Santiago De Compostela, Spain
[3] DFKI GmbH, Speech & Language Technol Lab, Berlin, Germany
Source
Interspeech 2024
Keywords
Continual multilingual learning; automatic speech recognition; weighted cross-entropy; low-resource language; data augmentation
DOI
10.21437/Interspeech.2024-734
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
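The abstract names language-weighted cross-entropy as the core technique but does not spell out its form. As a rough illustration of the general idea only, the sketch below scales each utterance's token-level cross-entropy by a per-language weight, so that errors on an under-represented language contribute more to the loss. Everything here is an assumption rather than the authors' method: the inverse-frequency weighting, the helper names (inverse_frequency_weights, token_cross_entropy, language_weighted_ce), and the placeholder language labels "hr" and "lr" are hypothetical, and the paper's dynamic weighting schedule is not reproduced.

```python
import math
from collections import Counter


def inverse_frequency_weights(languages):
    """HYPOTHETICAL weighting (not the paper's dynamic scheme):
    w_l is proportional to 1 / count(l), rescaled so the weights average to 1."""
    counts = Counter(languages)
    raw = {lang: 1.0 / n for lang, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {lang: w / mean for lang, w in raw.items()}


def token_cross_entropy(logits, target):
    """Cross-entropy -log softmax(logits)[target], using the log-sum-exp trick
    for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def language_weighted_ce(batch, weights):
    """Weighted mean of per-token cross-entropy over a batch.

    batch: list of (language, [(logits, target), ...]) pairs.
    Every token inherits the weight of its utterance's language, and the sum is
    normalized by the total weight (a weighted mean, not a plain mean).
    """
    total, norm = 0.0, 0.0
    for lang, tokens in batch:
        w = weights[lang]
        for logits, target in tokens:
            total += w * token_cross_entropy(logits, target)
            norm += w
    return total / norm


if __name__ == "__main__":
    # Hypothetical 9:1 imbalance between a high-resource ("hr") and low-resource ("lr") language.
    weights = inverse_frequency_weights(["hr"] * 9 + ["lr"])
    batch = [("hr", [([2.0, 0.0], 0)]), ("lr", [([0.0, 0.0], 1)])]
    print(weights)
    print(language_weighted_ce(batch, weights))
```

With uniform weights the function reduces to ordinary mean cross-entropy; dividing by the summed weights is analogous to how weighted-mean reductions in common deep-learning frameworks normalize class-weighted losses, which keeps the loss scale comparable when the weights change during training.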
Pages: 1235-1239
Number of pages: 5