Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition

Cited by: 11
Authors
Dan, Zhengjia [1]
Zhao, Yue [1]
Bi, Xiaojun [1]
Wu, Licheng [1]
Ji, Qiang [2]
Affiliations
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
Funding
National Natural Science Foundation of China
Keywords
adaptive cross-entropy loss; multi-task Transformer; multi-dialect speech recognition; deep neural networks
DOI
10.3390/e24101429
Chinese Library Classification (CLC) number
O4 [Physics]
Subject classification code
0702
Abstract
At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to others. In addition, in order to balance multi-task learning, the weights of the multi-task objective function need to be manually adjusted. This makes multi-task learning very difficult and costly because it requires constantly trying various combinations of weights to determine the optimal task weights. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with Transformer, and introduce several auxiliary cross-attentions to enable the auxiliary task (dialect ID recognition) to provide dialect information for the multi-dialect speech recognition task. Furthermore, we use the adaptive cross-entropy loss function as the multi-task objective function, which automatically balances the learning of the multi-task model according to the loss proportion of each task during the training process. Therefore, the optimal weight combination can be found without any manual intervention. Finally, for the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with single-dialect Transformer, single-task multi-dialect Transformer, and multi-task Transformer with hard parameter sharing, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition.
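The abstract says the objective balances tasks "according to the loss proportion of each task" but does not give the formula. The sketch below shows one plausible reading, in which each task's weight is its share of the summed loss. The function names `loss_proportion_weights` and `combined_loss` and the example loss values are illustrative assumptions, not the paper's definitions.

```python
from typing import List, Sequence


def loss_proportion_weights(losses: Sequence[float]) -> List[float]:
    """Weight each task by its share of the summed loss.

    Assumption: this proportional rule is one reading of the abstract,
    not the formula given in the paper.
    """
    total = sum(losses)
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [loss / total for loss in losses]


def combined_loss(losses: Sequence[float]) -> float:
    """Weighted sum of the per-task losses using the proportional weights."""
    weights = loss_proportion_weights(losses)
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical per-task cross-entropy values for a single training step.
asr_loss, dialect_id_loss = 3.0, 1.0
weights = loss_proportion_weights([asr_loss, dialect_id_loss])
total = combined_loss([asr_loss, dialect_id_loss])
```

Under this reading, the task that currently has the larger loss receives the larger weight, so no weight grid has to be searched by hand. In an autograd framework the weights would usually be treated as constants (detached from the gradient) at each step; that is also an assumption here.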
Pages: 12