Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition

Cited by: 11
Authors
Dan, Zhengjia [1 ]
Zhao, Yue [1 ]
Bi, Xiaojun [1 ]
Wu, Licheng [1 ]
Ji, Qiang [2 ]
Affiliations
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
Funding
National Natural Science Foundation of China;
Keywords
adaptive cross-entropy loss; multi-task Transformer; multi-dialect speech recognition; deep neural networks;
DOI
10.3390/e24101429
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to the others. In addition, balancing multi-task learning requires manually adjusting the weights of the multi-task objective function, which is difficult and costly because many weight combinations must be tried before the optimal task weights are found. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with the Transformer, and we introduce several auxiliary cross-attentions so that the auxiliary task (dialect ID recognition) can provide dialect information to the multi-dialect speech recognition task. Furthermore, we use an adaptive cross-entropy loss as the multi-task objective function, which automatically balances the learning of the two tasks according to the proportion of each task's loss during training, so the optimal weight combination is found without any manual intervention. Finally, for the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with the single-dialect Transformer, the single-task multi-dialect Transformer, and the hard-parameter-sharing multi-task Transformer, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition.
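The abstract does not give the exact weighting rule, so the following is only a minimal sketch of how loss-proportion-based balancing of two task losses could be written in PyTorch-style Python. The function name adaptive_cross_entropy, the proportional weighting scheme, and the placeholder values are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's implementation): the ASR cross-entropy
# and the dialect-ID cross-entropy are combined with weights derived from each
# task's share of the total loss, so the higher-loss task receives more weight
# without any manually tuned coefficients.
import torch


def adaptive_cross_entropy(asr_loss: torch.Tensor,
                           dialect_id_loss: torch.Tensor,
                           eps: float = 1e-8) -> torch.Tensor:
    """Combine two task losses with weights proportional to their loss share."""
    losses = torch.stack([asr_loss, dialect_id_loss])
    # Loss proportions; detached so the weights act as constants during backprop.
    weights = (losses / (losses.sum() + eps)).detach()
    return (weights * losses).sum()


# Usage inside a training loop, after computing each task's cross-entropy
# (placeholder scalar values stand in for the real per-batch losses):
asr_loss = torch.tensor(2.3, requires_grad=True)
dialect_id_loss = torch.tensor(0.7, requires_grad=True)
total_loss = adaptive_cross_entropy(asr_loss, dialect_id_loss)
total_loss.backward()
```

Detaching the weights is one common design choice in such schemes: the loss proportions steer the balance between tasks but do not themselves receive gradients.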
Pages: 12
Related Papers
50 records in total
  • [31] The Building and Evaluation of a Mobile Parallel Multi-Dialect Speech Corpus for Arabic
    Almeman, Khalid
    ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 166 - 173
  • [32] MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
    Ghosh, Sreyan
    Tyagi, Utkarsh
    Ramaneswaran, S.
    Srivastava, Harshvardhan
    Manocha, Dinesh
    INTERSPEECH 2023, 2023, : 1209 - 1213
  • [33] Multi-task Recurrent Model for True Multilingual Speech Recognition
    Tang, Zhiyuan
    Li, Lantian
    Wang, Dong
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [34] MULTI-TASK AUTOENCODER FOR NOISE-ROBUST SPEECH RECOGNITION
    Zhang, Haoyi
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5599 - 5603
  • [35] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
    Yue, Pengcheng
    Qu, Leyuan
    Zheng, Shukai
    Li, Taihao
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1232 - 1237
  • [36] Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition
    Park, Sunchan
    Kim, Hyung Soon
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 515 - 522
  • [37] MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
    Xu, Xiaogang
    Zhao, Hengshuang
    Vineet, Vibhav
    Lim, Ser-Nam
    Torralba, Antonio
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 304 - 321
  • [38] Speech Emotion Recognition using Decomposed Speech via Multi-task Learning
    Hsu, Jia-Hao
    Wu, Chung-Hsien
    Wei, Yu-Hung
    INTERSPEECH 2023, 2023, : 4553 - 4557
  • [39] Unified Transformer Multi-Task Learning for Intent Classification With Entity Recognition
    Benayas Alamos, Alberto Jose
    Hashempou, Reyhaneh
    Rumble, Damian
    Jameel, Shoaib
    De Amorim, Renato Cordeiro
    IEEE ACCESS, 2021, 9 : 147306 - 147314
  • [40] A multi-task minutiae transformer network for fingerprint recognition of young children
    Liu, Manhua
    Liu, Aitong
    Shi, Yelin
    Liu, Shuxin
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273