Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Citations: 0
Authors
Zhang, Yuhao [1 ]
Xu, Chen [1 ]
Li, Bei [1 ]
Chen, Hao [1 ]
Xiao, Tong [1 ,2 ]
Zhang, Chunliang [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Funding
National Key R&D Program of China; National Science Foundation (USA);
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are consistent with the ST task, and how much this approach truly helps, have not been studied thoroughly. In this paper, we investigate the consistency between different tasks, considering different training stages and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the differences in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve a new SOTA result on the MuST-C English-to-Spanish task with 20.8% of the training time required by the current SOTA method.
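For readers unfamiliar with the multi-task setup the abstract refers to, the sketch below shows one common way such an objective is assembled in practice: the primary ST cross-entropy loss combined with auxiliary ASR (CTC) and MT losses. This is a minimal illustration only; the function name, tensor shapes, and the weights w_st, w_asr, w_mt are assumptions for exposition and do not reproduce the IMTL method proposed in the paper.

import torch.nn.functional as F

def multitask_st_loss(st_logits, st_targets,
                      ctc_log_probs, asr_targets, speech_lengths, asr_lengths,
                      mt_logits, mt_targets,
                      w_st=1.0, w_asr=0.3, w_mt=0.3,
                      pad_id=1, blank_id=0):
    # Primary ST loss: cross-entropy over translation targets.
    # st_logits: (batch, tgt_len, vocab), st_targets: (batch, tgt_len).
    st_loss = F.cross_entropy(st_logits.transpose(1, 2), st_targets,
                              ignore_index=pad_id)
    # Auxiliary ASR loss: CTC over the acoustic encoder output.
    # ctc_log_probs: (src_len, batch, vocab) after log_softmax.
    asr_loss = F.ctc_loss(ctc_log_probs, asr_targets,
                          speech_lengths, asr_lengths, blank=blank_id)
    # Auxiliary MT loss: cross-entropy from transcript to translation.
    mt_loss = F.cross_entropy(mt_logits.transpose(1, 2), mt_targets,
                              ignore_index=pad_id)
    # Hypothetical fixed weights; in practice they are tuned or scheduled
    # over training rather than held constant.
    return w_st * st_loss + w_asr * asr_loss + w_mt * mt_loss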
Pages: 10753-10765
Number of pages: 13
Related Papers
50 records in total
  • [1] End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs
    Kano, Takatomo
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1342 - 1355
  • [2] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [3] ASR Posterior-based Loss for Multi-task End-to-end Speech Translation
    Ko, Yuka
    Sudoh, Katsuhito
    Sakti, Sakriani
    Nakamura, Satoshi
    INTERSPEECH 2021, 2021, : 2272 - 2276
  • [4] SPEECH ENHANCEMENT AIDED END-TO-END MULTI-TASK LEARNING FOR VOICE ACTIVITY DETECTION
    Tan, Xu
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6823 - 6827
  • [5] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    INTERSPEECH 2022, 2022, : 1387 - 1391
  • [6] Multi-task Learning with Attention for End-to-end Autonomous Driving
    Ishihara, Keishi
    Kanervisto, Anssi
    Miura, Jun
    Hautamaki, Ville
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2896 - 2905
  • [7] Adversarial Multi-task Learning for End-to-end Metaphor Detection
    Zhang, Shenglong
    Liu, Ying
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1483 - 1497
  • [8] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [9] ATTENTION-AUGMENTED END-TO-END MULTI-TASK LEARNING FOR EMOTION PREDICTION FROM SPEECH
    Zhang, Zixing
    Wu, Bingwen
    Schuller, Bjoern
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6705 - 6709
  • [10] An End-to-End Scalable Iterative Sequence Tagging with Multi-Task Learning
    Gui, Lin
    Du, Jiachen
    Zhao, Zhishan
    He, Yulan
    Xu, Ruifeng
    Fan, Chuang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 288 - 298