Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Cited by: 0
Authors
Zhang, Yuhao [1]
Xu, Chen [1]
Li, Bei [1]
Chen, Hao [1]
Xiao, Tong [1, 2]
Zhang, Chunliang [1, 2]
Zhu, Jingbo [1, 2]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Funding
National Key Research and Development Program; U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art (SOTA) results. Moreover, when additional data is used, we achieve a new SOTA result on the MuST-C English-to-Spanish task with 20.8% of the training time required by the current SOTA method.
Pages: 10753-10765
Number of pages: 13