Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Citations: 0
Authors
Zhang, Yuhao [1 ]
Xu, Chen [1 ]
Li, Bei [1 ]
Chen, Hao [1 ]
Xiao, Tong [1 ,2 ]
Zhang, Chunliang [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Funding
National Key R&D Program of China; National Science Foundation (USA);
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are consistent with the ST task, and how much this approach truly helps, have not been studied thoroughly. In this paper, we investigate the consistency between different tasks, considering different training stages and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the differences in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve a new SOTA result on the MuST-C English-to-Spanish task with 20.8% of the training time required by the current SOTA method.
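For readers unfamiliar with the multi-task setup the abstract refers to, the sketch below shows one common way such an objective is assembled in practice: the primary ST cross-entropy loss combined with auxiliary ASR (CTC) and MT losses. This is a minimal illustration only; the function name, tensor shapes, and the weights w_st, w_asr, w_mt are assumptions for exposition and do not reproduce the IMTL method proposed in the paper.

import torch.nn.functional as F

def multitask_st_loss(st_logits, st_targets,
                      ctc_log_probs, asr_targets, speech_lengths, asr_lengths,
                      mt_logits, mt_targets,
                      w_st=1.0, w_asr=0.3, w_mt=0.3,
                      pad_id=1, blank_id=0):
    # Primary ST loss: cross-entropy over translation targets.
    # st_logits: (batch, tgt_len, vocab), st_targets: (batch, tgt_len).
    st_loss = F.cross_entropy(st_logits.transpose(1, 2), st_targets,
                              ignore_index=pad_id)
    # Auxiliary ASR loss: CTC over the acoustic encoder output.
    # ctc_log_probs: (src_len, batch, vocab) after log_softmax.
    asr_loss = F.ctc_loss(ctc_log_probs, asr_targets,
                          speech_lengths, asr_lengths, blank=blank_id)
    # Auxiliary MT loss: cross-entropy from transcript to translation.
    mt_loss = F.cross_entropy(mt_logits.transpose(1, 2), mt_targets,
                              ignore_index=pad_id)
    # Hypothetical fixed weights; in practice they are tuned or scheduled
    # over training rather than held constant.
    return w_st * st_loss + w_asr * asr_loss + w_mt * mt_loss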
Pages: 10753-10765
Number of pages: 13
Related Papers
50 records in total
  • [1] End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs
    Kano, Takatomo
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1342 - 1355
  • [2] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [3] ASR Posterior-based Loss for Multi-task End-to-end Speech Translation
    Ko, Yuka
    Sudoh, Katsuhito
    Sakti, Sakriani
    Nakamura, Satoshi
    INTERSPEECH 2021, 2021, : 2272 - 2276
  • [4] SPEECH ENHANCEMENT AIDED END-TO-END MULTI-TASK LEARNING FOR VOICE ACTIVITY DETECTION
    Tan, Xu
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6823 - 6827
  • [5] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    INTERSPEECH 2022, 2022, : 1387 - 1391
  • [6] Multi-task Learning with Attention for End-to-end Autonomous Driving
    Ishihara, Keishi
    Kanervisto, Anssi
    Miura, Jun
    Hautamaki, Ville
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2896 - 2905
  • [7] Adversarial Multi-task Learning for End-to-end Metaphor Detection
    Zhang, Shenglong
    Liu, Ying
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1483 - 1497
  • [8] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [9] ATTENTION-AUGMENTED END-TO-END MULTI-TASK LEARNING FOR EMOTION PREDICTION FROM SPEECH
    Zhang, Zixing
    Wu, Bingwen
    Schuller, Bjoern
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6705 - 6709
  • [10] An End-to-End Scalable Iterative Sequence Tagging with Multi-Task Learning
    Gui, Lin
    Du, Jiachen
    Zhao, Zhishan
    He, Yulan
    Xu, Ruifeng
    Fan, Chuang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 288 - 298