Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Cited by: 0
Authors
Zhang, Yuhao [1 ]
Xu, Chen [1 ]
Li, Bei [1 ]
Chen, Hao [1 ]
Xiao, Tong [1 ,2 ]
Zhang, Chunliang [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
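Funding
National Key Research and Development Program of China; National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes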
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve the new SOTA result on MuST-C English to Spanish task with 20.8% of the training time required by the current SOTA method.
Pages: 10753-10765
Number of pages: 13