ASR Posterior-based Loss for Multi-task End-to-end Speech Translation

Cited by: 3
Authors
Ko, Yuka [1]
Sudoh, Katsuhito [1,2]
Sakti, Sakriani [1,2]
Nakamura, Satoshi [1,2]
Affiliations
[1] Nara Inst Sci & Technol, Ikoma, Nara, Japan
[2] RIKEN Ctr Adv Intelligence Project AIP, Tokyo, Japan
Source
INTERSPEECH 2021
Keywords
end-to-end speech translation; multi-task learning; spoken language translation;
DOI
10.21437/Interspeech.2021-1105
CLC number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification number
100104; 100213;
Abstract
End-to-end speech translation (ST) translates source-language speech directly into the target language without the intermediate automatic speech recognition (ASR) output used in a cascading approach. End-to-end ST has the advantage of avoiding error propagation from intermediate ASR results, but its performance still lags behind the cascading approach. A recent effort to increase performance is multi-task learning with an auxiliary ASR task. However, previous multi-task learning for end-to-end ST uses cross-entropy (CE) loss for the ASR task, which targets one-hot references and does not consider ASR confusion. In this study, we propose a novel end-to-end ST training method using an ASR loss against ASR posterior distributions given by a pre-trained model, which we call ASR posterior-based loss. The proposed method is expected to account for possible ASR confusion due to competing hypotheses with similar pronunciations. In our Fisher Spanish-to-English translation experiments, the proposed method achieved better BLEU results than a baseline using standard CE loss with label smoothing.
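A minimal sketch of how such a posterior-based auxiliary loss could be implemented in PyTorch, assuming it amounts to a soft cross-entropy between the ST model's ASR-branch output and posteriors from a pre-trained ASR model; the function names and the 0.3 task weight are illustrative assumptions, not details taken from the paper:

    import torch.nn.functional as F

    def asr_posterior_loss(student_logits, teacher_posteriors):
        # Soft cross-entropy between the ST model's ASR-branch distribution and
        # posteriors from a pre-trained ASR model (hypothetical helper; the
        # paper's exact formulation may differ).
        log_probs = F.log_softmax(student_logits, dim=-1)  # (batch, time, vocab)
        return -(teacher_posteriors * log_probs).sum(dim=-1).mean()

    def multitask_loss(st_loss, asr_logits, teacher_posteriors, asr_weight=0.3):
        # Combine the primary ST loss with the auxiliary ASR posterior-based
        # loss; asr_weight = 0.3 is an illustrative value, not reported here.
        return st_loss + asr_weight * asr_posterior_loss(asr_logits, teacher_posteriors)

With standard CE the teacher_posteriors tensor would be a one-hot reference; here it is the full posterior distribution, so competing hypotheses with similar pronunciations retain probability mass in the training signal.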
Pages: 2272-2276
Page count: 5