ASR Posterior-based Loss for Multi-task End-to-end Speech Translation

Times cited: 3
Authors
Ko, Yuka [1]
Sudoh, Katsuhito [1, 2]
Sakti, Sakriani [1, 2]
Nakamura, Satoshi [1, 2]
Affiliations
[1] Nara Institute of Science and Technology (NAIST), Ikoma, Nara, Japan
[2] RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan
Source
INTERSPEECH 2021
Keywords
end-to-end speech translation; multi-task learning; spoken language translation
DOI
10.21437/Interspeech.2021-1105
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
End-to-end speech translation (ST) translates source-language speech directly into the target language, without the intermediate automatic speech recognition (ASR) output used in a cascading approach. End-to-end ST avoids the propagation of errors from intermediate ASR results, but its performance still lags behind that of the cascading approach. One recent effort to improve performance is multi-task learning with ASR as an auxiliary task. However, previous multi-task learning for end-to-end ST applies a cross-entropy (CE) loss to the ASR task with one-hot references as targets, and therefore does not take ASR confusion into account. In this study, we propose a novel end-to-end ST training method that computes the ASR loss against the ASR posterior distributions produced by a pre-trained model; we call this the ASR posterior-based loss. The proposed method is expected to account for likely ASR confusions caused by competing hypotheses with similar pronunciations. In our Fisher Spanish-to-English translation experiments, the proposed method achieved better BLEU scores than the baseline trained with the standard CE loss with label smoothing.
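The contrast the abstract draws can be sketched at the level of a single output token. This is a minimal illustration, not the paper's implementation, and it rests on several assumptions: that the ASR posterior-based loss is a token-level cross entropy against the pre-trained ASR model's output distribution (a soft target, as in knowledge distillation); that the baseline uses the common "one-hot mixed with uniform" form of label smoothing; and that the ST and ASR losses are combined by a weighted sum with a hypothetical weight `asr_weight`. All function names, the toy vocabulary, and the posterior values are made up for illustration.

```python
import math
from typing import List, Sequence

# Illustrative sketch only; all names and numbers are hypothetical.


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one output position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross entropy -sum_k q_k * log p_k against any target distribution q."""
    return -sum(q * lp for q, lp in zip(target, log_softmax(logits)))


def soft_ce_grad(logits: Sequence[float], target: Sequence[float]) -> List[float]:
    """Gradient of soft_cross_entropy w.r.t. the logits: softmax(logits) - q."""
    return [math.exp(lp) - q for lp, q in zip(log_softmax(logits), target)]


def label_smoothed_target(ref: int, vocab_size: int, eps: float) -> List[float]:
    """Baseline target: one-hot reference mixed with a uniform distribution
    (one common label-smoothing formulation; others exist)."""
    return [(1.0 - eps) * (1.0 if k == ref else 0.0) + eps / vocab_size
            for k in range(vocab_size)]


def asr_loss(logits_seq: Sequence[Sequence[float]],
             targets_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-token loss of the auxiliary ASR branch over one utterance.
    targets_seq holds either label-smoothed one-hot vectors (baseline) or
    pre-trained ASR posteriors (posterior-based loss)."""
    losses = [soft_cross_entropy(lg, q) for lg, q in zip(logits_seq, targets_seq)]
    return sum(losses) / len(losses)


def multitask_loss(st_loss: float, asr_loss_value: float, asr_weight: float) -> float:
    """Assumed weighted sum of the ST loss and the auxiliary ASR loss."""
    return (1.0 - asr_weight) * st_loss + asr_weight * asr_loss_value


if __name__ == "__main__":
    # Toy 4-token vocabulary: 0 = "casa", 1 = "caza" (similar pronunciation), 2-3 = others.
    posterior = [0.6, 0.3, 0.05, 0.05]  # made-up output of a pre-trained ASR model
    smoothed = label_smoothed_target(ref=0, vocab_size=4, eps=0.1)
    logits = [math.log(p) for p in posterior]  # model already matches the posterior

    print(soft_ce_grad(logits, posterior))  # approx. [0, 0, 0, 0]: nothing to correct
    print(soft_ce_grad(logits, smoothed))   # approx. [-0.325, 0.275, 0.025, 0.025]
```

The gradient `softmax(logits) - q` vanishes exactly when the model's distribution equals the target. With the posterior target, a model that keeps some probability on the acoustically confusable "caza" is therefore not penalized. With the label-smoothed one-hot target, gradient descent still lowers the "caza" logit until its probability approaches `eps / vocab_size`.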
Pages: 2272-2276
Page count: 5