ASR Posterior-based Loss for Multi-task End-to-end Speech Translation

Cited by: 3
Authors
Ko, Yuka [1 ]
Sudoh, Katsuhito [1 ,2 ]
Sakti, Sakriani [1 ,2 ]
Nakamura, Satoshi [1 ,2 ]
Affiliations
[1] Nara Inst Sci & Technol, Ikoma, Nara, Japan
[2] RIKEN Ctr Adv Intelligence Project AIP, Tokyo, Japan
Source
INTERSPEECH 2021
Keywords
end-to-end speech translation; multi-task learning; spoken language translation
DOI
10.21437/Interspeech.2021-1105
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
End-to-end speech translation (ST) translates source-language speech directly into the target language without the intermediate automatic speech recognition (ASR) output used in a cascading approach. End-to-end ST has the advantage of avoiding error propagation from intermediate ASR results, but its performance still lags behind that of the cascading approach. One recent effort to improve performance is multi-task learning with ASR as an auxiliary task. However, previous multi-task learning for end-to-end ST computes the cross-entropy (CE) loss of the ASR task against one-hot references and therefore does not account for ASR confusion. In this study, we propose a novel end-to-end ST training method that computes the ASR loss against ASR posterior distributions given by a pre-trained model; we call this the ASR posterior-based loss. The proposed method is expected to account for possible ASR confusion caused by competing hypotheses with similar pronunciations. In our Fisher Spanish-to-English translation experiments, the proposed method achieved better BLEU scores than a baseline trained with the standard CE loss with label smoothing.
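The abstract contrasts three kinds of targets for the auxiliary ASR task: one-hot references (standard CE), label-smoothed references (the baseline), and posterior distributions from a pre-trained ASR model (the proposed loss). All three are cross entropy against some target distribution q, so the Python sketch here shows only how the target changes. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how teacher posteriors are aligned with the ASR outputs of the multi-task model, whether a temperature or interpolation with the reference is used, or how the ST and ASR losses are weighted. The names `asr_posterior_loss`, `multitask_loss`, and the weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_target_ce(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross entropy -sum_v q(v) * log p(v) against any target distribution q."""
    log_p = log_softmax(logits)
    return -sum(q * lp for q, lp in zip(target, log_p))


def one_hot(index: int, vocab_size: int) -> List[float]:
    """Standard CE target: all probability mass on the reference token."""
    return [1.0 if v == index else 0.0 for v in range(vocab_size)]


def label_smoothed(index: int, vocab_size: int, eps: float = 0.1) -> List[float]:
    """Label-smoothing target: eps is spread uniformly over the whole vocabulary."""
    return [(1.0 - eps) * h + eps / vocab_size for h in one_hot(index, vocab_size)]


def asr_posterior_loss(student_logits: Sequence[Sequence[float]],
                       teacher_posteriors: Sequence[Sequence[float]]) -> float:
    """Mean soft-target CE over ASR output positions (hypothetical formulation).

    ASSUMPTION: teacher_posteriors[t] is the pre-trained ASR model's
    distribution for position t, already aligned with the student's ASR outputs.
    """
    losses = [soft_target_ce(s, q) for s, q in zip(student_logits, teacher_posteriors)]
    return sum(losses) / len(losses)


def multitask_loss(st_loss: float, asr_loss: float, lam: float = 0.3) -> float:
    """Interpolate the main ST loss with the auxiliary ASR loss.

    ASSUMPTION: lam (the weight on the ASR loss) is a hypothetical hyperparameter.
    """
    return (1.0 - lam) * st_loss + lam * asr_loss


if __name__ == "__main__":
    # Toy 3-token vocabulary, e.g. ["casa", "caza", "taza"] (illustrative only).
    logits = [2.0, 1.5, -1.0]
    reference = 0
    # Hypothetical teacher posterior that puts real mass on a similar-sounding token.
    teacher = [0.55, 0.40, 0.05]
    print("one-hot CE       :", soft_target_ce(logits, one_hot(reference, 3)))
    print("label-smoothed CE:", soft_target_ce(logits, label_smoothed(reference, 3)))
    print("posterior-based  :", soft_target_ce(logits, teacher))
```

The design difference is in where the non-reference probability mass goes: label smoothing spreads its eps mass uniformly over every token, whereas a posterior-based target can concentrate that mass on the tokens the pre-trained ASR model actually confuses, which matches the intuition the abstract gives for the proposed loss.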
Pages: 2272-2276
Number of pages: 5
Related papers (50 in total)
  • [31] SRPOL's System for the IWSLT 2020 End-to-End Speech Translation Task
    Potapczyk, Tomasz
    Przybysz, Pawel
    17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 89 - 94
  • [32] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [33] Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning
    Rumberg, Lars
    Ehlert, Hanna
    Luedtke, Ulrike
    Ostermann, Joern
    INTERSPEECH 2021, 2021, : 3850 - 3854
  • [34] Adversarial Multi-Task Learning for Robust End-to-End ECG-based Heartbeat Classification
    Shahin, Mostafa
    Oo, Ethan
    Ahmed, Beena
    42ND ANNUAL INTERNATIONAL CONFERENCES OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020, : 341 - 344
  • [35] An Interactive Multi-Task Learning Network for End-to-End Aspect-Based Sentiment Analysis
    He, Ruidan
    Lee, Wee Sun
    Ng, Hwee Tou
    Dahlmeier, Daniel
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 504 - 515
  • [36] Neural multi-task learning for end-to-end Arabic aspect-based sentiment analysis
    Bensoltane, Rajae
    Zaki, Taher
    COMPUTER SPEECH AND LANGUAGE, 2025, 89
  • [37] Multi-Task End-to-End Self-Driving Architecture for CAV Platoons
    Huch, Sebastian
    Ongel, Aybike
    Betz, Johannes
    Lienkamp, Markus
    SENSORS, 2021, 21 (04) : 1 - 20
  • [38] Multi-task Learning for End-to-end Noise-robust Bandwidth Extension
    Hou, Nana
    Xu, Chenglin
    Zhou, Joey Tianyi
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 4069 - 4073
  • [39] A multi-task learning framework for end-to-end aspect sentiment triplet extraction
    Chen, Fang
    Yang, Zhongliang
    Huang, Yongfeng
    NEUROCOMPUTING, 2022, 479 : 12 - 21
  • [40] Multi-Task Neural Learning Architecture for End-to-End Identification of Helpful Reviews
    Fan, Miao
    Feng, Yue
    Sun, Mingming
    Li, Ping
    Wang, Haifeng
    Wang, Jianmin
    2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018, : 343 - 350