Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution

Times cited: 0
Authors
Ko, Yuka [1]
Sudoh, Katsuhito [1]
Sakti, Sakriani [1]
Nakamura, Satoshi [1, 2]
Affiliations
[1] Nara Inst Sci & Technol NAIST, Ikoma 6300192, Japan
[2] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen, Peoples R China
Keywords
end-to-end speech translation; spoken language translation; multi-task learning; knowledge distillation; ARCHITECTURE
DOI
10.1587/transinf.2023EDP7249
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
End-to-end speech translation (ST) translates source-language speech directly into the target language, without the intermediate automatic speech recognition (ASR) output used in a cascade approach, and thus avoids error propagation from intermediate ASR results. Recent work has applied multi-task learning with an auxiliary ASR task to improve ST performance, but the ASR task is trained with cross-entropy (CE) loss against one-hot references, so the trained ST models do not account for possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR posterior-based loss (ASR-PBL): an ASR task loss computed against posterior distributions obtained from a pre-trained ASR model. ASR-PBL enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, and it can be applied to a strong multi-task ST baseline that uses a hybrid CTC/attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method achieved better BLEU scores than the baseline trained with standard CE loss.
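The contrast the abstract draws between CE loss against one-hot references and a loss against ASR posterior distributions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-token vocabulary, and all numbers below are hypothetical, and the abstract does not say how the posterior-based term is weighted or combined with the other task losses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_dist, logits):
    """Cross-entropy between a target distribution and softmax(logits)
    for a single token position. The target may be one-hot (hard label)
    or a full posterior distribution (soft label)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0.0)

# Hypothetical 3-token vocabulary; token 0 is the reference transcript token,
# token 1 is an acoustically similar competitor. Values are made up.
one_hot_reference = [1.0, 0.0, 0.0]
teacher_posterior = [0.6, 0.3, 0.1]   # assumed output of a pre-trained ASR model
student_logits = [1.2, 0.4, -0.8]     # assumed ASR-branch logits of the ST model

# Standard CE: only the reference token contributes to the loss.
hard_loss = cross_entropy(one_hot_reference, student_logits)

# Posterior-based variant: every token with non-zero teacher probability
# contributes, so likely confusions also shape the training signal.
soft_loss = cross_entropy(teacher_posterior, student_logits)

print(f"hard-label CE: {hard_loss:.4f}")
print(f"soft-label CE: {soft_loss:.4f}")
```

With a soft target, the loss is minimized when the student distribution matches the teacher posterior rather than when it puts all its mass on the reference token, which is the sense in which such a loss can expose a model to ASR confusion.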
Pages: 1322-1331
Page count: 10