End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

被引:0
|
作者
Gallego, Gerard, I [1 ]
Tsiamas, Ioannis [1 ]
Escolano, Carlos [1 ]
Fonollosa, Jose A. R. [1 ]
Costa-jussa, Marta R. [1 ]
机构
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
基金
欧洲研究理事会;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation system, which combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique, which trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it, can increase the convergence speed and the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains 28.22 BLEU score on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 for identifying periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU score on the IWSLT 2019 test set, as compared to the result with the given segmentation.
引用
收藏
页码:110 / 119
页数:10
相关论文
共 50 条
  • [21] Pre-trained multimodal end-to-end network for spoken language assessment incorporating prompts
    Lin, Binghuai
    Wang, Liyuan
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1394 - 1398
  • [22] Grounding End-to-End Pre-trained architectures for Semantic Role Labeling in multiple languages
    Hromei, Claudiu D.
    Croce, Danilo
    Basili, Roberto
    INTELLIGENZA ARTIFICIALE, 2023, 17 (02) : 173 - 191
  • [23] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [24] End-to-End Offline Speech Translation System for IWSLT 2020 using Modality Agnostic Meta-Learning
    Lakumarapu, Nikhil Kumar
    Lee, Beomseok
    Indurthi, Sathish
    Han, Houjeung
    Zaidi, Mohd Abbas
    Kim, Sangha
    17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 73 - 79
  • [25] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [26] Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
    Dai, Dongyang
    Wu, Zhiyong
    Kang, Shiyin
    Wu, Xixin
    Jia, Jia
    Su, Dan
    Yu, Dong
    Meng, Helen
    INTERSPEECH 2019, 2019, : 2090 - 2094
  • [27] INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Seo, Seunghyun
    Kwak, Donghyun
    Lee, Bowon
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7152 - 7156
  • [28] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [29] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228
  • [30] End-to-End Speech Translation with Knowledge Distillation
    Liu, Yuchen
    Xiong, Hao
    Zhang, Jiajun
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    INTERSPEECH 2019, 2019, : 1128 - 1132