End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

被引:0
|
作者
Gallego, Gerard, I [1 ]
Tsiamas, Ioannis [1 ]
Escolano, Carlos [1 ]
Fonollosa, Jose A. R. [1 ]
Costa-jussa, Marta R. [1 ]
机构
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
基金
欧洲研究理事会;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation system, which combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique, which trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it, can increase the convergence speed and the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains 28.22 BLEU score on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 for identifying periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU score on the IWSLT 2019 test set, as compared to the result with the given segmentation.
引用
收藏
页码:110 / 119
页数:10
相关论文
共 50 条
  • [31] Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning
    Zhao, Chen
    He, Yeye
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 2413 - 2424
  • [32] Prosperous Human Gait Recognition: an end-to-end system based on pre-trained CNN features selection
    Mehmood, Asif
    Khan, Muhammad Attique
    Sharif, Muhammad
    Khan, Sajid Ali
    Shaheen, Muhammad
    Saba, Tanzila
    Riaz, Naveed
    Ashraf, Imran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (05) : 14979 - 14999
  • [33] Prosperous Human Gait Recognition: an end-to-end system based on pre-trained CNN features selection
    Asif Mehmood
    Muhammad Attique Khan
    Muhammad Sharif
    Sajid Ali Khan
    Muhammad Shaheen
    Tanzila Saba
    Naveed Riaz
    Imran Ashraf
    Multimedia Tools and Applications, 2024, 83 : 14979 - 14999
  • [34] Investigating Self-supervised Pre-training for End-to-end Speech Translation
    Ha Nguyen
    Bougares, Fethi
    Tomashenko, Natalia
    Esteve, Yannick
    Besacier, Laurent
    INTERSPEECH 2020, 2020, : 1466 - 1470
  • [35] MINTZAI: End-to-end Deep Learning for Speech Translation
    Etchegoyhen, Thierry
    Arzelus, Haritz
    Gete, Harritxu
    Alvarez, Aitor
    Hernaez, Inma
    Navas, Eva
    Gonzalez-Docasal, Ander
    Osacar, Jaime
    Benites, Edson
    Ellakuria, Igor
    Calonge, Eusebi
    Martin, Maite
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2020, (65): : 97 - 100
  • [36] Adaptive Feature Selection for End-to-End Speech Translation
    Zhang, Biao
    Titov, Ivan
    Haddow, Barry
    Sennrich, Rico
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2533 - 2544
  • [37] Speaker voice normalization for end-to-end speech translation
    Xue, Zhengshan
    Shi, Tingxun
    Zhang, Xiaolei
    Xiong, Deyi
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [38] SimulSpeech: End-to-End Simultaneous Speech to Text Translation
    Ren, Yi
    Liu, Jinglin
    Tan, Xu
    Zhang, Chen
    Qin, Tao
    Zhao, Zhou
    Liu, Tie-Yan
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3787 - 3796
  • [39] End-to-End Simultaneous Speech Translation with Differentiable Segmentation
    Zhang, Shaolei
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7659 - 7680
  • [40] A COMPARATIVE STUDY ON END-TO-END SPEECH TO TEXT TRANSLATION
    Bahar, Parnia
    Bieschke, Tobias
    Ney, Hermann
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 792 - 799