End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

Cited by: 0
Authors
Gallego, Gerard I. [1]
Tsiamas, Ioannis [1]
Escolano, Carlos [1]
Fonollosa, Jose A. R. [1]
Costa-jussa, Marta R. [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
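Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract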
This paper describes the submission of the UPC Machine Translation group to the IWSLT 2021 offline speech translation task. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end, and can use either a custom segmentation or the given one. Our submission is an end-to-end speech translation system that combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique that trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it can increase the convergence speed and improve the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains a BLEU score of 28.22 on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 to identify periods of untranscribable text, and can bring improvements of 2.5 to 3 BLEU points on the IWSLT 2019 test set compared to the result with the given segmentation.
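The abstract does not spell out the segmentation algorithm, so the following is only a minimal sketch of one way a segmenter of this kind could work, not the authors' method. It assumes a frame-level model has already marked each audio frame as transcribable or not; the function name `split_on_untranscribable`, the `is_speech` input, and the `min_pause_frames` threshold are all hypothetical names introduced for illustration.

```python
from typing import List, Tuple


def split_on_untranscribable(
    is_speech: List[bool],
    min_pause_frames: int = 10,
) -> List[Tuple[int, int]]:
    """Split a frame sequence into segments separated by long pauses.

    is_speech[i] is True when a (hypothetical) frame-level model emitted a
    transcribable token for frame i. A run of at least `min_pause_frames`
    consecutive False frames closes the current segment.
    Returns (start, end) frame indices, with `end` exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # first speech frame of the currently open segment
    last_speech = None  # most recent speech frame seen so far
    for i, flag in enumerate(is_speech):
        if flag:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_pause_frames:
            # The pause is long enough: close the segment at the last speech frame.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        # Close a segment that is still open when the audio ends.
        segments.append((start, last_speech + 1))
    return segments
```

Short pauses stay inside a segment, so only sufficiently long untranscribable stretches produce a cut. Frame indices can be converted to seconds by multiplying by the frame duration of whichever model produced the flags.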
Pages: 110-119
Page count: 10