End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

Cited by: 0

Authors:

Gallego, Gerard I. [1]

Tsiamas, Ioannis [1]

Escolano, Carlos [1]

Fonollosa, Jose A. R. [1]

Costa-jussa, Marta R. [1]

Affiliations:

[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain

Source:

IWSLT 2021: THE 18TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION | 2021

Funding:

European Research Council

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and can use either a custom or the given segmentation. Our submission is an end-to-end speech translation system that combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique that trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it can increase the convergence speed and improve the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains a BLEU score of 28.22 on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 to identify periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU points on the IWSLT 2019 test set compared with the given segmentation.

Pages: 110-119

Page count: 10
