Strategies for improving low resource speech to text translation relying on pre-trained ASR models

Cited by: 0
Authors
Kesiraju, Santosh [1 ]
Sarvas, Marek [1 ]
Pavlicek, Tomas [2 ]
Macaire, Cecile [3 ]
Ciuba, Alejandro [4 ]
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[2] Phonexia, Brno, Czech Republic
[3] Univ Grenoble Alpes, Grenoble, France
[4] Univ Pittsburgh, Pittsburgh, PA 15260 USA
Source
INTERSPEECH 2023
Funding
European Union Horizon 2020; US National Science Foundation (NSF);
Keywords
speech translation; low-resource; multilingual; speech recognition;
DOI
10.21437/Interspeech.2023-2506
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyper-parameters) that contribute most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on the Tamasheq-French data, outperforming prior published work from IWSLT 2022 by 1.6 points.
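To make the training objective described in the abstract concrete, the following is a minimal PyTorch sketch of a joint attention/CTC loss: cross-entropy on the decoder outputs interpolated with an auxiliary CTC loss computed on the encoder outputs. This is an illustrative sketch of how such an objective is commonly combined, not the authors' released implementation; all names (JointSTLoss, ctc_weight, etc.) are hypothetical.

    # Sketch of a joint cross-entropy + CTC objective for encoder-decoder ST.
    # All module and variable names are illustrative, not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointSTLoss(nn.Module):
        """Interpolated loss: (1 - w) * cross-entropy + w * CTC."""

        def __init__(self, ctc_weight: float = 0.3, blank_id: int = 0, pad_id: int = -100):
            super().__init__()
            self.ctc_weight = ctc_weight
            self.pad_id = pad_id
            self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

        def forward(self, dec_logits, enc_logits, targets, enc_lens, tgt_lens):
            # dec_logits: (B, T_dec, V) decoder logits over target-language tokens
            # enc_logits: (B, T_enc, V) encoder outputs projected to the same vocabulary
            ce = F.cross_entropy(dec_logits.transpose(1, 2), targets,
                                 ignore_index=self.pad_id)
            # nn.CTCLoss expects (T_enc, B, V) log-probabilities; in practice the
            # CTC targets may exclude BOS/EOS, which is glossed over here.
            log_probs = enc_logits.log_softmax(dim=-1).transpose(0, 1)
            ctc = self.ctc(log_probs, targets.clamp(min=0), enc_lens, tgt_lens)
            return (1.0 - self.ctc_weight) * ce + self.ctc_weight * ctc

    if __name__ == "__main__":
        B, T_enc, T_dec, V = 2, 50, 7, 100
        loss_fn = JointSTLoss(ctc_weight=0.3)
        loss = loss_fn(torch.randn(B, T_dec, V), torch.randn(B, T_enc, V),
                       torch.randint(1, V, (B, T_dec)),
                       enc_lens=torch.full((B,), T_enc),
                       tgt_lens=torch.full((B,), T_dec))
        print(loss.item())

At inference time, a similar interpolation weight is typically used to combine attention and CTC scores during beam search, which is consistent with the abstract's observation that the CTC objective helps reorder internal representations and improves the final translation.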
Pages: 2148-2152
Number of pages: 5