Strategies for improving low resource speech to text translation relying on pre-trained ASR models

Cited: 0
Authors
Kesiraju, Santosh [1]
Sarvas, Marek [1]
Pavlicek, Tomas [2 ]
Macaire, Cecile [3 ]
Ciuba, Alejandro [4 ]
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[2] Phonexia, Brno, Czech Republic
[3] Univ Grenoble Alpes, Grenoble, France
[4] Univ Pittsburgh, Pittsburgh, PA 15260 USA
Source
INTERSPEECH 2023
Funding
EU Horizon 2020; US National Science Foundation;
Keywords
speech translation; low-resource; multilingual; speech recognition;
DOI
10.21437/Interspeech.2023-2506
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyper-parameters) that contribute the most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on the Tamasheq-French data, outperforming prior published work from IWSLT 2022 by 1.6 points.
Pages: 2148-2152
Number of pages: 5
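
As an illustration of the joint objective described in the abstract, the sketch below interpolates an auxiliary CTC loss, computed on the encoder states against the target-language translation, with the attention decoder's cross-entropy loss. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: all module names, layer sizes, and the ctc_weight value are assumptions, positional encodings are omitted for brevity, and the actual system additionally initializes its encoder from a multilingual ASR model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCtcAttentionST(nn.Module):
    """Encoder-decoder ST model with an auxiliary CTC head on the encoder.
    All sizes are illustrative; positional encodings omitted for brevity."""
    def __init__(self, n_mels=80, d_model=256, vocab_size=1000,
                 blank_id=0, pad_id=1):
        super().__init__()
        self.blank_id, self.pad_id = blank_id, pad_id
        self.front_end = nn.Linear(n_mels, d_model)  # stand-in for a conv subsampler
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.ctc_head = nn.Linear(d_model, vocab_size)  # CTC over *target* tokens
        self.out = nn.Linear(d_model, vocab_size)       # attention-decoder output

    def forward(self, feats, feat_lens, tgt_in, tgt_out, tgt_lens, ctc_weight=0.3):
        enc = self.encoder(self.front_end(feats))              # (B, T, d_model)
        # Auxiliary CTC loss on encoder states against the translation;
        # per the abstract, this encourages reordering of the internal
        # representations toward target-language order.
        log_probs = F.log_softmax(self.ctc_head(enc), dim=-1)
        ctc = F.ctc_loss(log_probs.transpose(0, 1), tgt_out,   # ctc_loss expects (T, B, V)
                         feat_lens, tgt_lens, blank=self.blank_id,
                         zero_infinity=True)
        # Standard autoregressive cross-entropy over the translation.
        S = tgt_in.size(1)
        causal = torch.triu(torch.full((S, S), float("-inf")), diagonal=1)
        dec = self.decoder(self.embed(tgt_in), enc, tgt_mask=causal)
        ce = F.cross_entropy(self.out(dec).transpose(1, 2), tgt_out,
                             ignore_index=self.pad_id)
        # Interpolate the two objectives; ctc_weight is a tunable hyper-parameter.
        return ctc_weight * ctc + (1.0 - ctc_weight) * ce

# Toy forward/backward pass with random tensors, just to show the shapes.
model = JointCtcAttentionST()
feats = torch.randn(2, 120, 80)              # (batch, frames, mel bins)
feat_lens = torch.tensor([120, 100])
tgt_in = torch.randint(2, 1000, (2, 12))     # decoder inputs (BOS-shifted)
tgt_out = torch.randint(2, 1000, (2, 12))    # gold target tokens
tgt_lens = torch.tensor([12, 9])
loss = model(feats, feat_lens, tgt_in, tgt_out, tgt_lens)
loss.backward()

Note that the sketch covers only the training loss; the abstract also uses CTC during decoding, which this illustration does not attempt to reproduce.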