Strategies for improving low resource speech to text translation relying on pre-trained ASR models

Cited by: 0
Authors
Kesiraju, Santosh [1 ]
Sarvas, Marek [1 ]
Pavlicek, Tomas [2 ]
Macaire, Cecile [3 ]
Ciuba, Alejandro [4 ]
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[2] Phonexia, Brno, Czech Republic
[3] Univ Grenoble Alpes, Grenoble, France
[4] Univ Pittsburgh, Pittsburgh, PA 15260 USA
Source
INTERSPEECH 2023
Funding
EU Horizon 2020; US National Science Foundation (NSF);
Keywords
speech translation; low-resource; multilingual; speech recognition;
DOI
10.21437/Interspeech.2023-2506
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyper-parameters) that contribute most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on the Tamasheq-French data, outperforming prior published works from IWSLT 2022 by 1.6 points.
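For illustration, the auxiliary-CTC objective described in the abstract can be sketched as an interpolated CTC + cross-entropy loss over target-language tokens, as in the standard hybrid CTC/attention recipe. The following is a minimal PyTorch sketch under that assumption, not the authors' exact implementation; all names and the value of ctc_weight are illustrative assumptions.

import torch
import torch.nn.functional as F

def joint_st_loss(dec_logits, ctc_logits, tgt_tokens, enc_lens, tgt_lens,
                  ctc_weight=0.3, pad_id=0, blank_id=0):
    # dec_logits: (B, U, V) decoder outputs over target translation tokens.
    # ctc_logits: (T, B, V) linear projection of encoder states for the CTC branch.
    # Attention branch: standard cross-entropy on the translation.
    ce_loss = F.cross_entropy(dec_logits.transpose(1, 2), tgt_tokens,
                              ignore_index=pad_id)
    # Auxiliary CTC branch: aligns encoder frames to target-language tokens,
    # which is what encourages the encoder to reorder its internal representations.
    log_probs = F.log_softmax(ctc_logits, dim=-1)
    ctc_loss = F.ctc_loss(log_probs, tgt_tokens, enc_lens, tgt_lens,
                          blank=blank_id, zero_infinity=True)
    # Interpolate the two objectives; ctc_weight is a tunable hyper-parameter.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * ce_loss

At decoding time, the same CTC branch can be used for joint scoring alongside the attention decoder, which matches the paper's use of CTC during both training and decoding.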
Pages: 2148-2152
Page count: 5
Related papers
50 in total
  • [31] Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
    Wang, Haoyu
    Zhang, Wei-Qiang
    Suo, Hongbin
    Wan, Yulong
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 11 - 15
  • [32] Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models
    Ma, Kaixin
    Ilievski, Filip
    Francis, Jonathan
    Ozaki, Satoru
    Nyberg, Eric
    Oltramari, Alessandro
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5474 - 5483
  • [33] XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
    Nguyen, Linh The
    Pham, Thinh
    Nguyen, Dat Quoc
    INTERSPEECH 2023, 2023, : 5506 - 5510
  • [35] Video Colorization with Pre-trained Text-to-Image Diffusion Models
    Liu, Hanyuan
    Xie, Minshan
    Xing, Jinbo
    Li, Chengze
    Wong, Tien-Tsin
    arXiv, 2023,
  • [36] Unstructured Pruning and Low Rank Factorisation of Self-Supervised Pre-Trained Speech Models
    Wang, Haoyu
    Zhang, Wei-Qiang
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (06) : 1046 - 1058
  • [37] Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation
    Maimaiti, Mieradilijiang
    Liu, Yang
    Luan, Huanbo
    Sun, Maosong
    TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (01) : 150 - 163
  • [38] Embedding Articulatory Constraints for Low-resource Speech Recognition Based on Large Pre-trained Model
    Lee, Jaeyoung
    Mimura, Masato
    Kawahara, Tatsuya
    INTERSPEECH 2023, 2023, : 1394 - 1398
  • [39] Non-Autoregressive Text Generation with Pre-trained Language Models
    Su, Yixuan
    Cai, Deng
    Wang, Yan
    Vandyke, David
    Baker, Simon
    Li, Piji
    Collier, Nigel
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 234 - 243
  • [40] ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining
    Minh Phuc Nguyen
    Vu Hoang Tran
    Vu Hoang
    Ta Duc Huy
    Bui, Trung H.
    Truong, Steven Q. H.
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 328 - 337