Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

Authors
Lin, Pin-Jie [1, 2]
Saeed, Muhammed [1]
Chang, Ernie [3]
Scholman, Merel [2, 4]
Affiliations
[1] Saarland Informat Campus, Saarbrucken, Germany
[2] Saarland Univ, Language Sci & Technol, Saarbrucken, Germany
[3] Meta Inc, Real Labs, Menlo Pk, CA USA
[4] Univ Utrecht, ILS, Utrecht, Netherlands
Keywords
spoken language understanding; low-resource machine translation; low-resource language;
DOI
10.21437/Interspeech.2023-466
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we further propose a framework of cross-lingual adaptive training that includes both continual and task adaptive training to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU, and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.
Pages: 3954 - 3958
Number of pages: 5