How effective is machine translation on low-resource code-switching? A case study comparing human and automatic metrics

Authors
Nguyen, Li [1,2]
Bryant, Christopher [1]
Mayeux, Oliver [3]
Yuan, Zheng [1,4]
Affiliations
[1] Univ Cambridge, ALTA Inst, Cambridge, England
[2] FPT Univ, Linguist & Language Technol Lab, Hanoi, Vietnam
[3] Univ Cambridge, Trinity Coll, Cambridge, England
[4] Kings Coll London, Dept Informat, London, England
Abstract
This paper presents an investigation into the differences between processing monolingual input and code-switching (CSW) input in the context of machine translation (MT). Specifically, we compare the performance of three MT systems (Google, mBART-50 and M2M-100(big)) in terms of their ability to translate monolingual Vietnamese, a low-resource language, and Vietnamese-English CSW respectively. To our knowledge, this is the first study to systematically analyse what might happen when multilingual MT systems are exposed to CSW data using both automatic and human metrics. We find that state-of-the-art neural translation systems not only achieve higher scores on automatic metrics when processing CSW input (compared to monolingual input), but also produce translations that are consistently rated as more semantically faithful by humans. We further suggest that automatic evaluation alone is insufficient for evaluating the translation of CSW input. Our findings establish a new benchmark that offers insights into the relationship between MT and CSW.
Pages: 14186-14195
Number of pages: 10