How effective is machine translation on low-resource code-switching? A case study comparing human and automatic metrics

Cited by: 0
Authors
Nguyen, Li [1, 2]
Bryant, Christopher [1 ]
Mayeux, Oliver [3 ]
Yuan, Zheng [1, 4]
Affiliations
[1] Univ Cambridge, ALTA Inst, Cambridge, England
[2] FPT Univ, Linguist & Language Technol Lab, Hanoi, Vietnam
[3] Univ Cambridge, Trinity Coll, Cambridge, England
[4] Kings Coll London, Dept Informat, London, England
DOI: not available
Abstract
This paper presents an investigation into the differences between processing monolingual input and code-switching (CSW) input in the context of machine translation (MT). Specifically, we compare the performance of three MT systems (Google, mBART-50 and M2M-100(big)) in terms of their ability to translate monolingual Vietnamese, a low-resource language, and Vietnamese-English CSW respectively. To our knowledge, this is the first study to systematically analyse what might happen when multilingual MT systems are exposed to CSW data using both automatic and human metrics. We find that state-of-the-art neural translation systems not only achieve higher scores on automatic metrics when processing CSW input (compared to monolingual input), but also produce translations that are consistently rated as more semantically faithful by humans. We further suggest that automatic evaluation alone is insufficient for evaluating the translation of CSW input. Our findings establish a new benchmark that offers insights into the relationship between MT and CSW.
Pages: 14186-14195
Page count: 10