Code-switching input for machine translation: a case study of Vietnamese-English data

Cited by: 3
Authors
Nguyen, Li [1, 2]
Mayeux, Oliver [3]
Yuan, Zheng [4]
Affiliations
[1] Univ Cambridge, Inst Automated Language Teaching & Assessment ALTA, Cambridge CB3 0FD, England
[2] FPT Univ, Linguist & Language Technol Lab, Ho Chi Minh City 721400, Vietnam
[3] Univ Cambridge, Trinity Coll, Cambridge, England
[4] Kings Coll London, Dept Informat, London, England
Keywords
Machine translation; code-switching; Vietnamese; human evaluation; automatic evaluation; lexico-semantic enrichment
DOI
10.1080/14790718.2023.2224013
Chinese Library Classification number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Multilingualism presents both a challenge and an opportunity for Natural Language Processing, with code-switching representing a particularly interesting problem for computational models trained on monolingual datasets. In this paper, we explore how code-switched data affects Machine Translation, a task that has only recently started to tackle the challenge of multilingual data. We test three Machine Translation systems on data from the Canberra Vietnamese-English Codeswitching Natural Speech Corpus (CanVEC) and evaluate the translation output using both automatic and human metrics. We find that, perhaps counter-intuitively, Machine Translation performs better on code-switching input than on monolingual input. In particular, a comparison of human and automatic evaluation suggests that code-switching input may boost the semantic faithfulness of the translation output, an effect we term lexico-semantic enrichment. We also report the two cases in Vietnamese-English where this effect is most and least clear, namely gender-neutral 3SG pronouns and interrogative constructions, respectively. Overall, we suggest that Machine Translation, and Natural Language Processing more generally, ought to view multilingualism as an opportunity rather than an obstacle.
Pages: 2268-2289
Page count: 22