Terminology Translation in Low-Resource Scenarios

Cited by: 2
Authors
Haque, Rejwanul [1]
Hasanuzzaman, Mohammed [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, Dublin 9, Glasnevin, Ireland
[2] Cork Inst Technol, Dept Comp Sci, Cork T12 P928, Ireland
Funding
Science Foundation Ireland
Keywords
machine translation; terminology translation; phrase-based statistical machine translation; neural machine translation; terminology translation evaluation
DOI
10.3390/info10090273
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Evaluating term translation quality in machine translation (MT), which is usually done by domain experts, is a time-consuming and expensive task. In fact, such manual evaluation is unimaginable in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g., availability of new training data, leading MT techniques). To the best of our knowledge, there is as yet no publicly available solution for evaluating terminology translation in MT automatically. Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end-users identify term translation problems in MT instantly. This study presents such a strategy for evaluating terminology translation in MT. High correlations of our evaluation results with human judgements demonstrate the effectiveness of the proposed solution. The paper also introduces a classification framework, TermCat, that can automatically classify term translation-related errors and expose specific problems in relation to terminology translation in MT. We carried out our experiments with a low-resource language pair, English-Hindi, and found that our classifier, whose accuracy varies across the translation directions, error classes, the morphological nature of the languages, and MT models, generally performs competently in the terminology translation classification task.
Pages: 28