Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System

Cited by: 4
Authors
Yeong, Yin-Lai [1]
Tan, Tien-Ping [1]
Mohammad, Siti Khaotijah [1]
Affiliations
[1] Univ Sci Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia
Keywords
Statistical machine translation; English-Malay; dictionary; parallel corpus; lemmatization
DOI
10.1016/j.procs.2016.04.056
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT and acquired an English-Malay parallel corpus in the computer science domain. The training parallel corpus, however, is from a general domain, so many out-of-vocabulary (OOV) words arise during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that a combined approach using a bilingual dictionary and English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41. (C) 2016 The Authors. Published by Elsevier B.V.
Pages: 243-249
Page count: 7
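
The abstract names two resources, a bilingual dictionary and an English lemmatizer, for dealing with out-of-vocabulary words. The record does not say how the authors integrate them into the SMT pipeline, so the following is only a minimal sketch of one plausible arrangement: a pre-translation pass that looks up each unknown English token in a dictionary, first as written and then by its lemma. Everything in it is an assumption made for illustration: the dictionary entries, the `KNOWN_VOCAB` set, the function names, and the toy suffix-stripping lemmatizer, which stands in for a real English lemmatizer.

```python
# Illustrative English-to-Malay entries (assumed, not taken from the paper).
EN_MS_DICT = {
    "algorithm": "algoritma",
    "network": "rangkaian",
    "file": "fail",
}

# Hypothetical vocabulary already covered by the general-domain SMT model.
KNOWN_VOCAB = {"the", "is", "a", "are", "fast"}


def toy_lemmatize(word: str) -> str:
    """Crude stand-in for an English lemmatizer: strips a plural 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def resolve_oov(token: str) -> str:
    """Return a dictionary translation for an OOV token when one is found."""
    lower = token.lower()
    if lower in KNOWN_VOCAB:
        return token  # in-vocabulary: leave it for the SMT system
    if lower in EN_MS_DICT:
        return EN_MS_DICT[lower]  # direct dictionary hit
    lemma = toy_lemmatize(lower)
    if lemma in EN_MS_DICT:
        return EN_MS_DICT[lemma]  # hit after lemmatization
    return token  # still unknown: pass through unchanged


def preprocess(sentence: str) -> list[str]:
    """Apply OOV resolution to each whitespace-separated token."""
    return [resolve_oov(tok) for tok in sentence.split()]
```

In this sketch an inflected form such as "algorithms" misses the dictionary as written but matches after lemmatization, which is the kind of coverage gain a lemmatizer can add on top of a plain dictionary lookup.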