Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System

Cited by: 4
Authors
Yeong, Yin-Lai [1 ]
Tan, Tien-Ping [1 ]
Mohammad, Siti Khaotijah [1 ]
Affiliations
[1] Univ Sci Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia
Keywords
Statistical machine translation; English-Malay; dictionary; parallel corpus; lemmatization
DOI
10.1016/j.procs.2016.04.056
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Statistical machine translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT by acquiring an English-Malay parallel corpus in the computer science domain. The training parallel corpus, however, comes from a general domain, so many out-of-vocabulary (OOV) words occur during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that combining a bilingual dictionary with English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41. © 2016 The Authors. Published by Elsevier B.V.
Pages: 243-249
Page count: 7
Related papers
50 in total
  • [31] English-Arabic Hybrid Machine Translation System using EBMT and Translation Memory
    Ehab, Rana
    Gadallah, Mahmoud
    Amer, Eslam
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (01) : 195 - 203
  • [32] A Reordering Model For Vietnamese-English Statistical Machine Translation Using Dependency Information
    Viet Hong Tran
    Huyen Thuong Vu
    Thu Hoai Pham
    Vinh Van Nguyen
    Minh Le Nguyen
    2016 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES, RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2016, : 125 - 130
  • [33] The neural machine translation models for the low-resource Kazakh-English language pair
    Karyukin, Vladislav
    Rakhimova, Diana
    Karibayeva, Aidana
    Turganbayeva, Aliya
    Turarbek, Asem
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [34] A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation
    San, Mya Ei
    Usanavasin, Sasiporn
    Thu, Ye Kyaw
    Okumura, Manabu
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [35] Machine Translation for Low-Resource English-Mizo Pair Encountering Tonal Words
    Khenglawt, Vanlalmuansangi
    Laskar, Sahinur Rahman
    Pakray, Partha
    Manna, Riyanka
    Khan, Ajoy Kumar
    COMPUTACION Y SISTEMAS, 2022, 26 (03) : 1377 - 1398
  • [36] Statistical Machine Translation for Bilingually Low-Resource Scenarios: A Round-Tripping Approach
    Ahmadnia, Benyamin
    Haffari, Gholamreza
    Serrano, Javier
    2018 IEEE 5TH INTERNATIONAL CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'18), 2018, : 261 - 265
  • [37] Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages
    Hai-Long Trieu
    Duc-Vu Tran
    Ittoo, Ashwin
    Le-Minh Nguyen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (03)
  • [38] Extracting Bilingual Multi-word Expressions for Low-resource Statistical Machine Translation
    Wei, Linyu
    Li, Miao
    Chen, Lei
    Yang, Zhenxin
    Sun, Kai
    Yuan, Man
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 21 - 24
  • [39] English to Tamil machine translation system using universal networking language
    Rajeswari Sridhar
    Pavithra Sethuraman
    Kashyap Krishnakumar
    Sādhanā, 2016, 41 : 607 - 620
  • [40] An Efficient English to Hindi Machine Translation System Using Hybrid Mechanism
    Nair, Jayashree
    Krishnan, Amrutha K.
    Deetha, R.
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 2109 - 2113