Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System

Cited by: 4
Authors
Yeong, Yin-Lai [1]
Tan, Tien-Ping [1]
Mohammad, Siti Khaotijah [1]
Affiliation
[1] Univ Sci Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia
Keywords
Statistical machine translation; English-Malay; dictionary; parallel corpus; lemmatization
DOI
10.1016/j.procs.2016.04.056
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT by acquiring an English-Malay parallel corpus in the computer science domain. The training parallel corpus, however, is from a general domain, so many words are out of vocabulary (OOV) during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that combining a bilingual dictionary with English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41. (C) 2016 The Authors. Published by Elsevier B.V.
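The abstract does not describe how the dictionary and the lemmatizer are wired into the SMT pipeline, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes a preprocessing pass in which a source word missing from the training vocabulary is first reduced to its lemma and, if that fails, looked up in a bilingual dictionary. All names and data (`TRAINING_VOCAB`, `BILINGUAL_DICT`, `IRREGULAR_LEMMAS`, `lemmatize`, `resolve_oov`) are hypothetical, and the toy lemmatizer stands in for a real English lemmatizer.

```python
# Hypothetical toy data for illustration; none of it comes from the paper.
TRAINING_VOCAB = {"the", "use", "sort"}            # words seen in general-domain training
BILINGUAL_DICT = {"algorithm": "algoritma",         # English -> Malay dictionary entries
                  "database": "pangkalan data"}
IRREGULAR_LEMMAS = {"sorting": "sort", "sorted": "sort"}


def lemmatize(word):
    """Toy lemmatizer: a lookup table plus plural-'s' stripping (an assumption)."""
    if word in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[word]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def resolve_oov(tokens, vocab, dictionary):
    """Label each token with how it was resolved before translation.

    Returns a list of (text, source) pairs, where source is one of
    "in-vocab", "lemma", "dictionary", or "oov".
    """
    resolved = []
    for token in tokens:
        word = token.lower()
        if word in vocab:
            resolved.append((word, "in-vocab"))
            continue
        lemma = lemmatize(word)
        if lemma in vocab:
            # The lemma is known to the translation model, so pass it on instead.
            resolved.append((lemma, "lemma"))
        elif word in dictionary:
            resolved.append((dictionary[word], "dictionary"))
        elif lemma in dictionary:
            resolved.append((dictionary[lemma], "dictionary"))
        else:
            # Still out of vocabulary: leave the word unchanged.
            resolved.append((word, "oov"))
    return resolved


result = resolve_oov(["Sorting", "algorithms", "use", "databases", "heaps"],
                     TRAINING_VOCAB, BILINGUAL_DICT)
```

In this sketch the lemma is tried before the dictionary so that the translation model handles every word it can; the reverse order would be an equally reasonable assumption.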
Pages: 243-249
Number of pages: 7