Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System

Cited by: 4
Authors
Yeong, Yin-Lai [1]
Tan, Tien-Ping [1]
Mohammad, Siti Khaotijah [1]
Affiliations
[1] Univ Sci Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia
Keywords
Statistical machine translation; English-Malay; dictionary; parallel corpus; lemmatization
DOI
10.1016/j.procs.2016.04.056
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT and acquired an English-Malay parallel corpus in the computer science domain. The training parallel corpus, however, is from a general domain, so many out-of-vocabulary (OOV) words are encountered during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that a combined approach using a bilingual dictionary and English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41. (C) 2016 The Authors. Published by Elsevier B.V.
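Illustrative sketch (not the authors' implementation): the abstract only states that a bilingual dictionary and an English lemmatizer are combined to deal with out-of-vocabulary words. One plausible way to picture that idea is a fallback chain, shown below in Python. The vocabulary, the dictionary entries, the lemma table, the lookup order, and the function names `lemmatize` and `resolve_token` are all assumptions made up for this example.

```python
from typing import Tuple

# Toy data, invented for this sketch; none of it comes from the paper.
SMT_VOCAB = {"the", "is", "on", "new"}                      # words seen in general-domain training
BILINGUAL_DICT = {"computer": "komputer", "file": "fail"}   # English lemma -> Malay
LEMMA_TABLE = {"computers": "computer", "files": "file"}    # stand-in for a real lemmatizer


def lemmatize(word: str) -> str:
    """Hypothetical lemmatizer: a table lookup that falls back to the word itself."""
    return LEMMA_TABLE.get(word, word)


def resolve_token(word: str) -> Tuple[str, str]:
    """Return (token, source) for one English word.

    Assumed fallback order: known to the SMT model, then the dictionary,
    then the same two checks again on the lemma, else left as OOV.
    """
    w = word.lower()
    if w in SMT_VOCAB:
        return w, "smt"
    if w in BILINGUAL_DICT:
        return BILINGUAL_DICT[w], "dictionary"
    lemma = lemmatize(w)
    if lemma in SMT_VOCAB:
        return lemma, "smt-lemma"
    if lemma in BILINGUAL_DICT:
        return BILINGUAL_DICT[lemma], "dictionary-lemma"
    return w, "oov"


if __name__ == "__main__":
    for token in ["The", "files", "computer", "kernel"]:
        print(token, resolve_token(token))
```

In this sketch the inflected form "files" is absent from both the vocabulary and the dictionary, but its lemma "file" is found in the dictionary, which is the kind of case where lemmatization could extend dictionary coverage.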
Pages: 243-249
Number of pages: 7