Making Language Model as Small as Possible in Statistical Machine Translation

Authors
Liu, Yang [1 ]
Zhang, Jiajun [1 ]
Hao, Jie [2 ]
Zhang, Dakun [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Toshiba China R&D Ctr, Beijing, Peoples R China
Source
Keywords
language model pruning; frequent n-gram clustering; statistical machine translation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The n-gram language model is one of the key components of statistical machine translation and the most frequently used kind of language model there. Typically, a higher-order language model leads to better translation performance. However, a higher-order n-gram language model requires much more monolingual training data to avoid data sparseness. Furthermore, the model size increases exponentially as the n-gram order grows. In this paper, we investigate language model pruning techniques that aim to make the model as small as possible while preserving translation quality. Based on this investigation, we further propose replacing the higher-order n-grams with a low-order cluster-based language model. Extensive experiments show that our method is very effective.
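As a rough illustration of what pruning means for model size, the sketch below counts distinct n-grams at several orders in a toy corpus and drops those seen fewer than a threshold number of times. This is plain count-cutoff pruning, chosen here only as an assumed, simple stand-in; it is not the pruning or clustering method of the paper. The function names `ngram_counts` and `prune_by_cutoff`, the toy corpus, and the threshold are all hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, order):
    """Count every n-gram of the given order in a list of tokens."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )


def prune_by_cutoff(counts, min_count):
    """Keep only n-grams observed at least min_count times.

    Count-cutoff pruning is an illustrative assumption, not the
    technique proposed in the paper.
    """
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus (hypothetical), already tokenized on whitespace.
corpus = "the cat sat on the mat and the cat sat on the rug".split()

for order in (1, 2, 3):
    counts = ngram_counts(corpus, order)
    kept = prune_by_cutoff(counts, min_count=2)
    print(f"order={order} distinct={len(counts)} kept={len(kept)}")
```

On a corpus this small the numbers only hint at the trend; with realistic training data the number of distinct n-grams grows quickly with the order, which is why the choice of what to discard matters.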
Pages: 1 / 12
Page count: 12