Making Language Model as Small as Possible in Statistical Machine Translation

Cited by: 0
Authors
Liu, Yang [1]
Zhang, Jiajun [1]
Hao, Jie [2]
Zhang, Dakun [2]
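Affiliations
[1] National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] Toshiba China R&D Center, Beijing, People's Republic of China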
Keywords
language model pruning; frequent n-gram clustering; statistical machine translation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As one of the key components of statistical machine translation, the n-gram language model is the most widely used type of language model in the field. Typically, a higher-order language model leads to better translation performance. However, a higher-order n-gram model requires much more monolingual training data to avoid data sparseness, and the model size grows exponentially as the n-gram order increases. In this paper, we investigate language model pruning techniques that aim to make the model as small as possible while preserving translation quality. Based on our findings, we further propose replacing the higher-order n-grams with a low-order cluster-based language model. Extensive experiments show that the proposed method is effective.
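To make the size trade-off concrete, here is a minimal sketch of count-cutoff pruning, one of the simplest generic language model pruning techniques. It is not necessarily one of the techniques evaluated in the paper, and it does not implement the proposed cluster-based model. Model size is measured here simply as the number of distinct n-grams stored; the function names `count_ngrams`, `prune_by_count`, and `model_size`, the cutoff values, and the three-sentence toy corpus are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List

# Maps an n-gram order (1, 2, 3, ...) to the counts of all n-grams of that order.
Counts = Dict[int, Counter]


def count_ngrams(sentences: List[List[str]], max_order: int) -> Counts:
    """Count every n-gram of order 1..max_order, padding each sentence
    with <s> and </s> boundary markers."""
    counts: Counts = {n: Counter() for n in range(1, max_order + 1)}
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(padded) - n + 1):
                counts[n][tuple(padded[i:i + n])] += 1
    return counts


def prune_by_count(counts: Counts, cutoffs: Dict[int, int]) -> Counts:
    """Keep only n-grams whose count reaches the cutoff for their order.
    Orders missing from `cutoffs` use a cutoff of 1, i.e. nothing is pruned."""
    return {
        n: Counter({g: c for g, c in ctr.items() if c >= cutoffs.get(n, 1)})
        for n, ctr in counts.items()
    }


def model_size(counts: Counts) -> int:
    """Number of distinct n-grams stored across all orders."""
    return sum(len(ctr) for ctr in counts.values())


# Hypothetical toy corpus, already tokenized.
CORPUS = [
    "the cat sat on the mat".split(),
    "the dog sat on the mat".split(),
    "the cat ate the fish".split(),
]

full = count_ngrams(CORPUS, max_order=3)
# Hypothetical cutoffs: drop bigrams and trigrams seen only once.
pruned = prune_by_count(full, cutoffs={2: 2, 3: 2})

for n in (1, 2, 3):
    print(n, len(full[n]), len(pruned[n]))
print("total", model_size(full), model_size(pruned))  # total 36 20
```

Even on this tiny corpus, most higher-order n-grams occur only once, so the cutoffs roughly halve the stored n-grams (36 to 20). Because an n-gram never occurs more often than its own (n-1)-word prefix, cutoffs that do not decrease with the order guarantee that every surviving n-gram also keeps its prefix in the pruned model.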
Pages: 1-12
Page count: 12