Making Language Model as Small as Possible in Statistical Machine Translation

Authors
Liu, Yang [1]
Zhang, Jiajun [1]
Hao, Jie [2]
Zhang, Dakun [2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Toshiba China R&D Ctr, Beijing, Peoples R China
Keywords
language model pruning; frequent n-gram clustering; statistical machine translation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As one of the key components, the n-gram language model is most frequently used in statistical machine translation. Typically, a higher-order language model leads to better translation performance. However, a higher-order n-gram language model requires much more monolingual training data to avoid data sparseness. Furthermore, the model size increases exponentially as the n-gram order grows. In this paper, we investigate language model pruning techniques that aim to make the model as small as possible while preserving translation quality. Based on this investigation, we further propose replacing the higher-order n-grams with a low-order cluster-based language model. Extensive experiments show that our method is very effective.
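To make the idea of pruning concrete, the following is a minimal sketch of count-cutoff pruning, a common baseline that discards n-grams seen fewer than a threshold number of times. It is not the method proposed in the paper, whose details are not given in this record. The function names `count_ngrams` and `prune_by_cutoff`, the toy corpus, and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count n-grams of one order over whitespace-tokenized sentences.

    Hypothetical helper: pads each sentence with <s> and </s> markers.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (order - 1) + sentence.split() + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def prune_by_cutoff(counts, min_count):
    """Keep only the n-grams observed at least min_count times."""
    return Counter({ng: c for ng, c in counts.items() if c >= min_count})


# Toy corpus, assumed for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]

trigrams = count_ngrams(corpus, 3)
pruned = prune_by_cutoff(trigrams, 2)

# Compare the number of distinct trigrams before and after pruning.
print(len(trigrams), len(pruned))
```

In a real back-off language model, an n-gram removed this way is still scored through its lower-order context, so pruning trades some accuracy for a smaller model.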
Pages: 1 / 12
Number of pages: 12