Making Language Model as Small as Possible in Statistical Machine Translation

Cited by: 0
Authors
Liu, Yang [1 ]
Zhang, Jiajun [1 ]
Hao, Jie [2 ]
Zhang, Dakun [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Toshiba China R&D Ctr, Beijing, Peoples R China
Source
Keywords
language model pruning; frequent n-gram clustering; statistical machine translation;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The n-gram language model is one of the key components of statistical machine translation and the type of language model most frequently used in it. Typically, a higher-order language model leads to better translation performance. However, a higher-order n-gram language model requires much more monolingual training data to avoid data sparseness. Furthermore, the model size grows exponentially as the n-gram order increases. In this paper, we investigate language model pruning techniques that aim to make the model as small as possible while preserving translation quality. Based on this investigation, we further propose replacing the higher-order n-grams with a low-order cluster-based language model. Extensive experiments show that our method is very effective.
Pages: 1 / 12
Page count: 12