Integrating source-language context into phrase-based statistical machine translation

Cited by: 7
Authors
Haque, Rejwanul [1]
Naskar, Sudip Kumar [1]
van den Bosch, Antal [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL, Dublin 9, Ireland
[2] Tilburg Univ, ILK Res Grp, Tilburg Ctr Cognit & Commun, Tilburg, Netherlands
Funding
Science Foundation Ireland;
Keywords
Statistical machine translation; Phrase-based statistical machine translation; Syntax in machine translation; Translation modelling; Word alignment; Memory-based classification;
DOI
10.1007/s10590-011-9100-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
Pages: 239-285
Number of pages: 47
Related papers
50 results in total
  • [31] An Approach to N-Gram Language Model Evaluation in Phrase-Based Statistical Machine Translation
    Su, Jinsong
    Liu, Qun
    Dong, Huailin
    Chen, Yidong
    Shi, Xiaodong
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 201 - 204
  • [32] Improving Phrase-Based Statistical Machine Translation Models by Incorporating Syntax-Based Language Models
    陈毅东
    史晓东
    Journal of Donghua University(English Edition), 2010, 27 (02) : 185 - 188
  • [33] Improving Phrase-based Korean-English Statistical Machine Translation
    Lee, Jonghoon
    Lee, Donghyeon
    Lee, Gary Geunbae
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 753 - 756
  • [34] Using TectoMT as a Preprocessing Tool for Phrase-Based Statistical Machine Translation
    Zeman, Daniel
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 216 - 223
  • [35] Translation paraphrases in phrase-based machine translation
    Guzman, Francisco
    Garrido, Leonardo
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2008, 4919 : 388 - 398
  • [36] Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation
    Tinsley, John
    Hearne, Mary
    Way, Andy
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2009, 5449 : 318 - 331
  • [37] Linguistic Resources for Factored Phrase-Based Statistical Machine Translation Systems
    Navlea, Mirabela
    Todirascu, Amalia
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : H41 - H48
  • [38] Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation
    Zhang, Jingyi
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 542 - 548
  • [39] Malayalam Natural Language Processing: Challenges in Building a Phrase-Based Statistical Machine Translation System
    Sebastian, Mary Priya
    Kumar, G. Santhosh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
  • [40] Parts of Speech Tagged Phrase-Based Statistical Machine Translation System for English → Mizo Language
    Devi C.S.
    Roy A.K.
    Purkayastha B.S.
    SN Computer Science, 4 (6)