Integrating source-language context into phrase-based statistical machine translation

Cited by: 7
Authors
Haque, Rejwanul [1]
Naskar, Sudip Kumar [1]
van den Bosch, Antal [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL, Dublin 9, Ireland
[2] Tilburg Univ, ILK Res Grp, Tilburg Ctr Cognit & Commun, Tilburg, Netherlands
Funding
Science Foundation Ireland
Keywords
Statistical machine translation; Phrase-based statistical machine translation; Syntax in machine translation; Translation modelling; Word alignment; Memory-based classification;
DOI
10.1007/s10590-011-9100-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
Pages: 239-285
Page count: 47