Integrating source-language context into phrase-based statistical machine translation

Cited by: 7
Authors
Haque, Rejwanul [1]
Naskar, Sudip Kumar [1]
van den Bosch, Antal [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL, Dublin 9, Ireland
[2] Tilburg Univ, ILK Res Grp, Tilburg Ctr Cognit & Commun, Tilburg, Netherlands
Funding
Science Foundation Ireland
Keywords
Statistical machine translation; Phrase-based statistical machine translation; Syntax in machine translation; Translation modelling; Word alignment; Memory-based classification;
DOI
10.1007/s10590-011-9100-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
Pages: 239-285
Page count: 47
Related papers
50 in total
  • [21] Phrase-based statistical machine translation using approximate matching
    Tomas, Jesus
    Lloret, Jaime
    Casacuberta, Francisco
    PATTERN RECOGNITION AND IMAGE ANALYSIS, PT 1, PROCEEDINGS, 2007, 4477 : 475 - +
  • [22] Slavic languages in phrase-based statistical machine translation: a survey
    Maucec, Mirjam Sepesy
    Brest, Janez
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 51 (01) : 77 - 117
  • [23] Modality-Preserving Phrase-Based Statistical Machine Translation
    Ideue, Masamichi
    Yamamoto, Kazuhide
    Utiyama, Masao
    Sumita, Eiichiro
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 129 - 132
  • [24] Improving Phrase-Based Statistical Machine Translation with Preprocessing Techniques
    Yashothara, S.
    Uthayasanker, R. T.
    Jayasena, S.
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 322 - 327
  • [25] Improving phrase-based statistical machine translation with morphosyntactic transformation
    Thai Phuong Nguyen
    Shimazu, Akira
    MACHINE TRANSLATION, 2006, 20 (03) : 147 - 166
  • [26] A phrase-based, joint probability model for statistical machine translation
    Marcu, D
    Wong, W
    PROCEEDINGS OF THE 2002 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2002, : 133 - 139
  • [27] Phrase-Based Tibetan-Chinese Statistical Machine Translation
    Yong Cuo
    Shi, Xiaodong
    Nyima, Tashi
    Chen, Yidong
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 424 - 427
  • [28] Statistical phrase-based speech translation
    Mathias, Lambert
    Byrne, William
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 561 - 564
  • [29] Empirical Analysis of Phrase-Based Statistical Machine Translation System for English to Hindi Language
    Babhulgaonkar, Arun
    Sonavane, Shefali
    VIETNAM JOURNAL OF COMPUTER SCIENCE, 2022, 09 (02) : 135 - 162
  • [30] Building a bilingual lexicon using phrase-based statistical machine translation via a pivot language
    Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
    Unknown
    Coling - Int. Conf. Comput. Linguist., Proc. Conf., 1600, (127-130):