Integrating source-language context into phrase-based statistical machine translation

Cited by: 7
Authors
Haque, Rejwanul [1]
Naskar, Sudip Kumar [1]
van den Bosch, Antal [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL, Dublin 9, Ireland
[2] Tilburg Univ, ILK Res Grp, Tilburg Ctr Cognit & Commun, Tilburg, Netherlands
Funding
Science Foundation Ireland
Keywords
Statistical machine translation; Phrase-based statistical machine translation; Syntax in machine translation; Translation modelling; Word alignment; Memory-based classification;
DOI
10.1007/s10590-011-9100-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
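The idea of adding a source-context feature to a log-linear model can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`tm`, `lm`, `ctx`), the weights, the probabilities, and the Dutch word "bank" with its two English candidates are all hypothetical values chosen for illustration. The sketch assumes only the standard log-linear formulation, in which each candidate is scored by a weighted sum of log-domain feature values, and shows how one extra context-sensitive feature can change which target phrase is selected.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log-domain feature values for one candidate."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Return the target phrase with the highest log-linear score."""
    return max(candidates, key=lambda t: loglinear_score(candidates[t], weights))

# Toy candidates for the Dutch source word "bank" in a sentence about sitting.
# All probabilities are invented for illustration only.
candidates = {
    "bank": {   # financial institution
        "tm": math.log(0.6),   # context-independent translation probability
        "lm": math.log(0.3),   # target language model probability
        "ctx": math.log(0.1),  # hypothetical context-informed probability
    },
    "couch": {
        "tm": math.log(0.4),
        "lm": math.log(0.3),
        "ctx": math.log(0.9),
    },
}

# Baseline: the context feature is switched off (weight 0).
baseline_weights = {"tm": 1.0, "lm": 1.0, "ctx": 0.0}
# Context-informed: the same model with the extra feature switched on.
context_weights = {"tm": 1.0, "lm": 1.0, "ctx": 1.0}

baseline_choice = best_candidate(candidates, baseline_weights)
context_choice = best_candidate(candidates, context_weights)
```

With these toy numbers the baseline prefers "bank", while the context-informed weights prefer "couch". In a real system the context probability would come from a trained classifier over features such as neighbouring words, supertags, or dependency relations, and the weights would be tuned rather than set by hand.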
Pages: 239-285
Number of pages: 47