Multi-Source Neural Machine Translation With Missing Data

Cited by: 13
Authors
Nishimura, Yuta [1]
Sudoh, Katsuhito [1]
Neubig, Graham [2]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Ikoma 6300192, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Neural machine translation (NMT); multi-linguality; data augmentation
DOI
10.1109/TASLP.2019.2959224
CLC classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Machine translation is rife with ambiguities in word ordering and word choice, and even with the advent of machine-learning methods that learn to resolve this ambiguity based on statistics from large corpora, mistakes are frequent. Multi-source translation is an approach that attempts to resolve these ambiguities by exploiting multiple inputs (e.g., sentences in three different languages) to increase translation accuracy. These methods are trained on multilingual corpora, which include the multiple source languages and the target language, and then at test time use information from both source languages while generating the target. While many such multilingual corpora exist, such as multilingual translations of TED talks or European Parliament proceedings, in practice many of them are incomplete because it is difficult to provide translations in all of the relevant languages. Existing studies on multi-source translation did not explicitly handle such situations and are thus applicable only to complete corpora that contain all of the languages of interest, which severely limits their practical applicability. In this article, we examine approaches for multi-source neural machine translation (NMT) that can learn from and translate such incomplete corpora. Specifically, we propose methods to deal with incomplete corpora at both training time and test time. For training time, we examine two methods: (1) a simple method that replaces missing source translations with a special NULL symbol, and (2) a data augmentation approach that fills in the incomplete parts with source translations created by multi-source NMT. For test time, we examine methods that use multi-source translation even when only a single source is provided: the source is first translated into an additional auxiliary language using standard NMT, and multi-source translation is then applied to the original source and the generated auxiliary-language sentence. Extensive experiments demonstrate that both the proposed training-time and test-time methods significantly improve translation performance.
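The three strategies described in the abstract can be sketched as data-handling steps around a translation model. This is a minimal illustration, not the authors' implementation. The NULL token string, the `Translator` callable interface, and all function names are hypothetical. The augmentation step assumes the pseudo-translation is conditioned on every sentence present in the example, including the target; the abstract does not say which inputs the authors' model actually uses.

```python
from typing import Callable, Dict, List, Optional

NULL = "<NULL>"  # hypothetical token standing in for a missing source sentence

# One multilingual example: language code -> sentence, or None when that
# translation is missing from the corpus.
Example = Dict[str, Optional[str]]
# Hypothetical model interface: (available inputs by language, output language) -> sentence.
Translator = Callable[[Dict[str, str], str], str]


def fill_with_null(corpus: List[Example], sources: List[str]) -> List[Example]:
    """Training-time method (1): replace each missing source sentence with NULL."""
    filled = []
    for ex in corpus:
        new_ex = dict(ex)  # copy so the original corpus is left untouched
        for lang in sources:
            if new_ex.get(lang) is None:
                new_ex[lang] = NULL
        filled.append(new_ex)
    return filled


def fill_with_translations(corpus: List[Example], sources: List[str],
                           translate: Translator) -> List[Dict[str, str]]:
    """Training-time method (2): fill each missing source sentence with a
    pseudo-translation generated from the sentences that are present
    (assumption: the target sentence is included among the model inputs)."""
    filled = []
    for ex in corpus:
        present = {lang: s for lang, s in ex.items() if s is not None}
        new_ex = dict(present)
        for lang in sources:
            if lang not in present:
                new_ex[lang] = translate(present, lang)
        filled.append(new_ex)
    return filled


def translate_with_auxiliary(src_lang: str, src: str, aux_lang: str, tgt_lang: str,
                             single: Translator, multi: Translator) -> str:
    """Test-time method: generate an auxiliary-language sentence with a
    single-source model, then decode the target from both the real source
    and the generated auxiliary sentence with a multi-source model."""
    aux = single({src_lang: src}, aux_lang)
    return multi({src_lang: src, aux_lang: aux}, tgt_lang)


if __name__ == "__main__":
    def stub(inputs: Dict[str, str], lang: str) -> str:
        # Stand-in "model" that only records which languages it was conditioned on.
        return f"{lang}<-" + "+".join(sorted(inputs))

    corpus = [{"fr": "bonjour", "de": None, "en": "hello"}]
    print(fill_with_null(corpus, ["fr", "de"]))
    print(fill_with_translations(corpus, ["fr", "de"], stub))
    print(translate_with_auxiliary("fr", "bonjour", "de", "en", stub, stub))  # en<-de+fr
```

The stub translator makes the data flow visible: with NULL filling, the model always sees a fixed number of source slots; with augmentation, every slot holds a real or generated sentence; and at test time a single-source input is expanded into two sources before multi-source decoding.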
Pages: 569-580
Number of pages: 12