Multi-Source Neural Machine Translation With Missing Data

Cited by: 13
Authors:
Nishimura, Yuta [1]
Sudoh, Katsuhito [1]
Neubig, Graham [2]
Nakamura, Satoshi [1]
Affiliations:
[1] Nara Inst Sci & Technol, Ikoma 6300192, Japan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords:
Neural machine translation (NMT); multi-linguality; data augmentation
DOI:
10.1109/TASLP.2019.2959224
Chinese Library Classification (CLC):
O42 [Acoustics]
Subject Classification Codes:
070206; 082403
Abstract
Machine translation is rife with ambiguities in word ordering and word choice, and even with the advent of machine-learning methods that learn to resolve this ambiguity based on statistics from large corpora, mistakes are frequent. Multi-source translation is an approach that attempts to resolve these ambiguities by exploiting multiple inputs (e.g., sentences in three different languages) to increase translation accuracy. These methods are trained on multilingual corpora that include the multiple source languages and the target language, and at test time they use information from all source languages while generating the target. While many such multilingual corpora exist, such as multilingual translations of TED talks or European Parliament proceedings, in practice they are often incomplete because of the difficulty of providing translations in all of the relevant languages. Existing studies on multi-source translation do not explicitly handle such situations and are thus only applicable to complete corpora that contain all of the languages of interest, severely limiting their practical applicability. In this article, we examine approaches for multi-source neural machine translation (NMT) that can learn from and translate such incomplete corpora. Specifically, we propose methods to deal with incomplete corpora at both training time and test time. For training time, we examine two methods: (1) a simple method that replaces missing source translations with a special NULL symbol, and (2) a data augmentation approach that fills in the incomplete parts with source translations generated by multi-source NMT. For test time, we examine methods that use multi-source translation even when only a single source is provided, by first translating into an additional auxiliary language using standard NMT and then applying multi-source translation to the original source and the generated auxiliary-language sentence. Extensive experiments demonstrate that both the proposed training-time and test-time methods significantly improve translation performance.
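
The training-time and test-time strategies described in the abstract can be summarized in a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the model objects and their methods (single_src_model.translate, multi_src_model.translate, filler_model.translate_into) are hypothetical stand-ins for trained NMT systems, and the data layout (one dictionary of language-keyed sentences per example) is assumed for clarity.

NULL_TOKEN = "<NULL>"  # special symbol standing in for a missing source sentence


def fill_missing_with_null(example, source_langs):
    """Training-time method (1): replace absent source sides with the NULL symbol."""
    return {lang: example.get(lang, NULL_TOKEN) for lang in source_langs}


def augment_corpus(corpus, filler_model, source_langs):
    """Training-time method (2), roughly: fill each missing source side with a
    translation produced by a multi-source NMT model (here a hypothetical
    `filler_model`) so that every training example becomes complete."""
    filled = []
    for example in corpus:
        example = dict(example)
        for lang in source_langs:
            if lang not in example:
                # Generate the missing side from the languages that are present.
                example[lang] = filler_model.translate_into(lang, example)
        filled.append(example)
    return filled


def translate_with_auxiliary(src_sentence, single_src_model, multi_src_model):
    """Test-time method: only one source is available, so first generate an
    auxiliary-language sentence with a standard single-source NMT model, then
    decode with the multi-source model using both inputs."""
    auxiliary = single_src_model.translate(src_sentence)       # hypothetical API
    return multi_src_model.translate(src_sentence, auxiliary)  # hypothetical API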
Pages: 569-580 (12 pages)