Training Deeper Neural Machine Translation Models with Transparent Attention

Cited by: 0
Authors
Bapna, Ankur [1 ]
Chen, Mia Xu [1 ]
Firat, Orhan [1 ]
Cao, Yuan [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiRNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
Pages: 3028-3033
Page count: 6
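
As a rough illustration of the attention modification summarized in the abstract above: in the published paper, transparent attention lets each decoder block attend to a softmax-weighted combination of all encoder layer outputs (embeddings included) rather than only the top encoder layer, with the combination weights learned as plain scalars. The Python sketch below shows only that combination step; the function name, array shapes, and toy dimensions are illustrative assumptions, not the authors' implementation.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transparent_attention_inputs(encoder_layer_outputs, layer_scores):
    """Combine all encoder layer outputs into one attention source per decoder block.

    encoder_layer_outputs: list of (L+1) arrays, each [seq_len, d_model]
                           (embedding output plus L encoder layers).
    layer_scores:          array [L+1, num_decoder_blocks] of learned scalars
                           (hypothetical parameterization for this sketch).
    Returns one [seq_len, d_model] array per decoder block.
    """
    stacked = np.stack(encoder_layer_outputs, axis=0)        # [L+1, seq_len, d_model]
    weights = softmax(layer_scores, axis=0)                   # normalize over encoder layers
    # Weighted sum over encoder layers, computed separately for each decoder block.
    return [np.tensordot(weights[:, j], stacked, axes=1)      # -> [seq_len, d_model]
            for j in range(weights.shape[1])]

# Toy usage: a 6-layer encoder feeding a 4-block decoder.
rng = np.random.default_rng(0)
enc_outputs = [rng.normal(size=(10, 512)) for _ in range(7)]  # embeddings + 6 layers
scores = rng.normal(size=(7, 4))                              # learned during training
contexts = transparent_attention_inputs(enc_outputs, scores)
print(len(contexts), contexts[0].shape)                       # 4 (10, 512)

Normalizing the learned scalars with a softmax keeps each combined representation on roughly the scale of a single encoder layer while giving lower encoder layers a direct gradient path, which is what reportedly eases optimization of the 2-3x deeper encoders.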