Training Deeper Neural Machine Translation Models with Transparent Attention

Cited by: 0
Authors
Bapna, Ankur [1 ]
Chen, Mia Xu [1 ]
Firat, Orhan [1 ]
Cao, Yuan [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiRNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
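The abstract describes the attention modification only at a high level. The sketch below illustrates one plausible reading: instead of the decoder attending only to the top encoder layer, each decoder layer attends to a learned softmax-weighted combination of all encoder layer outputs (treating the embeddings as "layer 0"). This is a minimal PyTorch sketch under that assumption; the class name TransparentAttention, the tensor shapes, and the layer counts are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransparentAttention(nn.Module):
    # Hypothetical sketch: each decoder layer receives a learned convex
    # combination of all encoder layer outputs (embeddings included),
    # rather than only the top encoder layer. The per-(decoder, encoder)
    # scalar weights are trained jointly with the rest of the model.
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # num_encoder_layers + 1 sources: the embeddings count as "layer 0".
        self.s = nn.Parameter(
            torch.zeros(num_decoder_layers, num_encoder_layers + 1)
        )

    def forward(self, encoder_states: list[torch.Tensor]) -> torch.Tensor:
        # encoder_states: [embeddings, layer_1, ..., layer_N],
        # each of shape (batch, src_len, d_model).
        stacked = torch.stack(encoder_states, dim=0)         # (N+1, B, T, D)
        weights = F.softmax(self.s, dim=-1)                  # (M, N+1), rows sum to 1
        # One blended encoder representation per decoder layer.
        return torch.einsum("mn,nbtd->mbtd", weights, stacked)  # (M, B, T, D)


# Toy usage: a 6-layer encoder (plus embeddings) feeding a 6-layer decoder.
enc_states = [torch.randn(2, 7, 512) for _ in range(7)]
contexts = TransparentAttention(6, 6)(enc_states)
print(contexts.shape)  # torch.Size([6, 2, 7, 512])
```

The softmax over the scalar weights keeps each combination convex, acting like a learned skip connection from every encoder layer to the decoder, which is consistent with the abstract's claim that the modification eases the optimization of deeper encoders.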
Pages: 3028-3033
Page count: 6
Related Papers
50 items in total (items [31]-[40] shown)
  • [31] The Unreasonable Volatility of Neural Machine Translation Models
    Fadaee, Marzieh
    Monz, Christof
    NEURAL GENERATION AND TRANSLATION, 2020, : 88 - 96
  • [32] Compact Personalized Models for Neural Machine Translation
    Wuebker, Joern
    Simianer, Patrick
    DeNero, John
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 881 - 886
  • [33] Exploring the Role of Monolingual Data in Cross-Attention Pre-training for Neural Machine Translation
    Khang Pham
    Long Nguyen
    Dien Dinh
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2023, 2023, 14162 : 179 - 190
  • [34] Attention over Heads: A Multi-Hop Attention for Neural Machine Translation
    Iida, Shohei
    Kimura, Ryuichiro
    Cui, Hongyi
    Hung, Po-Hsuan
    Utsuro, Takehito
    Nagata, Masaaki
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 217 - 222
  • [35] Training Neural Machine Translation To Apply Terminology Constraints
    Dinu, Georgiana
    Mathur, Prashant
    Federico, Marcello
    Al-Onaizan, Yaser
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3063 - 3068
  • [36] Shallow-to-Deep Training for Neural Machine Translation
    Li, Bei
    Wang, Ziyang
    Liu, Hui
    Jiang, Yufan
    Du, Quan
    Xiao, Tong
    Wang, Huizhen
    Zhu, Jingbo
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 995 - 1005
  • [37] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan
    Li, Lei
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021, : 21 - 25
  • [38] Restricted or Not: A General Training Framework for Neural Machine Translation
    Li, Zuchao
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 245 - 251
  • [39] Look Harder: A Neural Machine Translation Model with Hard Attention
    Indurthi, Sathish
    Chung, Insoo
    Kim, Sangha
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3037 - 3043
  • [40] Recursive Annotations for Attention-Based Neural Machine Translation
    Ye, Shaolin
    Guo, Wu
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 164 - 167