Training Deeper Neural Machine Translation Models with Transparent Attention

Cited by: 0
Authors
Bapna, Ankur [1 ]
Chen, Mia Xu [1 ]
Firat, Orhan [1 ]
Cao, Yuan [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiRNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
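The abstract describes the attention modification only at a high level. The sketch below illustrates one plausible reading: instead of the decoder attending only to the top encoder layer, each decoder layer attends to a learned softmax-weighted combination of all encoder layer outputs (treating the embeddings as "layer 0"). This is a minimal PyTorch sketch under that assumption; the class name TransparentAttention, the tensor shapes, and the layer counts are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransparentAttention(nn.Module):
    # Hypothetical sketch: each decoder layer receives a learned convex
    # combination of all encoder layer outputs (embeddings included),
    # rather than only the top encoder layer. The per-(decoder, encoder)
    # scalar weights are trained jointly with the rest of the model.
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # num_encoder_layers + 1 sources: the embeddings count as "layer 0".
        self.s = nn.Parameter(
            torch.zeros(num_decoder_layers, num_encoder_layers + 1)
        )

    def forward(self, encoder_states: list[torch.Tensor]) -> torch.Tensor:
        # encoder_states: [embeddings, layer_1, ..., layer_N],
        # each of shape (batch, src_len, d_model).
        stacked = torch.stack(encoder_states, dim=0)         # (N+1, B, T, D)
        weights = F.softmax(self.s, dim=-1)                  # (M, N+1), rows sum to 1
        # One blended encoder representation per decoder layer.
        return torch.einsum("mn,nbtd->mbtd", weights, stacked)  # (M, B, T, D)


# Toy usage: a 6-layer encoder (plus embeddings) feeding a 6-layer decoder.
enc_states = [torch.randn(2, 7, 512) for _ in range(7)]
contexts = TransparentAttention(6, 6)(enc_states)
print(contexts.shape)  # torch.Size([6, 2, 7, 512])
```

The softmax over the scalar weights keeps each combination convex, acting like a learned skip connection from every encoder layer to the decoder, which is consistent with the abstract's claim that the modification eases the optimization of deeper encoders.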
Pages: 3028-3033
Page count: 6
Related Papers
50 items in total (items [31]-[40] shown)
  • [31] The Unreasonable Volatility of Neural Machine Translation Models
    Fadaee, Marzieh
    Monz, Christof
    NEURAL GENERATION AND TRANSLATION, 2020, : 88 - 96
  • [32] Compact Personalized Models for Neural Machine Translation
    Wuebker, Joern
    Simianer, Patrick
    DeNero, John
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 881 - 886
  • [33] Exploring the Role of Monolingual Data in Cross-Attention Pre-training for Neural Machine Translation
    Khang Pham
    Long Nguyen
    Dien Dinh
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2023, 2023, 14162 : 179 - 190
  • [34] Attention over Heads: A Multi-Hop Attention for Neural Machine Translation
    Iida, Shohei
    Kimura, Ryuichiro
    Cui, Hongyi
    Hung, Po-Hsuan
    Utsuro, Takehito
    Nagata, Masaaki
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 217 - 222
  • [35] Training Neural Machine Translation To Apply Terminology Constraints
    Dinu, Georgiana
    Mathur, Prashant
    Federico, Marcello
    Al-Onaizan, Yaser
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3063 - 3068
  • [36] Shallow-to-Deep Training for Neural Machine Translation
    Li, Bei
    Wang, Ziyang
    Liu, Hui
    Jiang, Yufan
    Du, Quan
    Xiao, Tong
    Wang, Huizhen
    Zhu, Jingbo
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 995 - 1005
  • [37] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan
    Li, Lei
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021, : 21 - 25
  • [38] Restricted or Not: A General Training Framework for Neural Machine Translation
    Li, Zuchao
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 245 - 251
  • [39] Look Harder: A Neural Machine Translation Model with Hard Attention
    Indurthi, Sathish
    Chung, Insoo
    Kim, Sangha
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3037 - 3043
  • [40] Recursive Annotations for Attention-Based Neural Machine Translation
    Ye, Shaolin
    Guo, Wu
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 164 - 167