Training Deeper Neural Machine Translation Models with Transparent Attention

Cited by: 0
Authors
Bapna, Ankur [1 ]
Chen, Mia Xu [1 ]
Firat, Orhan [1 ]
Cao, Yuan [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiRNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
Pages: 3028-3033
Page count: 6
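
As a rough illustration of the attention modification summarized in the abstract above: in the published paper, transparent attention lets each decoder block attend to a softmax-weighted combination of all encoder layer outputs (embeddings included) rather than only the top encoder layer, with the combination weights learned as plain scalars. The Python sketch below shows only that combination step; the function name, array shapes, and toy dimensions are illustrative assumptions, not the authors' implementation.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transparent_attention_inputs(encoder_layer_outputs, layer_scores):
    """Combine all encoder layer outputs into one attention source per decoder block.

    encoder_layer_outputs: list of (L+1) arrays, each [seq_len, d_model]
                           (embedding output plus L encoder layers).
    layer_scores:          array [L+1, num_decoder_blocks] of learned scalars
                           (hypothetical parameterization for this sketch).
    Returns one [seq_len, d_model] array per decoder block.
    """
    stacked = np.stack(encoder_layer_outputs, axis=0)        # [L+1, seq_len, d_model]
    weights = softmax(layer_scores, axis=0)                   # normalize over encoder layers
    # Weighted sum over encoder layers, computed separately for each decoder block.
    return [np.tensordot(weights[:, j], stacked, axes=1)      # -> [seq_len, d_model]
            for j in range(weights.shape[1])]

# Toy usage: a 6-layer encoder feeding a 4-block decoder.
rng = np.random.default_rng(0)
enc_outputs = [rng.normal(size=(10, 512)) for _ in range(7)]  # embeddings + 6 layers
scores = rng.normal(size=(7, 4))                              # learned during training
contexts = transparent_attention_inputs(enc_outputs, scores)
print(len(contexts), contexts[0].shape)                       # 4 (10, 512)

Normalizing the learned scalars with a softmax keeps each combined representation on roughly the scale of a single encoder layer while giving lower encoder layers a direct gradient path, which is what reportedly eases optimization of the 2-3x deeper encoders.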