Training Deeper Neural Machine Translation Models with Transparent Attention

Cited by: 0
Authors
Bapna, Ankur [1]
Chen, Mia Xu [1]
Firat, Orhan [1]
Cao, Yuan [1]
Wu, Yonghui [1]
Affiliation
[1] Google AI, Mountain View, CA 94043 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiRNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
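The "simple modification" the abstract refers to is transparent attention: instead of attending only to the final encoder layer, each decoder layer attends to a softmax-weighted combination of all encoder layer outputs (including the embedding layer), with scalar mixing weights learned jointly with the model, which shortens gradient paths through a deep encoder. Below is a minimal PyTorch sketch of that layer-combination step; the module name and tensor shapes are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of transparent attention's layer combination
# (assumed shapes and names; not the paper's released code).
import torch
import torch.nn as nn


class TransparentAttentionCombiner(nn.Module):
    """For each decoder layer, mix all encoder layer outputs with learned weights."""

    def __init__(self, num_encoder_layers: int, num_decoder_layers: int):
        super().__init__()
        # One learnable scalar per (encoder layer incl. embeddings, decoder layer).
        # Zero init => uniform softmax, so every decoder layer starts out
        # averaging all encoder layers equally.
        self.weights = nn.Parameter(
            torch.zeros(num_encoder_layers + 1, num_decoder_layers)
        )

    def forward(self, encoder_states):
        # encoder_states: list [embeddings, layer_1, ..., layer_N],
        # each tensor of shape (batch, src_len, d_model).
        stacked = torch.stack(encoder_states)        # (N+1, batch, src_len, d_model)
        alphas = torch.softmax(self.weights, dim=0)  # normalize over encoder layers
        # combined[j] = sum_i alphas[i, j] * encoder_states[i]
        combined = torch.einsum("lj,lbsd->jbsd", alphas, stacked)
        return list(combined.unbind(0))              # one memory per decoder layer
```

In use, the j-th returned tensor would replace the usual top-layer encoder output as the source-side memory for decoder layer j's cross-attention; the rest of the Transformer or BiRNN seq2seq model is unchanged.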
Pages: 3028 - 3033
Page count: 6
Related Papers
50 in total
  • [1] "Found in Translation": A deeper analysis of neural machine translation models for chemical reaction prediction
    Schwaller, Philippe
    Gaudin, Theophile
    Lanyi, David
    Bekas, Costas
    Laino, Teodoro
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2018, 256
  • [2] Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation
    Moradi, Pooya
    Kambhatla, Nishant
    Sarkar, Anoop
    AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2020, : 86 - 93
  • [3] Effectively training neural machine translation models with monolingual data
    Yang, Zhen
    Chen, Wei
    Wang, Feng
    Xu, Bo
    NEUROCOMPUTING, 2019, 333 : 240 - 247
  • [4] Joint Training for Neural Machine Translation Models with Monolingual Data
    Zhang, Zhirui
    Liu, Shujie
    Li, Mu
    Zhou, Ming
    Chen, Enhong
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 555 - 562
  • [5] Neural Machine Translation Models with Attention-Based Dropout Layer
    Israr, Huma
    Khan, Safdar Abbas
    Tahir, Muhammad Ali
    Shahzad, Muhammad Khuram
    Ahmad, Muneer
    Zain, Jasni Mohamad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): 2981 - 3009
  • [6] Recurrent Attention for Neural Machine Translation
    Zeng, Jiali
    Wu, Shuangzhi
    Yin, Yongjing
    Jiang, Yufan
    Li, Mu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3216 - 3225
  • [7] Neural Machine Translation with Deep Attention
    Zhang, Biao
    Xiong, Deyi
    Su, Jinsong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (01) : 154 - 163
  • [8] Attention-via-Attention Neural Machine Translation
    Zhao, Shenjian
    Zhang, Zhihua
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 563 - 570
  • [9] Sparse and Constrained Attention for Neural Machine Translation
    Malaviya, Chaitanya
    Ferreira, Pedro
    Martins, Andre F. T.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 370 - 376
  • [10] Bilingual attention based neural machine translation
    Kang, Liyan
    He, Shaojie
    Wang, Mingxuan
    Long, Fei
    Su, Jinsong
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4302 - 4315