Attention over Heads: A Multi-Hop Attention for Neural Machine Translation

Cited by: 0
Authors
Iida, Shohei [1 ]
Kimura, Ryuichiro [1 ]
Cui, Hongyi [1 ]
Hung, Po-Hsuan [1 ]
Utsuro, Takehito [1 ]
Nagata, Masaaki [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we propose a multi-hop attention mechanism for the Transformer. It refines the attention for an output symbol by integrating the attention of each head, and it consists of two hops. The first hop is the scaled dot-product attention, the same attention mechanism used in the original Transformer. The second hop combines multi-layer perceptron (MLP) attention with a head gate, which efficiently increases the complexity of the model by adding dependencies between heads. We demonstrate that the proposed multi-hop attention significantly outperforms the baseline Transformer in translation accuracy, by +0.85 BLEU points on the IWSLT-2017 German-to-English task and +2.58 BLEU points on the WMT-2017 German-to-English task. We also find that a multi-hop attention requires fewer parameters than stacking another self-attention layer, and that the proposed model converges significantly faster than the original Transformer.
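The abstract gives only the high-level design, so the following is a minimal PyTorch sketch of one way the two hops could fit together: hop 1 is ordinary scaled dot-product multi-head attention, and hop 2 re-weights the per-head outputs with a softmax MLP attention over the head axis combined with a sigmoid head gate, instead of merging heads by plain concatenation. The class name `MultiHopAttention`, the tanh MLP scorer, and the exact gating formula are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MultiHopAttention(nn.Module):
    """Sketch of a two-hop attention (assumed structure, not the paper's code).

    Hop 1: scaled dot-product multi-head self-attention, as in the
    original Transformer.  Hop 2: MLP attention over the head axis plus
    a per-head sigmoid gate, adding dependencies between heads.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Hop 2: MLP scorer for each head's output (assumed tanh form).
        self.head_score = nn.Sequential(
            nn.Linear(self.d_head, self.d_head), nn.Tanh(),
            nn.Linear(self.d_head, 1),
        )
        # Hop 2: per-head gate (assumed sigmoid form).
        self.head_gate = nn.Linear(self.d_head, 1)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = map(split, (q, k, v))  # each: (b, h, t, d_head)

        # Hop 1: scaled dot-product attention, computed per head.
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = att @ v  # (b, h, t, d_head)

        # Hop 2: softmax MLP attention over the head axis, times a gate.
        alpha = torch.softmax(self.head_score(heads), dim=1)  # weights over heads
        gate = torch.sigmoid(self.head_gate(heads))
        heads = alpha * gate * heads  # re-weight each head's contribution

        merged = heads.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out(merged)
```

With these assumed defaults, `MultiHopAttention(d_model=512, n_heads=8)` maps a `(batch, seq, 512)` tensor to the same shape. Because hop 2 only adds the small MLP scorer and gate on top of the existing heads, its parameter count stays well below that of a full extra self-attention layer, consistent with the abstract's claim.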
Pages: 217-222 (6 pages)
Related papers (50 in total; items [31]-[40] shown)
  • [31] Wang, Lei; Li, Zheng-Wei; You, Zhu-Hong; Huang, De-Shuang; Wong, Leon. MAGCDA: A Multi-Hop Attention Graph Neural Networks Method for CircRNA-Disease Association Prediction. IEEE Journal of Biomedical and Health Informatics, 2024, 28(03): 1752-1761.
  • [32] Zhang, Jiajun; Zhao, Yang; Li, Haoran; Zong, Chengqing. Attention With Sparsity Regularization for Neural Machine Translation and Summarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(03): 507-518.
  • [33] Yang, Mingming; Zhang, Min; Chen, Kehai; Wang, Rui; Zhao, Tiejun. Neural Machine Translation with Target-Attention Model. IEICE Transactions on Information and Systems, 2020, E103D(03): 684-694.
  • [34] Chen, Kehai; Wang, Rui; Utiyama, Masao; Sumita, Eiichiro; Zhao, Tiejun. Syntax-Directed Attention for Neural Machine Translation. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 4792-4799.
  • [35] Zhang, JiaRui; Li, HongZheng; Shi, ShuMin; Huang, HeYan; Hu, Yue; Wei, XiangPeng. Dynamic Attention Aggregation with BERT for Neural Machine Translation. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
  • [36] Deguchi, Hiroyuki; Tamura, Akihiro; Ninomiya, Takashi. Synchronous Syntactic Attention for Transformer Neural Machine Translation. ACL-IJCNLP 2021: The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Proceedings of the Student Research Workshop, 2021: 348-355.
  • [37] Singh, Shivkaran; Kumar, M. Anand; Soman, K. P. Attention based English to Punjabi neural machine translation. Journal of Intelligent & Fuzzy Systems, 2018, 34(03): 1551-1559.
  • [38] Lee, YoHan; Shin, JongHun; Kim, YoungKil. Simultaneous neural machine translation with a reinforced attention mechanism. ETRI Journal, 2021, 43(05): 775-786.
  • [39] Moradi, Pooya; Kambhatla, Nishant; Sarkar, Anoop. Measuring and Improving Faithfulness of Attention in Neural Machine Translation. 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), 2021: 2791-2802.
  • [40] Li, Dengao; Miao, Shuyi; Zhao, Baofeng; Zhou, Yu; Feng, Ding; Zhao, Jumin; Niu, Xupeng. ConvHiA: convolutional network with hierarchical attention for knowledge graph multi-hop reasoning. International Journal of Machine Learning and Cybernetics, 2023, 14(07): 2301-2315.