Attention over Heads: A Multi-Hop Attention for Neural Machine Translation

Cited by: 0
Authors
Iida, Shohei [1 ]
Kimura, Ryuichiro [1 ]
Cui, Hongyi [1 ]
Hung, Po-Hsuan [1 ]
Utsuro, Takehito [1 ]
Nagata, Masaaki [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we propose a multi-hop attention mechanism for the Transformer. It refines the attention for an output symbol by integrating the attention of each head, and it consists of two hops. The first hop is the scaled dot-product attention, the same attention mechanism used in the original Transformer. The second hop combines multi-layer perceptron (MLP) attention with a head gate, which efficiently increases the complexity of the model by adding dependencies between heads. We demonstrate that the proposed multi-hop attention significantly outperforms the baseline Transformer in translation accuracy, by +0.85 BLEU points on the IWSLT-2017 German-to-English task and +2.58 BLEU points on the WMT-2017 German-to-English task. We also find that a multi-hop attention requires fewer parameters than stacking another self-attention layer, and that the proposed model converges significantly faster than the original Transformer.
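The abstract gives only the high-level design, so the following is a minimal PyTorch sketch of one way the two hops could fit together: hop 1 is ordinary scaled dot-product multi-head attention, and hop 2 re-weights the per-head outputs with a softmax MLP attention over the head axis combined with a sigmoid head gate, instead of merging heads by plain concatenation. The class name `MultiHopAttention`, the tanh MLP scorer, and the exact gating formula are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MultiHopAttention(nn.Module):
    """Sketch of a two-hop attention (assumed structure, not the paper's code).

    Hop 1: scaled dot-product multi-head self-attention, as in the
    original Transformer.  Hop 2: MLP attention over the head axis plus
    a per-head sigmoid gate, adding dependencies between heads.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Hop 2: MLP scorer for each head's output (assumed tanh form).
        self.head_score = nn.Sequential(
            nn.Linear(self.d_head, self.d_head), nn.Tanh(),
            nn.Linear(self.d_head, 1),
        )
        # Hop 2: per-head gate (assumed sigmoid form).
        self.head_gate = nn.Linear(self.d_head, 1)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = map(split, (q, k, v))  # each: (b, h, t, d_head)

        # Hop 1: scaled dot-product attention, computed per head.
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = att @ v  # (b, h, t, d_head)

        # Hop 2: softmax MLP attention over the head axis, times a gate.
        alpha = torch.softmax(self.head_score(heads), dim=1)  # weights over heads
        gate = torch.sigmoid(self.head_gate(heads))
        heads = alpha * gate * heads  # re-weight each head's contribution

        merged = heads.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out(merged)
```

With these assumed defaults, `MultiHopAttention(d_model=512, n_heads=8)` maps a `(batch, seq, 512)` tensor to the same shape. Because hop 2 only adds the small MLP scorer and gate on top of the existing heads, its parameter count stays well below that of a full extra self-attention layer, consistent with the abstract's claim.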
Pages: 217-222 (6 pages)
Related papers (50 in total; items [31]-[40] shown)
  • [31] Wang, Lei; Li, Zheng-Wei; You, Zhu-Hong; Huang, De-Shuang; Wong, Leon. MAGCDA: A Multi-Hop Attention Graph Neural Networks Method for CircRNA-Disease Association Prediction. IEEE Journal of Biomedical and Health Informatics, 2024, 28(03): 1752-1761.
  • [32] Zhang, Jiajun; Zhao, Yang; Li, Haoran; Zong, Chengqing. Attention With Sparsity Regularization for Neural Machine Translation and Summarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(03): 507-518.
  • [33] Yang, Mingming; Zhang, Min; Chen, Kehai; Wang, Rui; Zhao, Tiejun. Neural Machine Translation with Target-Attention Model. IEICE Transactions on Information and Systems, 2020, E103D(03): 684-694.
  • [34] Chen, Kehai; Wang, Rui; Utiyama, Masao; Sumita, Eiichiro; Zhao, Tiejun. Syntax-Directed Attention for Neural Machine Translation. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 4792-4799.
  • [35] Zhang, JiaRui; Li, HongZheng; Shi, ShuMin; Huang, HeYan; Hu, Yue; Wei, XiangPeng. Dynamic Attention Aggregation with BERT for Neural Machine Translation. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
  • [36] Deguchi, Hiroyuki; Tamura, Akihiro; Ninomiya, Takashi. Synchronous Syntactic Attention for Transformer Neural Machine Translation. ACL-IJCNLP 2021: The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Proceedings of the Student Research Workshop, 2021: 348-355.
  • [37] Singh, Shivkaran; Kumar, M. Anand; Soman, K. P. Attention based English to Punjabi neural machine translation. Journal of Intelligent & Fuzzy Systems, 2018, 34(03): 1551-1559.
  • [38] Lee, YoHan; Shin, JongHun; Kim, YoungKil. Simultaneous neural machine translation with a reinforced attention mechanism. ETRI Journal, 2021, 43(05): 775-786.
  • [39] Moradi, Pooya; Kambhatla, Nishant; Sarkar, Anoop. Measuring and Improving Faithfulness of Attention in Neural Machine Translation. 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), 2021: 2791-2802.
  • [40] Li, Dengao; Miao, Shuyi; Zhao, Baofeng; Zhou, Yu; Feng, Ding; Zhao, Jumin; Niu, Xupeng. ConvHiA: convolutional network with hierarchical attention for knowledge graph multi-hop reasoning. International Journal of Machine Learning and Cybernetics, 2023, 14(07): 2301-2315.