Attention over Heads: A Multi-Hop Attention for Neural Machine Translation

Cited by: 0
Authors
Iida, Shohei [1 ]
Kimura, Ryuichiro [1 ]
Cui, Hongyi [1 ]
Hung, Po-Hsuan [1 ]
Utsuro, Takehito [1 ]
Nagata, Masaaki [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we propose a multi-hop attention for the Transformer. It refines the attention for an output symbol by integrating that of each head, and consists of two hops. The first hop attention is the scaled dot-product attention, the same attention mechanism used in the original Transformer. The second hop attention is a combination of multi-layer perceptron (MLP) attention and a head gate, which efficiently increases the complexity of the model by adding dependencies between heads. We demonstrate that the translation accuracy of the proposed multi-hop attention significantly outperforms the baseline Transformer, by +0.85 BLEU points on the IWSLT-2017 German-to-English task and +2.58 BLEU points on the WMT-2017 German-to-English task. We also find that a multi-hop attention requires fewer parameters than stacking another self-attention layer, and that the proposed model converges significantly faster than the original Transformer.
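The abstract describes the two hops only at a high level. Below is a minimal PyTorch sketch of one plausible reading: the first hop is the standard scaled dot-product attention, and the second hop re-weights the per-head outputs with an additive (MLP) attention over the head dimension, modulated by a sigmoid head gate. The module name MultiHopAttention, the tensor shapes, and the exact scoring and gating formulas are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of the two-hop attention described in the abstract.
# First hop: scaled dot-product attention per head (as in the original Transformer).
# Second hop: MLP attention over heads plus a head gate (assumed formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHopAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Second hop: MLP attention scoring each head's output (assumed form).
        self.hop2_mlp = nn.Sequential(
            nn.Linear(self.d_head, self.d_head),
            nn.Tanh(),
            nn.Linear(self.d_head, 1),
        )
        # Head gate: a learned sigmoid gate per head (assumed form).
        self.head_gate = nn.Linear(self.d_head, 1)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); self-attention case for brevity.
        b, t, _ = x.shape

        def split(h):  # -> (batch, n_heads, seq_len, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

        # First hop: scaled dot-product attention.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        heads = F.softmax(scores, dim=-1) @ v            # (b, n_heads, t, d_head)

        # Second hop: attention *over heads* -- softmax of MLP scores across
        # the head dimension, modulated by a sigmoid head gate.
        hop2 = F.softmax(self.hop2_mlp(heads), dim=1)     # (b, n_heads, t, 1)
        gate = torch.sigmoid(self.head_gate(heads))       # (b, n_heads, t, 1)
        heads = heads * hop2 * gate

        out = heads.transpose(1, 2).reshape(b, t, -1)     # concatenate heads
        return self.out_proj(out)


# Quick shape check on a toy batch.
layer = MultiHopAttention(d_model=512, n_heads=8)
print(layer(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```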
Pages: 217-222
Page count: 6
Related Papers
50 items in total
  • [21] Bilingual attention based neural machine translation
    Kang, Liyan
    He, Shaojie
    Wang, Mingxuan
    Long, Fei
    Su, Jinsong
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4302 - 4315
  • [22] Edge-featured multi-hop attention graph neural network for intrusion detection system
    Deng, Ping
    Huang, Yong
    COMPUTERS & SECURITY, 2025, 148
  • [24] Attention Calibration for Transformer in Neural Machine Translation
    Lu, Yu
    Zeng, Jiali
    Zhang, Jiajun
    Wu, Shuangzhi
    Li, Mu
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1288 - 1298
  • [25] Parallel Attention Mechanisms in Neural Machine Translation
    Medina, Julian Richard
    Kalita, Jugal
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 547 - 552
  • [26] Multi-Hop Transformer for Document-Level Machine Translation
    Zhang, Long
    Zhang, Tong
    Zhang, Haibo
    Yang, Baosong
    Ye, Wei
    Zhang, Shikun
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3953 - 3963
  • [27] Multi-Head Attention for End-to-End Neural Machine Translation
    Fung, Ivan
    Mak, Brian
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 250 - 254
  • [28] Multi-hop interactive attention based classification network for expert recommendation
    Qian, Lingfei
    Wang, Jian
    Lin, Hongfei
    Yang, Liang
    Zhang, Yu
    NEUROCOMPUTING, 2022, 488 : 436 - 443
  • [29] DMGAN: Dynamic Multi-Hop Graph Attention Network for Traffic Forecasting
    Li, Rui
    Zhang, Fan
    Li, Tong
    Zhang, Ning
    Zhang, Tingting
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (09) : 9088 - 9101
  • [30] Diffuser: Efficient Transformers with Multi-Hop Attention Diffusion for Long Sequences
    Feng, Aosong
    Li, Irene
    Jiang, Yuang
    Ying, Rex
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12772 - 12780