Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation

Cited by: 0
Authors
Zhang, Tianfu [1,2]
Huang, Heyan [1,2]
Feng, Chong [1,3]
Cao, Longbing [4]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Key Lab MIIT, Intelligent Informat Proc & Contents Comp, Beijing, Peoples R China
[3] BIT, Southeast Informat Technol Res Inst, Beijing, Peoples R China
[4] Univ Technol Sydney, Ultimo, Australia
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, highly parallelizable computation, and flexible extensibility. However, recent empirical studies show that some self-attention heads contribute little and can be pruned as redundant. This work takes a novel perspective: identifying redundant heads and then vitalizing them. We propose a redundant head enlivening (RHE) method that precisely identifies redundant heads and then realizes their potential by learning syntactic relations and prior knowledge in text, without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is evaluated dynamically during redundant-head identification, after which SEA is applied to vitalize the redundant heads while preserving the strength of the important ones. Experimental results on the WMT14 and WMT16 English -> German and English -> Czech machine translation tasks validate the effectiveness of RHE.
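To make the mechanism described in the abstract concrete, the following minimal PyTorch sketch shows one way the two SEA biases could be added to the attention logits of heads flagged as redundant while important heads are left untouched. This is an illustration only, not the authors' implementation: the names syntax_enhanced_attention, head_importance, dep_mask, and local_bias, the simple importance-threshold rule, and the additive-bias formulation are assumptions made for this example, and the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def syntax_enhanced_attention(q, k, v, head_importance, dep_mask, local_bias,
                              redundancy_threshold=0.1):
    # q, k, v:          [batch, heads, seq, d_k]
    # head_importance:  [heads]           dynamically estimated head scores (assumed given)
    # dep_mask:         [batch, seq, seq] 1.0 where a dependency arc links two tokens, else 0.0
    # local_bias:       [seq, seq]        relative local-phrasal position bias (assumed precomputed)
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5    # [B, H, L, L]

    # Heads whose importance falls below the threshold are treated as redundant.
    redundant = (head_importance < redundancy_threshold)          # [H], bool

    # Dependency mask bias: push non-arc positions toward -inf so attention
    # concentrates on syntactically related tokens.
    dep_bias = (1.0 - dep_mask).unsqueeze(1) * -1e9               # [B, 1, L, L]
    # Relative local-phrasal position bias: additive bias favouring nearby tokens.
    pos_bias = local_bias.unsqueeze(0).unsqueeze(0)               # [1, 1, L, L]

    # Apply the SEA biases only to redundant heads; important heads keep plain logits.
    gate = redundant.float().view(1, -1, 1, 1)                    # [1, H, 1, 1]
    scores = scores + gate * (dep_bias + pos_bias)

    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Toy usage with random tensors.
B, H, L, Dk = 2, 8, 5, 16
q, k, v = (torch.randn(B, H, L, Dk) for _ in range(3))
head_importance = torch.rand(H)
dep_mask = (torch.rand(B, L, L) > 0.5).float()
positions = torch.arange(L).float()
local_bias = -torch.abs(positions.view(-1, 1) - positions.view(1, -1))
out = syntax_enhanced_attention(q, k, v, head_importance, dep_mask, local_bias)
print(out.shape)  # torch.Size([2, 8, 5, 16])

The point the sketch captures is that the syntactic biases are gated per head, so the revised attention distributions only affect heads judged redundant, leaving the important heads' behaviour unchanged.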
Pages: 3238-3248
Page count: 11
Related papers
50 items in total
  • [21] MSIN: An Efficient Multi-head Self-attention Framework for Inertial Navigation
    Shi, Gaotao
    Pan, Bingjia
    Ni, Yuzhi
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT I, 2024, 14487 : 455 - 473
  • [22] Local Multi-Head Channel Self-Attention for Facial Expression Recognition
    Pecoraro, Roberto
    Basile, Valerio
    Bono, Viviana
    INFORMATION, 2022, 13 (09)
  • [23] Convolutional Multi-Head Self-Attention on Memory for Aspect Sentiment Classification
    Zhang, Yaojie
    Xu, Bing
    Zhao, Tiejun
    IEEE/CAA Journal of Automatica Sinica, 2020, 7 (04): 1038 - 1044
  • [24] SQL Injection Detection Based on Lightweight Multi-Head Self-Attention
    Lo, Rui-Teng
    Hwang, Wen-Jyi
    Tai, Tsung-Ming
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [25] MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
    Park, Geondo
    Han, Chihye
    Kim, Daeshik
    Yoon, Wonjun
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1507 - 1515
  • [26] Speech enhancement method based on the multi-head self-attention mechanism
    Chang X.
    Zhang Y.
    Yang L.
    Kou J.
    Wang X.
    Xu D.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2020, 47 (01): 104 - 110
  • [27] Hunt for Unseen Intrusion: Multi-Head Self-Attention Neural Detector
    Seo, Seongyun
    Han, Sungmin
    Park, Janghyeon
    Shim, Shinwoo
    Ryu, Han-Eul
    Cho, Byoungmo
    Lee, Sangkyun
    IEEE ACCESS, 2021, 9 : 129635 - 129647
  • [28] Multi-Head Attention for End-to-End Neural Machine Translation
    Fung, Ivan
    Mak, Brian
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 250 - 254
  • [29] Modality attention fusion model with hybrid multi-head self-attention for video understanding
    Zhuang, Xuqiang
    Liu, Fang'ai
    Hou, Jian
    Hao, Jianhua
    Cai, Xiaohong
    PLOS ONE, 2022, 17 (10):
  • [30] A Multi-tab Webpage Fingerprinting Method Based on Multi-head Self-attention
    Xie, Lixia
    Li, Yange
    Yang, Hongyu
    Hu, Ze
    Wang, Peng
    Cheng, Xiang
    Zhang, Liang
    FRONTIERS IN CYBER SECURITY, FCS 2023, 2024, 1992 : 131 - 140