Improving Vision Transformers with Nested Multi-head Attentions

Cited by: 2
Authors
Peng, Jiquan [1 ,2 ]
Li, Chaozhuo [3 ]
Zhao, Yi [1 ,2 ]
Lin, Yuting [1 ,2 ]
Fang, Xiaohan [1 ,2 ]
Gong, Jibing [1 ,2 ]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Key Lab Comp Virtual Technol & Syst Integrat Hebei, Shijiazhuang, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Vision Transformers; Disentangled Representation;
DOI
10.1109/ICME55011.2023.00330
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision transformers have significantly advanced the field of computer vision in recent years. The cornerstone of these transformers is the multi-head attention mechanism, which models interactions between visual elements within a feature map. However, the vanilla multi-head attention paradigm independently learns parameters for each head, which ignores crucial interactions across different attention heads and may result in redundancy and under-utilization of the model's capacity. To enhance model expressiveness, we propose a novel nested attention mechanism, Ne-Att, that explicitly models cross-head interactions via a hierarchical variational distribution. We conducted extensive experiments on image classification, and the results demonstrate the superiority of Ne-Att.
Pages: 1925 - 1930
Number of pages: 6
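
The abstract does not spell out the exact form of the hierarchical variational distribution used by Ne-Att, so the following is only a minimal PyTorch sketch of the general idea it motivates: vanilla multi-head attention computes each head's attention map independently, and a hypothetical learned mixing matrix then lets the heads interact. The module name `CrossHeadMixingAttention` and the `mix` parameter are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: Ne-Att models cross-head interactions via a
# hierarchical variational distribution (details not given in the abstract).
# Here a deterministic learned mixing of attention maps across heads stands
# in for "cross-head interaction", to show where such coupling could enter.
import torch
import torch.nn as nn


class CrossHeadMixingAttention(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        # Learned cross-head mixing matrix, initialized to the identity so the
        # module starts out identical to vanilla multi-head attention.
        self.mix = nn.Parameter(torch.eye(num_heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, N, N)
        attn = attn.softmax(dim=-1)
        # Cross-head step: each head's attention map becomes a learned
        # combination of all heads' maps (vanilla MHA would skip this line).
        attn = torch.einsum('gh,bhnm->bgnm', self.mix, attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 197, 768)                        # e.g. ViT-Base tokens
    print(CrossHeadMixingAttention(768, 12)(x).shape)   # torch.Size([2, 197, 768])
```

Note that after mixing, the combined attention rows are no longer guaranteed to sum to one; a real design (and presumably Ne-Att's probabilistic formulation) would handle this more carefully, e.g. by constraining or renormalizing the combination.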